问题
Suppose you need to count the number of islands on a matrix
{1, 1, 0, 0, 0},
{0, 1, 0, 0, 1},
{1, 0, 0, 1, 1},
{0, 0, 0, 0, 0},
{1, 0, 1, 0, 1}
We could simply use DFS or BFS when the input matrix size can be fitting into the memory.
However, what do we do if the input matrix is really large which could not be fitting into the memory?
I could chunk/split the input matrix into different small files and read them respectively.
But how to merge them?
I got stuck at how to merge them. I have the idea that when merging them we have to read some overlapped portion. But what is a concrete way to do so?
Trying to understand Matt's solution.
When I drew the below sample on the whiteboard and process it row by row. Merge left then merge top and it seems won't work.
From Matt's solution.
not sure what are topidx, botidx meaning
int topidx = col * 2;
int botidx = topidx + 1;
回答1:
Using union-find, the basic algorithm (without worrying about memory) is:
- Create a set for every
1
- Merge the sets for every pair of adjacent
1
s. It doesn't matter what order you find them in, so reading order is usually fine. - Count the number of root sets -- there will be one for every island.
Easy, and with a little care, you can do this using sequential access to the matrix and only 2 rows worth of memory:
- Initialize the island count to 0
- Read the first row, create a set for each
1
, and merge sets in adjacent columns. For each additional row:
- Read the row, create a set for each
1
, and merge sets in adjacent columns; - Merge sets in the new row with adjacent sets in the previous row. ALWAYS POINT THE LINKS DOWNWARD, so that you never end up with a set in the new row linked to a parent in the old row.
- Count the remaining root sets in the previous row, and add the number to your island count. These will never be able to merge with anything else.
- Discard all the sets in the previous row -- you're never going to need them again, because you already counted them and nothing links to them.
- Read the row, create a set for each
Finally, count the root sets in the last row and add them to your island count.
The key to this, of course, is always pointing the links downward whenever you link sets in different rows. This will not hurt the complexity of the algorithm, and if you're using your own union-find, then it is easy to accomplish. If you're using a library data structure then you can use it just for each row, and keep track of the links between root sets in different rows yourself.
Since this is actually one of my favorite algorithms, here is an implementation in Java. This is not the most readable implementation since it involves some low-level tricks, but is super-efficient and short -- the kind of thing I'd write where performance is very important:
import java.util.Arrays;
public class Islands
{
private static final String[] matrix=new String[] {
" ############# ### ",
" # ##### ## ",
" # ## ## # # ",
" ### ## # # ",
" # ######### ## ## ",
" ## ## ",
" ########## ",
};
// find with path compression.
// If sets[s] < 0 then it is a link to ~sets[s]. Otherwise it is size of set
static int find(int[] sets, int s)
{
int parent = ~sets[s];
if (parent>=0)
{
int root = find(sets, parent);
if (root != parent)
{
sets[s] = ~root;
}
return root;
}
return s;
}
// union-by-size
// If sets[s] < 0 then it is a link to ~sets[s]. Otherwise it is size of set
static boolean union(int[] sets, int x, int y)
{
x = find(sets,x);
y = find(sets,y);
if (x!=y)
{
if ((sets[x] < sets[y]))
{
sets[y] += sets[x];
sets[x] = ~y;
}
else
{
sets[x] += sets[y];
sets[y] = ~x;
}
return true;
}
return false;
}
// Count islands in matrix
public static void main(String[] args)
{
// two rows of union-find sets.
// top row is at even indexes, bottom row is at odd indexes. This arrangemnt is chosen just
// to make resizing this array easier.
// For each value x:
// x==0 => no set. x>0 => root set of size x. x<0 => link to ~x
int cols=4;
int[] setrows= new int[cols*2];
int islandCount = 0;
for (String s : matrix)
{
System.out.println(s);
//Make sure our rows are big enough
if (s.length() > cols)
{
cols=s.length();
if (setrows.length < cols*2)
{
int newlen = Math.max(cols,setrows.length)*2;
setrows = Arrays.copyOf(setrows, newlen);
}
}
//Create sets for land in bottom row, merging left
for (int col=0; col<s.length(); ++col)
{
if (!Character.isWhitespace(s.charAt(col)))
{
int idx = col*2+1;
setrows[idx]=1; //set of size 1
if (idx>=2 && setrows[idx-2]!=0)
{
union(setrows, idx, idx-2);
}
}
}
//merge up
for (int col=0; col<cols; ++col)
{
int topidx = col*2;
int botidx = topidx+1;
if (setrows[topidx]!=0 && setrows[botidx]!=0)
{
int toproot=find(setrows,topidx);
if ((toproot&1)!=0)
{
//top set is already linked down
union(setrows, toproot, botidx);
}
else
{
//link top root down. It does not matter that we aren't counting its size, since
//we will shortly throw it aaway
setrows[toproot] = ~botidx;
}
}
}
//count root sets, discard top row, and move bottom row up while fixing links
for (int col=0; col<cols; ++col)
{
int topidx = col * 2;
int botidx = topidx + 1;
if (setrows[topidx]>0)
{
++islandCount;
}
int v = setrows[botidx];
setrows[topidx] = (v>=0 ? v : v|1); //fix up link if necessary
setrows[botidx] = 0;
}
}
//count remaining root sets in top row
for (int col=0; col<cols; ++col)
{
if (setrows[col*2]>0)
{
++islandCount;
}
}
System.out.println("\nThere are "+islandCount+" islands there");
}
}
来源:https://stackoverflow.com/questions/55284043/algorithm-use-union-find-to-count-number-of-islands