Question
Here's the situation:
- The application world consists of hundreds of thousands of states.
- Given a state, I can work out a set of 3 or 4 other reachable states. A simple recursion can build a tree of states that gets very large very fast.
- I need to perform a DFS to a specific depth in this tree from the root state, to search for the subtree which contains the 'minimal' state (calculating the value of the node is irrelevant to the question).
Using a single thread to perform the DFS works, but is very slow. Covering 15 levels down can take a good few minutes, and I need to improve this atrocious performance. Trying to assign a thread to each subtree created too many threads and caused an OutOfMemoryError. Using a ThreadPoolExecutor wasn't much better.
My question: What's the most efficient way to traverse this large tree?
Answer 1:
I don't believe navigating the tree is your problem, as your tree has about 36 million nodes and the example below navigates a tree of that size in about a second. Instead, it is more likely that what you are doing with each node is expensive.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class Main {
    public static final int TOP_LEVELS = 2;

    enum BuySell {}

    private static final AtomicLong called = new AtomicLong();

    public static void main(String... args) throws InterruptedException {
        int maxLevels = 15;
        long start = System.nanoTime();
        method(maxLevels);
        long time = System.nanoTime() - start;
        System.out.printf("Took %.3f second to navigate %,d levels called %,d times%n", time / 1e9, maxLevels, called.longValue());
    }

    public static void method(int maxLevels) throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            int result = method(service, 0, maxLevels - 1, new int[maxLevels]).call();
        } catch (Exception e) {
            e.printStackTrace();
        }
        service.shutdown();
        service.awaitTermination(10, TimeUnit.MINUTES);
    }

    // single threaded process the highest levels of the tree.
    private static Callable<Integer> method(final ExecutorService service, final int level, final int maxLevel, final int[] options) {
        int choices = level % 2 == 0 ? 3 : 4;
        final List<Callable<Integer>> callables = new ArrayList<Callable<Integer>>(choices);
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            Callable<Integer> callable = level < TOP_LEVELS ?
                    method(service, level + 1, maxLevel, options) :
                    method1(service, level + 1, maxLevel, options);
            callables.add(callable);
        }
        return new Callable<Integer>() {
            @Override
            public Integer call() throws Exception {
                Integer min = Integer.MAX_VALUE;
                for (Callable<Integer> result : callables) {
                    Integer num = result.call();
                    if (min > num)
                        min = num;
                }
                return min;
            }
        };
    }

    // at this level, process the branches in separate threads.
    private static Callable<Integer> method1(final ExecutorService service, final int level, final int maxLevel, final int[] options) {
        int choices = level % 2 == 0 ? 3 : 4;
        final List<Future<Integer>> futures = new ArrayList<Future<Integer>>(choices);
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            final int[] optionsCopy = options.clone();
            Future<Integer> future = service.submit(new Callable<Integer>() {
                @Override
                public Integer call() {
                    return method2(level + 1, maxLevel, optionsCopy);
                }
            });
            futures.add(future);
        }
        return new Callable<Integer>() {
            @Override
            public Integer call() throws Exception {
                Integer min = Integer.MAX_VALUE;
                for (Future<Integer> result : futures) {
                    Integer num = result.get();
                    if (min > num)
                        min = num;
                }
                return min;
            }
        };
    }

    // at these levels each task processes in its own thread.
    private static int method2(int level, int maxLevel, int[] options) {
        if (level == maxLevel) {
            return process(options);
        }
        int choices = level % 2 == 0 ? 3 : 4;
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            int n = method2(level + 1, maxLevel, options);
            if (min > n)
                min = n;
        }
        return min;
    }

    private static int process(final int[] options) {
        int min = options[0];
        for (int i : options)
            if (min > i)
                min = i;
        called.incrementAndGet();
        return min;
    }
}
prints
Took 1.273 second to navigate 15 levels called 35,831,808 times
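The call count in that output can be checked by hand. The code passes maxLevels - 1 = 14 as the leaf level, so 14 levels branch, alternating between 3 and 4 choices, which gives 3^7 * 4^7 leaves. A small sketch that does the multiplication (the class and method names here are illustrative and not part of the answer's code):

```java
public class LeafCount {
    // Number of leaves when each level alternates between 3 choices (even
    // levels) and 4 choices (odd levels), starting with 3 at level 0.
    static long leaves(int branchingLevels) {
        long count = 1;
        for (int level = 0; level < branchingLevels; level++) {
            count *= (level % 2 == 0) ? 3 : 4;
        }
        return count;
    }

    public static void main(String[] args) {
        // 14 branching levels: 3^7 * 4^7 = 2187 * 16384
        System.out.println(leaves(14));
    }
}
```

The result, 35831808, matches the "called 35,831,808 times" figure, which is where the "about 36 million" estimate comes from.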
I suggest you limit the number of threads and only use separate threads for the highest levels of the tree. How many cores do you have? Once you have enough threads to keep every core busy, you don't need to create more threads as this just adds overhead.
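One way to express the same idea with less plumbing is the JDK's fork/join framework, which is a different mechanism from the fixed thread pool used in the code above. The following is only a sketch under assumptions: FORK_LEVELS, MinTask, search and the score function are hypothetical names, and the 3-or-4 branching is borrowed from the example. Tasks are forked only for the top few levels, and everything deeper runs as plain recursion on whichever worker picked up the task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinDfs {
    // Assumption: fork only above this level; deeper subtrees run sequentially.
    static final int FORK_LEVELS = 3;

    static final class MinTask extends RecursiveTask<Integer> {
        private final int level;
        private final int maxLevel;
        private final int[] options;

        MinTask(int level, int maxLevel, int[] options) {
            this.level = level;
            this.maxLevel = maxLevel;
            this.options = options;
        }

        @Override
        protected Integer compute() {
            if (level >= FORK_LEVELS || level == maxLevel) {
                return sequential(level, maxLevel, options);
            }
            int choices = level % 2 == 0 ? 3 : 4;
            List<MinTask> subtasks = new ArrayList<>(choices);
            for (int i = 0; i < choices; i++) {
                int[] copy = options.clone(); // each subtask owns its own path
                copy[level] = i;
                subtasks.add(new MinTask(level + 1, maxLevel, copy));
            }
            int min = Integer.MAX_VALUE;
            for (MinTask task : invokeAll(subtasks)) {
                min = Math.min(min, task.join());
            }
            return min;
        }
    }

    // Plain recursive DFS used below the fork cutoff.
    static int sequential(int level, int maxLevel, int[] options) {
        if (level == maxLevel) {
            return score(options);
        }
        int choices = level % 2 == 0 ? 3 : 4;
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            min = Math.min(min, sequential(level + 1, maxLevel, options));
        }
        return min;
    }

    // Hypothetical leaf evaluation: a weighted sum of the choices on the path.
    static int score(int[] options) {
        int sum = 0;
        for (int l = 0; l < options.length; l++) {
            sum += (options[l] + 1) * (l + 1);
        }
        return sum;
    }

    static int search(int maxLevel) {
        return ForkJoinPool.commonPool().invoke(new MinTask(0, maxLevel, new int[maxLevel]));
    }

    public static void main(String[] args) {
        System.out.println(search(10));
    }
}
```

The common pool sizes itself to the available processors by default, so the number of worker threads stays bounded no matter how many tasks are forked.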
Java has a built-in Stack class which is thread-safe; however, I would just use an ArrayList, which is more efficient.
Answer 2:
You will definitely have to use an iterative method. The simplest way is a stack-based DFS, with pseudocode similar to this:
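As a minimal illustration of using an ArrayList as a stack, pushing is an add at the end and popping is a removal of the last element. The class and helper names below are made up for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

public class ListStackDemo {
    // Push appends to the end of the list (amortized constant time).
    static <T> void push(List<T> stack, T value) {
        stack.add(value);
    }

    // Pop removes and returns the last element; no elements need shifting.
    static <T> T pop(List<T> stack) {
        return stack.remove(stack.size() - 1);
    }

    public static void main(String[] args) {
        List<Integer> stack = new ArrayList<>();
        push(stack, 1);
        push(stack, 2);
        push(stack, 3);
        while (!stack.isEmpty()) {
            System.out.println(pop(stack)); // last in, first out
        }
    }
}
```

Unlike java.util.Stack, none of these operations is synchronized, so this is only suitable when a single thread owns the list.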
STACK.push(root)
while (STACK.nonempty)
    current = STACK.pop
    if (current.done) continue
    // ... do something with node ...
    current.done = true
    FOREACH (neighbor n of current)
        if (! n.done )
            STACK.push(n)
The time complexity of this is O(n + m), where n is the number of nodes and m is the number of edges in your graph. Since you have a tree, m = n - 1, so this is O(n) and should easily run quickly for n > 1,000,000.
Source: https://stackoverflow.com/questions/6414601/whats-the-best-way-to-perform-dfs-on-a-very-large-tree
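A runnable Java version of the stack-based approach might look like the sketch below. It makes several assumptions not stated in this answer: the Node class and its value are hypothetical, the 3-or-4 branching is borrowed from the other answer's example, and a depth limit is added because the question asks for a search to a fixed depth. The done flag is dropped because in a tree each node is reached exactly once.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeDfs {
    // Hypothetical node: its depth in the tree and an illustrative value.
    static final class Node {
        final int depth;
        final int value;

        Node(int depth, int value) {
            this.depth = depth;
            this.value = value;
        }
    }

    // Number of leaves seen by the most recent search.
    static long leavesVisited;

    // Depth-limited DFS with an explicit stack; returns the minimal leaf value.
    static int minLeafValue(Node root, int maxDepth) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        leavesVisited = 0;
        int min = Integer.MAX_VALUE;
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            if (current.depth == maxDepth) {
                leavesVisited++;
                min = Math.min(min, current.value);
                continue;
            }
            // Assumed branching: 3 children on even levels, 4 on odd levels.
            int choices = current.depth % 2 == 0 ? 3 : 4;
            for (int i = 0; i < choices; i++) {
                // Illustrative value rule: child i shifts the value by i - 1.
                stack.push(new Node(current.depth + 1, current.value + i - 1));
            }
        }
        return min;
    }

    public static void main(String[] args) {
        int min = minLeafValue(new Node(0, 0), 4);
        System.out.println("min=" + min + " leaves=" + leavesVisited);
    }
}
```

The stack never holds more than roughly the depth times the branching factor, so memory use stays small even when the number of visited nodes runs into the millions.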