Depth First Search in Parallel

Submitted by 删除回忆录丶 on 2019-12-11 17:35:39

Question


I have a huge binary tree (each node has a Pass and a Fail child) and I want to traverse it with DFS to collect all possible paths. Since the tree is huge, a single-threaded DFS takes a very long time, so I am now considering a parallel DFS. The basic idea is below.

  • Start with a single thread doing a normal DFS; when it reaches a node, spawn a new thread that starts at the Fail child and receives the list of nodes travelled so far
  • The initial thread continues down the Pass path
  • At the end, every thread returns the list of nodes it has travelled, so the whole tree is traversed by multiple threads. Since each so-called parent thread passes the nodes it has travelled to its child thread, every thread is self-sufficient

In order to implement this, I am thinking of doing this

  • Use newCachedThreadPool.
  • In main, I create the pool and submit an initial task to the Callable DFS class. The DFS constructor also takes the ExecutorService, so that newly spawned tasks can themselves spawn further tasks following the rule discussed above

Code Implementation of DFS

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Future;

    public class DFS implements Callable<List<List<TestNode>>> {
        private Node node = null;
        private List<TestNode> testNodeList = new ArrayList<TestNode>();
        private List<List<TestNode>> returnNodeList = new ArrayList<List<TestNode>>();
        private ExecutorService service = null;

        public DFS(ExecutorService service, Node node, List<TestNode> initList) {
            this.node = node;
            this.service = service;
            if (initList != null && initList.size() > 0) {
                testNodeList.addAll(initList);
            }
        }

        public List<List<TestNode>> call() throws Exception {
            performDFS(this.node);
            returnNodeList.add(testNodeList);
            return returnNodeList;
        }

        private void performDFS(Node node) {
            TestNode testNode = new TestNode();
            testNode.setName(node.getName());
            testNode.setThreadName(Thread.currentThread().getName());
            testNodeList.add(testNode);

            if (node.getPass() != null) {
                performDFS(node.getPass());
            }
            if (node.getFail() != null) {
                Callable<List<List<TestNode>>> task =
                        new DFS(service, node.getFail(), this.testNodeList);
                Future<List<List<TestNode>>> returnList = service.submit(task);
                try {
                    returnNodeList.addAll(returnList.get());
                } catch (InterruptedException | ExecutionException e) {
                    e.printStackTrace();
                }
            }
        }
    }

Main Class

    public static void main(String[] args) {
        Main main = new Main();
        Node root = main.createTree();
        ExecutorService service = Executors.newCachedThreadPool();
        Callable<List<List<TestNode>>> task = new DFS(service, root, null);

        try {
            Future<List<List<TestNode>>> returnList = service.submit(task);
            main.displayTestNode(returnList.get());
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        } finally {
            service.shutdown();
        }
    }

Questions

  • Does this make sense? Is this even possible?
  • There is a problem with the implementation: I can see the same thread name appearing again and again

Answer 1:


Yes, it is possible to write a parallel DFS. It might also be possible with thread pools, but I think a fork/join-style algorithm would be more "natural": the fork operation traverses all children of a node in parallel, while the join operation simply concatenates the lists of paths returned.
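A minimal sketch of that fork/join approach, using Java's `ForkJoinPool` and `RecursiveTask`. The `Node` shape with `getPass`/`getFail` mirrors the question; the `PathTask` class and path representation as lists of names are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinDFS {
    static class Node {
        final String name;
        Node pass, fail;
        Node(String name) { this.name = name; }
        Node getPass() { return pass; }
        Node getFail() { return fail; }
    }

    static class PathTask extends RecursiveTask<List<List<String>>> {
        private final Node node;
        private final List<String> prefix;

        PathTask(Node node, List<String> prefix) {
            this.node = node;
            this.prefix = prefix;
        }

        @Override
        protected List<List<String>> compute() {
            List<String> path = new ArrayList<>(prefix);
            path.add(node.name);

            if (node.getPass() == null && node.getFail() == null) {
                List<List<String>> result = new ArrayList<>();
                result.add(path);              // leaf: one complete path
                return result;
            }

            // Fork the fail subtree asynchronously, walk the pass subtree
            // in the current worker, then join and concatenate the lists.
            PathTask passTask = node.getPass() != null ? new PathTask(node.getPass(), path) : null;
            PathTask failTask = node.getFail() != null ? new PathTask(node.getFail(), path) : null;
            if (failTask != null) failTask.fork();
            List<List<String>> result = new ArrayList<>();
            if (passTask != null) result.addAll(passTask.compute());
            if (failTask != null) result.addAll(failTask.join());
            return result;
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        root.pass = new Node("p");
        root.fail = new Node("f");
        root.pass.pass = new Node("pp");

        List<List<String>> paths = new ForkJoinPool().invoke(new PathTask(root, new ArrayList<>()));
        System.out.println(paths.size()); // one path per leaf
    }
}
```

Note that the worker never blocks while its sibling runs: `compute()` on the pass child keeps the current thread busy, and `join()` in a fork/join pool will steal work rather than idle, which is what avoids the problem discussed below.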




Answer 2:


Well, I think the biggest problem with this is that multithreading won't help you, since you never actually execute anything in parallel. You spawn one thread, then immediately wait on it, so only one thread is computing anything at any moment.
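To make the difference concrete, here is a minimal sketch (all names assumed, not from the question's code) of the usual remedy when staying with an `ExecutorService`: submit all tasks first, collect the `Future`s, and only call `get()` after everything is in flight, so the tasks genuinely overlap:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DeferredJoin {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        List<Future<String>> pending = new ArrayList<>();

        for (int i = 0; i < 4; i++) {
            final int id = i;
            // submit() returns immediately; the tasks run concurrently
            pending.add(pool.submit(() -> "subtree-" + id));
        }

        // Join only after all tasks have been submitted.
        for (Future<String> f : pending) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```

In the question's `performDFS`, by contrast, `returnList.get()` follows `service.submit(task)` directly, so the submitting thread sleeps until the child finishes.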

Another question to ask would be why (and whether) you want the list of all paths at all. In a tree, paths from the root are uniquely identified by their endpoint, and can be reconstructed from there by following the "up" links. Moreover, the list of all paths has larger memory complexity: in a full balanced binary tree with N nodes, the memory complexity of the tree is O(N), whereas the memory complexity of the list of all paths is O(N*log(N)), which is the time complexity of its creation, too. Even if the "up" links are not provided in the tree, you can reconstruct them (perhaps as an IdentityHashMap<Node, Node>) in O(N) time, which takes less time (and memory!) than constructing the list of all paths. I really don't think you should begin processing large trees by converting them to a wasteful representation.
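The suggested O(N) reconstruction of "up" links could be sketched like this; the pass/fail `Node` shape follows the question, while the helper names are assumptions:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Map;

public class ParentLinks {
    static class Node {
        final String name;
        Node pass, fail;
        Node(String name) { this.name = name; }
    }

    // Iterative DFS: each node is visited exactly once, so this is O(N)
    // in time and memory. IdentityHashMap keys nodes by reference, which
    // is what we want even if two nodes compare equal() by name.
    static Map<Node, Node> buildParentMap(Node root) {
        Map<Node, Node> parent = new IdentityHashMap<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.pass != null) { parent.put(n.pass, n); stack.push(n.pass); }
            if (n.fail != null) { parent.put(n.fail, n); stack.push(n.fail); }
        }
        return parent;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        root.pass = new Node("p");
        root.pass.fail = new Node("pf");

        Map<Node, Node> parent = buildParentMap(root);
        // Reconstruct the path to "pf" by walking the up links on demand.
        StringBuilder path = new StringBuilder();
        for (Node n = root.pass.fail; n != null; n = parent.get(n)) {
            path.insert(0, "/" + n.name);
        }
        System.out.println(path); // /root/p/pf
    }
}
```

Any individual path is then materialized lazily, only when a leaf actually needs it, instead of storing all O(N*log(N)) paths up front.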



Source: https://stackoverflow.com/questions/10258204/depth-first-search-in-parallel
