So I have been looking into implementing a lowest common ancestor algorithm. I looked at many different algorithms (mainly variations of Trajan\'s solution or variations of the
Your algorithm is quadratic, but it can easily be made linear.
Just use hashtable (i.e. set) for parentNode
, instead of list.
Thus checking whether a node is in parentNode
will be O(1)
instead of O(n)
.
I just wrote a blog post about how I had to implement my own algorithm for this problem but extended to a set of nodes with an arbitrary length. You can find it here (with a step-by-step graphical explanation of how it works)
http://bio4j.com/blog/2012/02/finding-the-lowest-common-ancestor-of-a-set-of-ncbi-taxonomy-nodes-with-bio4j/
Cheers,
Pablo
For a tree that small, I wouldn't bother implementing anything more complex. Your solution looks good, although the time complexity is squared in terms of the height of the tree. If you can easily implement a Set
(most languages have it built-in), then the algorithm could be tweaked to,
Also, this algorithm assumes that a node can be its own ancestor. Otherwise, you would have to tweak the algorithm slightly. Consider this example,
A
|
B
|
C
When trying to find the lowest common ancestor of B, and C, this algorithm would report B, which may or may not be true depending on how you define ancestor.
Without looking at the details of either algorithm, I'd suggest looking at how important the efficiency of this algorithms is to your overall application, and how much effort would be required to implement another algorithm.
How many times will this algorithm be run in normal (or stressed) operation of your application? Will it cause the user to wait longer than necessary? Are the other algorithms of a different order of magnitude faster than yours? (Someone familiar with the algorithms can give you a more detailed answer on this.)
I think it is not worth optimising a bit of code unless you will see sizable results (some people feel quite strongly that premature optamisation is the root of all evil)
As others have mentioned, your algorithm is currently quadratic. That may not be a problem for a dataset as small as 50-75 nodes, but in any case it's straightforward to change it to linear time without using any sets or hashtables, just by recording the complete path to the root for each node, then walking back down from the root and looking for the first different node. The immediately preceding node (which is the common parent of these 2 different nodes) is then the LCA:
linearLCA(node1, node2) {
parentNode1 := [ ]
while (node1!=NULL) {
parentNode1.push(node1)
node1 := node1.parent
}
parentNode2 := [ ]
while (node2!=NULL) {
parentNode2.push(node2)
node2 := node2.parent
}
while (node1 == node2 && !isEmpty(parentNode1) && !isEmpty(parentNode2)) {
oldNode := node1
node1 := parentNode1.pop()
node2 := parentNode2.pop()
}
if (node1 == node2) return node1 // One node is descended from the other
else return oldNode // Neither is descended from the other
}
EDIT 27/5/2012: Handle the case in which one node is descended from the other, which would otherwise result in attempting to pop()
an empty stack. Thanks to damned for pointing this out. (I also realised that it's sufficient to track a single oldNode
.)
I have one simplistic solution sort the two elements and the lowest be left and highest be right visit root def recurse(root) return nil if root.empty? if left <= root && right >= root return root elsif left <= root && right <= root recurse(root.left) else recurse(root.right) end
So this will check on each traversal The problem soluted in O(log n) time for average and worst and O(log