All Paths for a Sum with return issues

Question

I have a question about finding all paths for a sum. The problem is:

Given a binary tree and a number ‘S’, find all paths from root-to-leaf such that the sum of all the node values of each path equals ‘S’.

My approach with recursion is:

def all_sum_path(root, target):
    result = []
    find_sum_path(root, target, result, [])
    return result

def find_sum_path(root, target, result, new_path):
    if not root:
        return None
    new_path.append(root.value)
    diff = target - root.value
    if not root.left and not root.right and diff == 0:
        # copy the value of the list rather than a reference
        result.append(list(new_path))
    if root.left:
        return find_sum_path(root.left, diff, result, new_path)
    if root.right:
        return find_sum_path(root.right, diff, result, new_path)
    del new_path[-1]


class TreeNode():
    def __init__(self, _value):
        self.value = _value
        self.left, self.right, self.next = None, None, None

def main():
    root = TreeNode(1)
    root.left = TreeNode(7)
    root.right = TreeNode(9)
    root.left.left = TreeNode(4)
    root.left.right = TreeNode(5)
    root.right.left = TreeNode(2)
    root.right.right = TreeNode(7)

    print(all_sum_path(root, 12))

    root = TreeNode(12)
    root.left = TreeNode(7)
    root.right = TreeNode(1)
    root.left.left = TreeNode(4)
    root.right.left = TreeNode(10)
    root.right.right = TreeNode(5)

    print(all_sum_path(root, 23))

main()

and the output is:

[[1, 7, 4]]
[[12, 7, 4]]

Process finished with exit code 0

However, the correct approach should be:

def all_sum_path(root, target):
    result = []
    find_sum_path(root, target, result, [])
    return result

def find_sum_path(root, target, result, new_path):
    if not root:
        return None
    new_path.append(root.value)
    diff = target - root.value
    if not root.left and not root.right and diff == 0:
        # copy the value of the list rather than a reference
        result.append(list(new_path))
    if root.left:
        find_sum_path(root.left, diff, result, new_path)
    if root.right:
        find_sum_path(root.right, diff, result, new_path)
    del new_path[-1]


class TreeNode():
    def __init__(self, _value):
        self.value = _value
        self.left, self.right, self.next = None, None, None

def main():
    root = TreeNode(1)
    root.left = TreeNode(7)
    root.right = TreeNode(9)
    root.left.left = TreeNode(4)
    root.left.right = TreeNode(5)
    root.right.left = TreeNode(2)
    root.right.right = TreeNode(7)

    print(all_sum_path(root, 12))

    root = TreeNode(12)
    root.left = TreeNode(7)
    root.right = TreeNode(1)
    root.left.left = TreeNode(4)
    root.right.left = TreeNode(10)
    root.right.right = TreeNode(5)

    print(all_sum_path(root, 23))

main()

With output:

[[1, 7, 4], [1, 9, 2]]
[[12, 7, 4], [12, 1, 10]]

Process finished with exit code 0

I have some questions here:

Why don't we need a return in the recursion statement? I am also interested in how the return statement reduced the output to only one?
Why don't we need result = find_sum_path(root, target, result, [])? Then what is the logic behind updating the results?
I am not sure why the time complexity is O(N^2)?

The time complexity of the above algorithm is O(N^2), where ‘N’ is the total number of nodes in the tree. This is due to the fact that we traverse each node once (which will take O(N)), and for every leaf node we might have to store its path which will take O(N).

Thanks for your help in advance.

Answer 1:

First off, I want to say you're very close to solving the problem and you've done a terrific job. Recursion comes from the functional programming tradition, so using it in a functional style tends to yield the best results. This means avoiding things like mutations, variable reassignments, and other side effects. This can eliminate many sources of bugs (and headaches)!

To spruce up your program, we can start by modifying TreeNode to accept both left and right parameters at time of construction -

class TreeNode():
  def __init__(self, value, left=None, right=None):
    self.value = value
    self.left = left
    self.right = right

Now we can define two trees, t1 and t2. Notice we don't reassign root -

def main():
  t1 = TreeNode \
    ( 1
    , TreeNode(7, TreeNode(4), TreeNode(5))
    , TreeNode(9, TreeNode(2), TreeNode(7))
    )
    
  t2 = TreeNode \
    ( 12
    , TreeNode(7, TreeNode(4), None)
    , TreeNode(1, TreeNode(10), TreeNode(5))
    )

  print(all_sum_path(t1, 12))
  print(all_sum_path(t2, 23))

main()

The expected output is -

[[1, 7, 4], [1, 9, 2]]
[[12, 7, 4], [12, 1, 10]]

Finally we implement find_sum. We can use mathematical induction to write a simple case analysis for our function -

1. if the input tree t is empty, return the empty result
2. (inductive) t is not empty. if t is a leaf and t.value matches the target, q, a solution has been found; add t.value to the current path and yield
3. (inductive) t is not empty, and t is not a leaf or t.value does not match the target q. add t.value to the current path and compute the new sub-problem, next_q; solve the sub-problem on the t.left and t.right branches -

def find_sum (t, q, path = []):
  if not t:
    return                        # (1)
  elif not t.left and not t.right and t.value == q:
    yield [*path, t.value]        # (2) only a leaf completes a root-to-leaf path
  else:
    next_q = q - t.value          # (3)
    next_path = [*path, t.value]
    yield from find_sum(t.left, next_q, next_path)
    yield from find_sum(t.right, next_q, next_path)

Notice how we don't use mutations like .append above. To compute all paths, we can write all_sum_path as the expansion of find_sum -

def all_sum_path (t, q):
  return list(find_sum(t, q))

And that's it, your program is done :D

If you don't want to use a separate generator find_sum, we can expand the generator in place -

def all_sum_path (t, q, path = []):
  if not t:
    return []
  elif not t.left and not t.right and t.value == q:  # match only at a leaf
    return [[*path, t.value]]
  else:
    return \
      [ *all_sum_path(t.left, q - t.value, [*path, t.value])
      , *all_sum_path(t.right, q - t.value, [*path, t.value])
      ]

Notice the distinct similarity between the two variations. Any well-written program can easily be converted between either style.
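
As a quick check of that claim, here is a minimal sketch that runs a generator variant and a list variant on the first tree and compares the results. The names paths_gen and paths_list are hypothetical, and both include an explicit leaf check so that only root-to-leaf paths are reported. The last line shows one thing the generator style offers: the first path can be taken without computing the rest.

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def paths_gen(t, q, path=()):
    """Lazily yield every root-to-leaf path whose values sum to q."""
    if not t:
        return
    here = [*path, t.value]              # new list, nothing is mutated
    if not t.left and not t.right:       # leaf: the path ends here
        if t.value == q:
            yield here
        return
    yield from paths_gen(t.left, q - t.value, here)
    yield from paths_gen(t.right, q - t.value, here)

def paths_list(t, q, path=()):
    """Same case analysis, but each call returns a list of paths."""
    if not t:
        return []
    here = [*path, t.value]
    if not t.left and not t.right:
        return [here] if t.value == q else []
    return [*paths_list(t.left, q - t.value, here),
            *paths_list(t.right, q - t.value, here)]

t1 = TreeNode(1,
              TreeNode(7, TreeNode(4), TreeNode(5)),
              TreeNode(9, TreeNode(2), TreeNode(7)))

print(list(paths_gen(t1, 12)))   # [[1, 7, 4], [1, 9, 2]]
print(paths_list(t1, 12))        # [[1, 7, 4], [1, 9, 2]]
print(next(paths_gen(t1, 12)))   # [1, 7, 4]
```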

Answer 2:

Why don't we need a return in the recursion statement?

Why don't we need result = find_sum_path(root, target, result, [])? Then what is the logic behind updating the results?

The result list (and also the new_path list) is passed down through the recursive calls by reference (or rather "by assignment"; see the Stack Overflow question "What does it mean by 'passed by assignment'?"). This means the result variable always refers to the same list object that was created in all_sum_path (as long as it is not reassigned), so you can mutate it in place as needed.
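
A small sketch of that behaviour, separate from the tree code. The helper names mutate_in_place and rebind_locally are made up for illustration. Appending changes the caller's list, while rebinding the parameter name only affects the local variable.

```python
def mutate_in_place(paths):
    # `paths` refers to the caller's list object, so the caller sees this append.
    paths.append([1, 7, 4])

def rebind_locally(paths):
    # `+` builds a new list and rebinds the local name only;
    # the caller's list is left untouched.
    paths = paths + [[1, 9, 2]]
    return paths

result = []
mutate_in_place(result)
print(result)    # [[1, 7, 4]]

rebound = rebind_locally(result)
print(result)    # [[1, 7, 4]]
print(rebound)   # [[1, 7, 4], [1, 9, 2]]
```

This is why find_sum_path can fill result without returning anything: every recursive call appends to the one list created in all_sum_path.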

I am also interested in how the return statement reduced the output to only one?

When you use return in your solution, you give up on exploring the right subtree of a node as soon as its left subtree is done: the function returns before the root.right check is ever reached.

if root.left: 
    return find_sum_path(root.left, diff, result, new_path)
# -- unreachable code if `root.left` is not `None` --
if root.right:
    return find_sum_path(root.right, diff, result, new_path)
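
To see the effect, here is a minimal sketch that records which nodes each style visits on the first tree from the question. The functions visit_with_return and visit_all are hypothetical stand-ins that keep only the traversal part of find_sum_path.

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def visit_with_return(node, visited):
    if not node:
        return
    visited.append(node.value)
    if node.left:
        # Returning here ends this call, so node.right is never looked at.
        return visit_with_return(node.left, visited)
    if node.right:
        return visit_with_return(node.right, visited)

def visit_all(node, visited):
    if not node:
        return
    visited.append(node.value)
    if node.left:
        visit_all(node.left, visited)    # no return: execution continues below
    if node.right:
        visit_all(node.right, visited)

root = TreeNode(1,
                TreeNode(7, TreeNode(4), TreeNode(5)),
                TreeNode(9, TreeNode(2), TreeNode(7)))

early, full = [], []
visit_with_return(root, early)
visit_all(root, full)
print(early)   # [1, 7, 4]
print(full)    # [1, 7, 4, 5, 9, 2, 7]
```

With the early return, only the leftmost root-to-leaf chain is visited, which matches the single path in the first output.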

I am not sure why the time complexity is O(N^2)?

if not root.left and not root.right and diff == 0:
    # copy the value of the list rather than a reference
    result.append(list(new_path))

This part of the code makes a full copy of new_path to append it to result. Take the case of a binary tree that is somewhere between highly imbalanced and completely balanced, where all nodes have value 0 and S is also 0. In that case, you'll make L (the number of leaf nodes) copies of new_path, each containing up to H elements (the height of the tree), so the copying costs O(L * H), which is at most O(N^2).

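So the worst-case time complexity is certainly not linear O(N): the path copies add O(L * H) on top of the O(N) traversal. That is bounded by O(N^2), but how close it gets depends on the shape of the tree; for a balanced tree, where H is about log N, it is closer to O(N log N).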
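
A rough sketch of the L * H estimate, assuming every leaf matches (all values 0 and S = 0). The helpers caterpillar, balanced, count_nodes and copied_elements are hypothetical and exist only for this illustration. copied_elements adds up the lengths of all root-to-leaf paths, which is the number of list elements copied into result in that case.

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def caterpillar(n_spine):
    """Long right spine where every spine node also has a left leaf."""
    node = TreeNode(0)                       # leaf at the bottom of the spine
    for _ in range(n_spine):
        node = TreeNode(0, TreeNode(0), node)
    return node

def balanced(levels):
    """Perfect binary tree with the given number of levels."""
    if levels == 0:
        return None
    return TreeNode(0, balanced(levels - 1), balanced(levels - 1))

def count_nodes(node):
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)

def copied_elements(node, depth=1):
    """Sum of the lengths of all root-to-leaf paths."""
    if node is None:
        return 0
    if node.left is None and node.right is None:
        return depth                         # one copy of a path with `depth` items
    return (copied_elements(node.left, depth + 1)
            + copied_elements(node.right, depth + 1))

for name, tree in [("caterpillar", caterpillar(63)), ("balanced", balanced(7))]:
    print(name, count_nodes(tree), copied_elements(tree))
# caterpillar 127 2143
# balanced 127 448
```

Both trees have 127 nodes, yet the lopsided one copies several times more elements, which is the gap between the O(N log N) and O(N^2) ends of the estimate.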

Source: https://stackoverflow.com/questions/65670647/all-paths-for-a-sum-with-return-issues

Tags

python

recursion

return

binary-tree