Job Interview Question Using Trees, What data to save?

泄露秘密 posted on 2021-01-21 11:38:04

Question


I was solving the following job interview question and solved most of it but failed at the last requirement.

Q: Build a data structure which supports the following functions:

Init - Initialise Empty DS. O(1) Time complexity.

SetPositiveInDay(d,x) - Add to the DS that in day d exactly x new people were infected with covid-19. O(log n) time complexity.

WorstBefore(d) - Among the days inserted into the DS that are smaller than d, return the last one which has more newly infected people than d. O(log n) time complexity.

For example:

Init()
SetPositiveInDay(1,10)
SetPositiveInDay(2,20)
SetPositiveInDay(3,15)
SetPositiveInDay(5,17)
SetPositiveInDay(23,180)
SetPositiveInDay(8,13)
SetPositiveInDay(13,18)
WorstBefore(13) // Returns day #2
SetPositiveInDay(10,19)
WorstBefore(13) // Returns day #10

Important note: you can't assume that days will be entered in order, nor that there won't be "gaps" between days. (Some days may be missing from the DS while later days are present.)
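
To pin down the expected behaviour, here is a brute-force reference in Python. It is only a sketch for checking answers: the query is O(n), so it does not meet the required O(log n) bound, and the names NaiveInfectionLog, set_positive_in_day and worst_before are made up for this illustration.

```python
class NaiveInfectionLog:
    """Brute-force reference: O(1) insert, O(n) query (too slow for the task)."""

    def __init__(self):
        self.sick_by_day = {}  # day -> number of newly infected people

    def set_positive_in_day(self, day, sick):
        self.sick_by_day[day] = sick

    def worst_before(self, day):
        """Return the latest stored day < `day` with more new infections, or None."""
        threshold = self.sick_by_day[day]
        candidates = [d for d, s in self.sick_by_day.items()
                      if d < day and s > threshold]
        return max(candidates, default=None)


log = NaiveInfectionLog()
for day, sick in [(1, 10), (2, 20), (3, 15), (5, 17), (23, 180), (8, 13), (13, 18)]:
    log.set_positive_in_day(day, sick)
print(log.worst_before(13))  # 2
log.set_positive_in_day(10, 19)
print(log.worst_before(13))  # 10
```

Days may arrive in any order and with gaps, which is why the reference keys a dictionary by day instead of indexing an array.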


What I did?

I used an AVL tree (a 2-3 tree would work too). For each node I have:

Sick - Number of newly infected people on that day.

maxLeftSick - Max number of newly infected people in the left subtree.

maxRightSick - Max number of newly infected people in the right subtree.
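
As an illustration, the node layout could look like the Python sketch below. The field names follow the description above. The helper names subtree_max and refresh, and the -1 sentinel for a missing child, are assumptions made for this sketch rather than part of the original solution.

```python
from dataclasses import dataclass
from typing import Optional

NO_CHILD = -1  # assumed sentinel for "no such subtree"; real counts are >= 0


@dataclass
class Node:
    day: int
    sick: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    max_left_sick: int = NO_CHILD
    max_right_sick: int = NO_CHILD


def subtree_max(node):
    """Largest `sick` value in the subtree rooted at `node` (NO_CHILD if empty)."""
    if node is None:
        return NO_CHILD
    return max(node.sick, node.max_left_sick, node.max_right_sick)


def refresh(node):
    """Recompute both cached maxima of `node` from its children in O(1)."""
    node.max_left_sick = subtree_max(node.left)
    node.max_right_sick = subtree_max(node.right)


# Day 2 (20 cases) with children day 1 (10 cases) and day 3 (15 cases).
root = Node(2, 20, left=Node(1, 10), right=Node(3, 15))
refresh(root)
print(root.max_left_sick, root.max_right_sick)  # 10 15
```

Because refresh only looks at the two children, calling it bottom-up on every node touched by an insertion or rotation keeps the cached values correct.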

When inserting a new node, I made sure that no data is lost during rotations. In addition, for every node on the path from the new node up to the root I did:

But I wasn't successful implementing WorstBefore(d).


Answer 1:


Where to search?

First you need to find the node corresponding to d in the tree ordered by days; call it node. Let x = Sick(node). This can be done in O(log n).

If maxLeftSick(node) > x, the solution must be in the left subtree of node. Search for the solution there and return the answer. This can be done in O(log n) - see below.

Otherwise, traverse the tree upwards towards the root, starting from node, until you find the first node nextPredecessor satisfying this property (this takes O(log n)):

  • nextPredecessor is smaller than node,
  • and either
    1. Sick(nextPredecessor) > x or
    2. maxLeftSick(nextPredecessor) > x.

If no such node exists, we give up. In case 1, just return nextPredecessor since that is the best solution.

In case 2, we know that the solution must be in the left subtree of nextPredecessor, so search there and return the answer. Again, this takes O(log n) - see below.


Note that there is no need to search in the right subtree of nextPredecessor: the only nodes in that subtree that are smaller than node are the left subtree of node itself and the smaller ancestors passed on the way up (together with their left subtrees), and we have already excluded all of those.

Note also that it is not necessary to traverse further up the tree than nextPredecessor since those nodes are even smaller, and we are looking for the largest node satisfying all constraints.
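
The upward walk can be sketched in Python as follows. This is only an illustration under stated assumptions: every node stores a parent pointer (the question does not say so), -1 marks a missing left subtree, and the names Node, parent and find_next_predecessor are made up for the sketch.

```python
class Node:
    def __init__(self, day, sick, parent=None):
        self.day, self.sick = day, sick
        self.parent = parent        # assumed parent pointer
        self.left = self.right = None
        self.max_left_sick = -1     # -1 means "no left subtree"


def find_next_predecessor(node):
    """Climb from `node`; return the first smaller ancestor that is worse
    itself or has a worse day in its left subtree, else None."""
    x = node.sick
    child, ancestor = node, node.parent
    while ancestor is not None:
        # We arrive from the right child exactly when ancestor.day < node.day.
        if ancestor.right is child:
            if ancestor.sick > x or ancestor.max_left_sick > x:
                return ancestor
        child, ancestor = ancestor, ancestor.parent
    return None


# Hand-built tree: 10 is the root, 5 and 23 its children, 13 the left child of 23.
root = Node(10, 19)
root.left = Node(5, 17, parent=root)
root.max_left_sick = 17
root.right = Node(23, 180, parent=root)
root.right.left = Node(13, 18, parent=root.right)
root.right.max_left_sick = 18
target = root.right.left
print(find_next_predecessor(target).day)  # 10
```

Day 23 is skipped because the walk reaches it from its left child, so it is later than day 13. Day 10 is reached from its right child and has 19 > 18 cases, which is case 1.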


How to search?

OK, so how do we search for the solution in a subtree? Finding the largest day within a subtree rooted in q whose infection number exceeds x is simple using the maxLeftSick and maxRightSick information:

  1. If q has a right child and maxRightSick(q) > x then search in the right subtree of q.
  2. Otherwise, if Sick(q) > x, return Day(q).
  3. Otherwise, if q has a left child and maxLeftSick(q) > x then search in the left subtree of q.
  4. Otherwise there is no solution within the subtree q.

We are effectively using maxLeftSick and maxRightSick to prune the search tree to include only "worse" nodes, and within that pruned tree we get the rightmost node, i.e. the one with the largest day.

It is easy to see that this algorithm runs in O(log n) where n is the total number of nodes since the number of steps is bounded by the height of the tree.
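
A minimal Python sketch of this subtree search is shown below, assuming the cached maxima use -1 for a missing child. The names Node, subtree_max and find_last_worse_than are illustrative only. The sketch tests q itself whenever the right subtree holds no candidate, whether or not q has a right child.

```python
class Node:
    def __init__(self, day, sick, left=None, right=None):
        self.day, self.sick, self.left, self.right = day, sick, left, right
        # Cached maxima over each child subtree; -1 stands for "no child".
        self.max_left_sick = subtree_max(left)
        self.max_right_sick = subtree_max(right)


def subtree_max(node):
    """Largest `sick` value in the subtree rooted at `node`, or -1 if empty."""
    if node is None:
        return -1
    return max(node.sick, node.max_left_sick, node.max_right_sick)


def find_last_worse_than(q, x):
    """Latest day in the subtree rooted at q whose count exceeds x, or -1."""
    while q is not None:
        if q.max_right_sick > x:     # a later candidate exists: go right
            q = q.right
        elif q.sick > x:             # nothing later qualifies, q itself does
            return q.day
        elif q.max_left_sick > x:    # only earlier days can still qualify
            q = q.left
        else:
            return -1
    return -1


subtree = Node(3, 15, left=Node(2, 20, left=Node(1, 10)), right=Node(5, 17))
print(find_last_worse_than(subtree, 18))  # 2
print(find_last_worse_than(subtree, 16))  # 5
print(find_last_worse_than(subtree, 25))  # -1
```

Each loop iteration moves one level down, so the number of steps is bounded by the height of the subtree.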

Pseudocode

Here is the pseudocode (assuming maxLeftSick and maxRightSick return -1 if no corresponding child node exists):


// Returns the largest day smaller than d such that its 
// infection number is larger than the infection number on day d.
// Returns -1 if no such day exists.
int WorstBefore(int d) {
    node = find(d);
    if (node == null) return -1;  // day d is not in the DS
    
    // try to find the solution in the left subtree
    if (maxLeftSick(node) > Sick(node)) {
        return FindLastWorseThan(node -> left, Sick(node));
    }
    // move up towards root until we find the first node
    // that is smaller than `node` and such that
    // Sick(nextPredecessor) > Sick(node) or 
    // maxLeftSick(nextPredecessor) > Sick(node).
    nextPredecessor = findNextPredecessor(node);
    if (nextPredecessor == null) return -1;

    // Case 1
    if (Sick(nextPredecessor) > Sick(node)) return Day(nextPredecessor);
    
    // Case 2: maxLeftSick(nextPredecessor) > Sick(node)
    return FindLastWorseThan(nextPredecessor -> left, Sick(node));
}

// Finds the latest day within the given subtree with root q where
// the infection number is larger than x. Runs in O(log(size(q))).
int FindLastWorseThan(Node q, int x) {
    if (q == null) return -1;
    // later days first: the right subtree holds the largest days
    if (maxRightSick(q) > x) return FindLastWorseThan(q -> right, x);
    // nothing to the right qualifies, so q itself is the next candidate
    if (Sick(q) > x) return Day(q);
    if (maxLeftSick(q) > x) return FindLastWorseThan(q -> left, x);
    return -1;
}



Answer 2:


First of all, your chosen data structure looks fine to me. You did not mention it explicitly, but I assume that the "key" you use in the AVL tree is the day number, i.e. an in-order traversal of the tree would list the nodes in their chronological order.

I would just suggest a cosmetic change: store the maximum value of sick in the node itself. Instead of keeping two similar pieces of information (maxLeftSick and maxRightSick) in one node instance, move them to the child nodes, so that your node.maxLeftSick is actually stored in node.left.maxSick, and similarly node.maxRightSick is stored in node.right.maxSick. This is of course not done when that child does not exist, but then we don't need that information either. In your structure maxLeftSick would be 0 when left is not defined. In my proposed structure, you would not have that value: the 0 would follow naturally from the fact that there is no left child. In my proposal, the root node would carry a maxSick value that is not present in yours, namely the maximum of your root.sick, root.maxLeftSick and root.maxRightSick. This information would not really be used, but it is there to make the structure consistent throughout the tree.

So you would just store one maxSick, which considers the current node's sick value also in that maximum. The processing you do during rotations will need to change accordingly, but will not become more complex.
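
To illustrate how little the rotation code changes, here is a hedged Python sketch of a right rotation that keeps a single max_sick field up to date. The names Node, update and rotate_right are assumptions for this sketch, and AVL height bookkeeping is left out.

```python
class Node:
    def __init__(self, day, sick, left=None, right=None):
        self.day, self.sick, self.left, self.right = day, sick, left, right
        update(self)  # sets self.max_sick


def update(node):
    """Recompute node.max_sick from node.sick and the children's max_sick."""
    node.max_sick = node.sick
    for child in (node.left, node.right):
        if child is not None:
            node.max_sick = max(node.max_sick, child.max_sick)


def rotate_right(y):
    """Standard right rotation around y; returns the new subtree root."""
    x = y.left
    y.left, x.right = x.right, y
    update(y)  # y is now the lower node, so fix it first
    update(x)
    return x


y = Node(3, 15, left=Node(2, 20, left=Node(1, 10)), right=Node(5, 17))
x = rotate_right(y)
print(x.day, x.max_sick, x.right.max_sick)  # 2 20 17
```

Only the two nodes whose children changed need updating, lower node first, so the rotation stays O(1).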

I will assume that your AVL tree nodes do not keep track of parent pointers. So create a find method which returns the path to the node to be found. For instance, in Python syntax, it could look like this:

def find(self, day):
    node = self.root
    path = []  # an array of nodes
    while node:
        path.append(node)
        if node.day == day:  # bingo
            return path
        if day < node.day:
            node = node.left
        else:
            node = node.right

Then the worstBefore method could look like this:

def worstBefore(self, day):
    path = self.find(day)
    if not path:
        return  # day not found
    # get number of sick people on that day:
    sick = path[-1].sick
    # look for the most recent earlier day with a greater number of sick
    while path:
        node = path.pop()  # walk upward, starting with found node
        if node.day > day:
            # later ancestor: its left subtree also holds days after `day`
            continue
        if node.day < day and node.sick > sick:
            return node.day
        if node.left and node.left.maxSick > sick:
            # we will find the result in this subtree
            node = node.left
            while True:
                if node.right and node.right.maxSick > sick:
                    node = node.right
                elif node.sick > sick:  # bingo
                    return node.day
                else:
                    node = node.left
    return None  # no earlier day is worse

So the path returned by the find method will be used to get the parents of a node when you need to backtrack upwards in the tree along that path.

If along that path you find a node, no later than the given day, whose left child has a greater maxSick, then you know that the targeted node must be in that left subtree. It is then a matter of walking down that subtree in a controlled way, choosing the right child when it still has a greater maxSick. Otherwise check the current node's sick value and return that node if the value is greater. Otherwise go left, and repeat.

While there is no such left subtree, go up along the path. If that parent is a match, then return it (make sure to verify that its day number is smaller). Ancestors with a later day are skipped altogether, since their left subtree also contains days after the given one. Keep checking for left subtrees that have a larger maxSick.

This runs in O(log n) because you first walk zero or more steps upward and then zero or more steps downward (in a left subtree).

You can see your example scenario run on repl.it. There I focussed on this question, and didn't implement the rotations.
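
For readers without access to that link, the following self-contained Python sketch runs the example scenario along the same lines. It is an illustration under stated assumptions, not the linked code: the class and method names are made up, the tree is a plain unbalanced BST (no rotations), each day is assumed to be inserted at most once, and ancestors with a later day are skipped explicitly during the upward walk.

```python
class Node:
    def __init__(self, day, sick):
        self.day, self.sick, self.max_sick = day, sick, sick
        self.left = self.right = None


class InfectionTree:
    """Plain BST keyed by day; a full solution would add AVL rotations."""

    def __init__(self):
        self.root = None

    def set_positive_in_day(self, day, sick):
        if self.root is None:
            self.root = Node(day, sick)
            return
        node = self.root
        while True:
            # The new node ends up below `node`, so its value counts here.
            node.max_sick = max(node.max_sick, sick)
            if day < node.day:
                if node.left is None:
                    node.left = Node(day, sick)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(day, sick)
                    return
                node = node.right

    def worst_before(self, day):
        path, node = [], self.root
        while node is not None and node.day != day:
            path.append(node)
            node = node.left if day < node.day else node.right
        if node is None:
            return None  # day not stored
        path.append(node)
        sick = node.sick
        while path:
            node = path.pop()  # walk upward, starting with the found node
            if node.day > day:
                continue  # later ancestor: skip it and its left subtree
            if node.day < day and node.sick > sick:
                return node.day
            if node.left and node.left.max_sick > sick:
                node = node.left  # the answer is in this subtree
                while True:
                    if node.right and node.right.max_sick > sick:
                        node = node.right
                    elif node.sick > sick:
                        return node.day
                    else:
                        node = node.left
        return None


tree = InfectionTree()
for day, sick in [(1, 10), (2, 20), (3, 15), (5, 17), (23, 180), (8, 13), (13, 18)]:
    tree.set_positive_in_day(day, sick)
print(tree.worst_before(13))  # 2
tree.set_positive_in_day(10, 19)
print(tree.worst_before(13))  # 10
```

Without balancing, the path can be as long as the number of stored days, so the O(log n) bound only holds once the rotations are added.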



Source: https://stackoverflow.com/questions/65368238/job-interview-question-using-trees-what-data-to-save
