Question
I was solving the following job interview question and solved most of it but failed at the last requirement.
Q: Build a data structure (DS) which supports the following functions:
Init - Initialise an empty DS. O(1) time complexity.
SetPositiveInDay(d,x) - Add to the DS that on day d exactly x new people were infected with covid-19. O(log n) time complexity.
WorstBefore(d) - Among the days inserted into the DS that are smaller than d, return the latest one which has more newly infected people than day d. O(log n) time complexity.
For example:
Init()
SetPositiveInDay(1,10)
SetPositiveInDay(2,20)
SetPositiveInDay(3,15)
SetPositiveInDay(5,17)
SetPositiveInDay(23,180)
SetPositiveInDay(8,13)
SetPositiveInDay(13,18)
WorstBefore(13) // Returns day #2
SetPositiveInDay(10,19)
WorstBefore(13) // Returns day #10
Important note: you can't assume that days will be entered in order, nor that there won't be "gaps" between days (some days may not be saved in the DS while later ones are).
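To pin down the expected behaviour, here is a naive reference sketch in Python. It does not meet the O(log n) requirement (queries are O(n)) and is only useful for checking a faster implementation against. The class name `NaiveCovidLog` and the snake_case method names are illustrative assumptions, not part of the task.

```python
class NaiveCovidLog:
    """Reference implementation: O(1) insert, O(n) query. For checking only."""

    def __init__(self):
        self.sick_by_day = {}  # day -> number of newly infected people

    def set_positive_in_day(self, d, x):
        self.sick_by_day[d] = x

    def worst_before(self, d):
        """Latest stored day < d with more new infections than day d, else None."""
        x = self.sick_by_day[d]
        candidates = [day for day, sick in self.sick_by_day.items()
                      if day < d and sick > x]
        return max(candidates, default=None)


def run_example():
    """Replays the example from the question and returns both query results."""
    log = NaiveCovidLog()
    for day, sick in [(1, 10), (2, 20), (3, 15), (5, 17),
                      (23, 180), (8, 13), (13, 18)]:
        log.set_positive_in_day(day, sick)
    first = log.worst_before(13)
    log.set_positive_in_day(10, 19)
    second = log.worst_before(13)
    return first, second
```

Calling `run_example()` reproduces the two answers from the example above: day 2 first, then day 10 once day 10 has been inserted.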
What did I do?
I used an AVL tree (I could have used a 2-3 tree too). For each node I have:
Sick - Number of newly infected people on that day.
maxLeftSick - Max number of infected people in the left son's subtree.
maxRightSick - Max number of infected people in the right son's subtree.
When inserting a new node, I made sure that no data gets lost during rotations and, for every node on the path from the new one up to the root, I did:
But I wasn't successful in implementing WorstBefore(d).
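As an illustration of the bookkeeping described above (a sketch under assumptions, not the asker's code): the snippet below inserts into a plain, unbalanced BST keyed by day and refreshes the two max fields on the way back up the insertion path. The -1 sentinel for a missing child, the snake_case names and the helper `subtree_max` are assumptions made for this example; rotations are left out.

```python
class Node:
    def __init__(self, day, sick):
        self.day = day
        self.sick = sick
        self.left = None
        self.right = None
        self.max_left_sick = -1   # -1 means "no left subtree" (assumed sentinel)
        self.max_right_sick = -1  # -1 means "no right subtree"


def subtree_max(node):
    """Largest sick value in the subtree rooted at node, or -1 if it is empty."""
    if node is None:
        return -1
    return max(node.sick, node.max_left_sick, node.max_right_sick)


def insert(node, day, sick):
    """Insert (or overwrite) a day; refresh the max fields while unwinding."""
    if node is None:
        return Node(day, sick)
    if day < node.day:
        node.left = insert(node.left, day, sick)
        node.max_left_sick = subtree_max(node.left)
    elif day > node.day:
        node.right = insert(node.right, day, sick)
        node.max_right_sick = subtree_max(node.right)
    else:
        node.sick = sick  # same day again: only this node's own value changes
    return node


def build_example():
    """Builds the tree for the first seven insertions of the example."""
    root = None
    for day, sick in [(1, 10), (2, 20), (3, 15), (5, 17),
                      (23, 180), (8, 13), (13, 18)]:
        root = insert(root, day, sick)
    return root
```

In an AVL tree the same two assignments would also have to be redone for the nodes whose children change during a rotation.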
Answer 1:
Where to search?
First you need to find the node node corresponding to d in the tree ordered by days. Let x = Sick(node). This can be done in O(log n).
If maxLeftSick(node) > x, the solution must be in the left subtree of node. Search for the solution there and return the answer. This can be done in O(log n) - see below.
Otherwise, traverse the tree upwards towards the root, starting from node, until you find the first node nextPredecessor satisfying both of these properties (this takes O(log n)):
- nextPredecessor is smaller than node, and
- either Sick(nextPredecessor) > x (case 1) or maxLeftSick(nextPredecessor) > x (case 2).
If no such node exists, we give up. In case 1, just return the day of nextPredecessor: it is later than everything in its own left subtree, so it is the best solution even if case 2 holds as well.
In case 2, we know that the solution must be in the left subtree of nextPredecessor, so search there and return the answer. Again, this takes O(log n) - see below.
Note that there is no need to search in the right subtree of nextPredecessor: the only nodes in that subtree that are smaller than node are the left subtree of node itself and the smaller ancestors (with their left subtrees) that we passed on the way up, and we have already excluded all of those.
Note also that it is not necessary to traverse further up the tree than nextPredecessor, since any remaining candidates up there are even smaller, and we are looking for the largest node satisfying all constraints.
How to search?
OK, so how do we search for the solution in a subtree? Finding the largest day within a subtree rooted in q that is worse than an infection number x is simple using the maxLeftSick and maxRightSick information:
- If q has a right child and maxRightSick(q) > x, then search in the right subtree of q.
- Otherwise, if Sick(q) > x, return Day(q). This also covers the case where q has a right child that contains no worse day.
- Otherwise, if q has a left child and maxLeftSick(q) > x, then search in the left subtree of q.
- Otherwise there is no solution within the subtree q.
We are effectively using maxLeftSick and maxRightSick to prune the search tree to include only "worse" nodes, and within that pruned tree we get the rightmost node, i.e. the one with the largest day.
It is easy to see that this algorithm runs in O(log n), where n is the total number of nodes, since the number of steps is bounded by the height of the tree.
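A minimal Python sketch of this downward search, written as a loop. The `Node` class, the helper `subtree_max` and the snake_case names are assumptions for illustration, with -1 standing in for a missing child. The sketch checks q itself whenever the right subtree holds no worse day, so it also handles a q whose right child exists but is not worse.

```python
class Node:
    def __init__(self, day, sick, left=None, right=None):
        self.day, self.sick = day, sick
        self.left, self.right = left, right
        # Max sick value in each child subtree; -1 if the child is missing.
        self.max_left_sick = subtree_max(left)
        self.max_right_sick = subtree_max(right)


def subtree_max(node):
    """Largest sick value in the subtree rooted at node, or -1 if it is empty."""
    if node is None:
        return -1
    return max(node.sick, node.max_left_sick, node.max_right_sick)


def find_last_worse_than(q, x):
    """Largest day in the subtree rooted at q whose sick value exceeds x, or -1."""
    while q is not None:
        if q.max_right_sick > x:     # a later, worse day exists on the right
            q = q.right
        elif q.sick > x:             # nothing worse to the right: q is the latest
            return q.day
        elif q.max_left_sick > x:    # only earlier days can still qualify
            q = q.left
        else:
            return -1
    return -1


# Hand-built BST keyed by day: 5 is the root, 3 and 8 are its children.
tree = Node(5, 17, left=Node(3, 15, left=Node(2, 20)), right=Node(8, 13))
```

With this tree, `find_last_worse_than(tree, 18)` walks left twice and yields day 2, while `find_last_worse_than(tree, 14)` stops at the root and yields day 5, although the root has a right child.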
Pseudocode
Here is the pseudocode (assuming maxLeftSick and maxRightSick return -1 if no corresponding child node exists):
// Returns the largest day smaller than d such that its
// infection number is larger than the infection number on day d.
// Returns -1 if no such day exists.
int WorstBefore(int d) {
    node = find(d);
    if (node == null) return -1;  // day d was never inserted

    // try to find the solution in the left subtree
    if (maxLeftSick(node) > Sick(node)) {
        return FindLastWorseThan(node -> left, Sick(node));
    }

    // move up towards root until we find the first node
    // that is smaller than `node` and such that
    // Sick(nextPredecessor) > Sick(node) or
    // maxLeftSick(nextPredecessor) > Sick(node).
    nextPredecessor = findNextPredecessor(node);
    if (nextPredecessor == null) return -1;

    // Case 1
    if (Sick(nextPredecessor) > Sick(node)) return Day(nextPredecessor);

    // Case 2: maxLeftSick(nextPredecessor) > Sick(node)
    return FindLastWorseThan(nextPredecessor -> left, Sick(node));
}

// Finds the latest day within the given subtree with root q where
// the infection number is larger than x. Runs in O(log(size(q))).
int FindLastWorseThan(Node q, int x) {
    // a later, worse day exists in the right subtree
    if (maxRightSick(q) > x) return FindLastWorseThan(q -> right, x);
    // nothing worse to the right, so q itself is the latest worse day
    if (Sick(q) > x) return Day(q);
    if (maxLeftSick(q) > x) return FindLastWorseThan(q -> left, x);
    return -1;
}
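The pseudocode leaves findNextPredecessor undefined. One possible way to write it, sketched in Python under the assumption that every node keeps a `parent` pointer (the question does not say whether it does): an ancestor is smaller than the starting node exactly when we reach it from its right child. The `Node` class, `add_child` and `build_chain` are hypothetical helpers for this example only.

```python
class Node:
    def __init__(self, day, sick, parent=None):
        self.day, self.sick = day, sick
        self.parent = parent           # assumed parent pointer
        self.left = self.right = None
        self.max_left_sick = -1        # -1 means "no left subtree"


def find_next_predecessor(node):
    """First smaller ancestor that is worse than node or has a worse left subtree."""
    x = node.sick
    child, anc = node, node.parent
    while anc is not None:
        # anc is smaller than node exactly when we arrive from its right child.
        if anc.right is child and (anc.sick > x or anc.max_left_sick > x):
            return anc
        child, anc = anc, anc.parent
    return None


def add_child(parent, side, day, sick):
    """Attach a new node as parent.left or parent.right and return it."""
    child = Node(day, sick, parent)
    setattr(parent, side, child)
    return child


def build_chain():
    """Unbalanced tree for days 1, 2, 3, 5, 23, 8, 13 inserted in that order."""
    n1 = Node(1, 10)
    n2 = add_child(n1, "right", 2, 20)
    n3 = add_child(n2, "right", 3, 15)
    n5 = add_child(n3, "right", 5, 17)
    n23 = add_child(n5, "right", 23, 180)
    n8 = add_child(n23, "left", 8, 13)
    n13 = add_child(n8, "right", 13, 18)
    n23.max_left_sick = 18  # max over the subtree {8, 13}, set by hand here
    return n1, n13
```

Starting from day 13 in this chain, the walk passes 8, skips 23 (reached from its left child, so it is larger), passes 5 and 3, and stops at day 2.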
Answer 2:
First of all, your chosen data structure looks fine to me. You did not mention it explicitly, but I assume that the "key" you use in the AVL tree is the day number, i.e. an in-order traversal of the tree would list the nodes in their chronological order.
I would just suggest a cosmetic change: store the maximum value of sick in the node itself, so that you don't have two similar pieces of information (maxLeftSick and maxRightSick) stored in one node instance, but move them to the child nodes, so that your node.maxLeftSick is actually stored in node.left.maxSick, and similarly node.maxRightSick is stored in node.right.maxSick. This is of course not done when that child does not exist, but then we don't need that information either. In your structure maxLeftSick would be 0 when left is not defined. In my proposed structure, you would not have that value -- the 0 would follow naturally from the fact that there is no left child. In my proposal, the root node would have a value in maxSick which is not present in yours, and which would be the maximum of your root.sick, root.maxLeftSick and root.maxRightSick. This information would not really be used, but it is just there to make the structure consistent throughout the tree.
So you would just store one maxSick, which also takes the current node's own sick value into account. The processing you do during rotations will need to change accordingly, but will not become more complex.
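A small Python sketch of that single-field layout, to show why rotations stay simple: after a rotation only the two nodes whose children changed need their maxSick recomputed, lower node first. The names `max_sick`, `update` and `rotate_left` are illustrative assumptions, and AVL height bookkeeping is omitted.

```python
class Node:
    def __init__(self, day, sick, left=None, right=None):
        self.day, self.sick = day, sick
        self.left, self.right = left, right
        update(self)  # sets self.max_sick


def update(node):
    """Recompute max_sick from the node's own value and its children's max_sick."""
    node.max_sick = max(
        node.sick,
        node.left.max_sick if node.left else 0,    # 0 when the child is missing
        node.right.max_sick if node.right else 0,
    )


def rotate_left(node):
    """Standard left rotation; returns the new root of this subtree."""
    pivot = node.right
    node.right, pivot.left = pivot.left, node
    update(node)    # node is now the lower of the two, so refresh it first
    update(pivot)
    return pivot


# Right-leaning chain 1 -> 2 -> 3, rotated so that day 2 becomes the root.
chain = Node(1, 10, right=Node(2, 20, right=Node(3, 15)))
new_root = rotate_left(chain)
```

After the rotation, `new_root` is day 2 with max_sick 20, and its new left child (day 1) has max_sick 10 because it no longer has day 2 below it.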
I will assume that your AVL tree does not keep track of parent pointers. So create a find method which will return the path to the node to be found. For instance, in Python syntax, it could look like this:
def find(self, day):
    node = self.root
    path = []  # an array of nodes
    while node:
        path.append(node)
        if node.day == day:  # bingo
            return path
        if day < node.day:
            node = node.left
        else:
            node = node.right
Then the worstBefore method could look like this:
def worstBefore(self, day):
    path = self.find(day)
    if not path:
        return  # day not found
    # get number of sick people on that day:
    sick = path[-1].sick
    # look for the most recent earlier day with a greater number of sick
    while path:
        node = path.pop()  # walk upward, starting with found node
        if node.day > day:
            continue  # later ancestor: its left subtree may hold days after `day`
        if node.day < day and node.sick > sick:
            return node.day
        if node.left and node.left.maxSick > sick:
            # we will find the result in this subtree
            node = node.left
            while True:
                if node.right and node.right.maxSick > sick:
                    node = node.right
                elif node.sick > sick:  # bingo
                    return node.day
                else:
                    node = node.left
So the path returned by the find method will be used to get the parents of a node when you need to backtrack upwards in the tree along that path.
If along that path a node that is not later than the queried day has a left child whose maxSick is greater, then you know that the targeted node must be in that left subtree. It is then a matter of walking down that subtree in a controlled way, choosing the right child as long as it still has a greater maxSick. Otherwise check the current node's sick value and return that node if the value is greater. Otherwise go left, and repeat.
While there is no such left subtree, go up along the path. If that parent is a match, then return it (make sure to verify the day number: ancestors with a later day, and their left subtrees, must be skipped). Keep checking for left subtrees that have a larger maxSick.
This runs in O(log n) on a balanced tree, because you first walk zero or more steps upward and then zero or more steps downward (in a left subtree).
You can see your example scenario run on repl.it. There I focussed on this question, and didn't implement the rotations.
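In the same spirit, here is a self-contained sketch without rotations that replays the example scenario. It is an illustration under assumptions: the class and method names are mine, days are assumed to be inserted only once, and the tree is a plain BST (so the O(log n) bound only holds once balancing is added). It also skips ancestors that are later than the queried day, since their left subtrees can contain days after it.

```python
class Node:
    def __init__(self, day, sick):
        self.day, self.sick = day, sick
        self.left = self.right = None
        self.max_sick = sick  # max sick value within this node's subtree


class Tree:
    def __init__(self):
        self.root = None

    def set_positive_in_day(self, day, sick):
        """Plain BST insert (no rebalancing); assumes each day is inserted once."""
        new = Node(day, sick)
        if self.root is None:
            self.root = new
            return
        node = self.root
        while True:
            node.max_sick = max(node.max_sick, sick)  # new node ends up below node
            side = "left" if day < node.day else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, new)
                return
            node = child

    def find(self, day):
        """Path from the root to the node for this day, or None if absent."""
        path, node = [], self.root
        while node:
            path.append(node)
            if node.day == day:
                return path
            node = node.left if day < node.day else node.right
        return None

    def worst_before(self, day):
        path = self.find(day)
        if not path:
            return None
        sick = path[-1].sick
        while path:
            node = path.pop()  # walk upward, starting with the found node
            if node.day > day:
                continue       # later ancestor: skip it and its left subtree
            if node.day < day and node.sick > sick:
                return node.day
            if node.left and node.left.max_sick > sick:
                node = node.left  # the answer is the latest worse day in here
                while True:
                    if node.right and node.right.max_sick > sick:
                        node = node.right
                    elif node.sick > sick:
                        return node.day
                    else:
                        node = node.left
        return None


def run_example(extra=()):
    """Replays the question's example; `extra` adds insertions before querying."""
    tree = Tree()
    for day, sick in [(1, 10), (2, 20), (3, 15), (5, 17),
                      (23, 180), (8, 13), (13, 18), *extra]:
        tree.set_positive_in_day(day, sick)
    first = tree.worst_before(13)
    tree.set_positive_in_day(10, 19)
    return first, tree.worst_before(13)
```

`run_example()` gives day 2 and then day 10, as in the question. `run_example(extra=[(14, 100)])` gives the same pair, which exercises the skip: day 14 sits in the left subtree of day 23 but must not be returned for a query on day 13.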
Source: https://stackoverflow.com/questions/65368238/job-interview-question-using-trees-what-data-to-save