Retrieve the two highest item from a list containing 100,000 integers

后端 未结 15 756
心在旅途
心在旅途 2020-12-13 17:55

How can retrieve the two highest item from a list containing 100,000 integers without having to sort the entire list first?

相关标签:
15条回答
  • 2020-12-13 18:31

    A really slick way is to use heapq. Heapify the array (O(n)), then just pop an many elements that you need (log(n)). (Saw this question in an interview once, good question to keep in mind.)

    0 讨论(0)
  • 2020-12-13 18:33

    JacobM's answer is absolutely the way to go. However, there are a few things to keep in mind while implementing what he described. Here's a little play-along-at-home tutorial to guide you through the trickier parts of solving this problem.

    If this code is meant for production use, please use one of the more efficient/concise answers listed. This answer is targetted at someone new to programming.

    The idea

    The idea is simple.

    • Keep two variables: largest and second_largest.
    • Go through the list.
      • If an item is greater than largest, assign it to largest.
      • If an item is greater than second_largest, but less than largest, assign it to second_largest.

    Getting started

    Let's start.

    def two_largest(inlist):
        """Return the two largest items in the sequence. The sequence must
        contain at least two items."""
        for item in inlist:
            if item > largest:
                largest = item
            elif largest > item > second_largest:
                second_largest = item
        # Return the results as a tuple
        return largest, second_largest
    
    # If we run this script, it will should find the two largest items and
    # print those
    if __name__ == "__main__":
        inlist = [3, 2, 1]
        print two_largest(inlist)
    

    Okay, we now have JacobM's answer as a Python function. What happens when we try to run it?

    Traceback (most recent call last):
      File "twol.py", line 10, in <module>
        print two_largest(inlist)
      File "twol.py", line 3, in two_largest
        if item > largest:
    UnboundLocalError: local variable 'largest' referenced before assignment
    

    Apparently, we need to set largest before we start the loop. This probably means we should set second_largest too.

    Initializing variables

    Let's set largest and second_largest to 0.

    def two_largest(inlist):
        """Return the two largest items in the sequence. The sequence must
        contain at least two items."""
        largest = 0 # NEW!
        second_largest = 0 # NEW!
        for item in inlist:
            if item > largest:
                largest = item
            elif largest > item > second_largest:
                second_largest = item
        # Return the results as a tuple
        return largest, second_largest
    
    # If we run this script, it will should find the two largest items and
    # print those
    if __name__ == "__main__":
        inlist = [3, 2, 1]
        print two_largest(inlist)
    

    Good. Let's run it.

    (3, 2)
    

    Great! Now let's test with inlist being [1, 2, 3]

        inlist = [1, 2, 3] # CHANGED!
    

    Let's try it.

    (3, 0)
    

    ...Uh oh.

    Fixing the logic

    The largest value (3) seems correct. The second-largest value is completely wrong though. What's going on?

    Let's work through what the function is doing.

    • When we start, largest is 0 and second_largest is also 0.
    • The first item in the list we look at is 1, so largest becomes 1.
    • The next item is 2, so largest becomes 2.

    But what about second_largest?

    When we assign a new value to largest, the largest value actually becomes second-largest. We need to show that in the code.

    def two_largest(inlist):
        """Return the two largest items in the sequence. The sequence must
        contain at least two items."""
        largest = 0
        second_largest = 0
        for item in inlist:
            if item > largest:
                second_largest = largest # NEW!
                largest = item
            elif largest > item > second_largest:
                second_largest = item
        # Return the results as a tuple
        return largest, second_largest
    
    # If we run this script, it will should find the two largest items and
    # print those
    if __name__ == "__main__":
        inlist = [1, 2, 3]
        print two_largest(inlist)
    

    Let's run it.

    (3, 2)
    

    Fantastic.

    Initializing variables, part 2

    Now let's try it with a list of negative numbers.

        inlist = [-1, -2, -3] # CHANGED!
    

    Let's run it.

    (0, 0)
    

    That's not right at all. Where did these zeroes come from?

    It turns out that the starting values for largest and second_largest were actually larger than all the items in the list. The first thing you might consider is setting largest and second_largest to the lowest values possible in Python. Unfortunately, Python doesn't have a smallest possible value. That means that, even if you set both of them to -1,000,000,000,000,000,000, you can have a list of values smaller than that.

    So what's the best thing to do? Let's try setting largest and second_largest to the first and second items in the list. Then, to avoid double-counting any items in the list, we only look at the part of the list after the second item.

    def two_largest(inlist):
        """Return the two largest items in the sequence. The sequence must
        contain at least two items."""
        largest = inlist[0] # CHANGED!
        second_largest = inlist[1] # CHANGED!
        # Only look at the part of inlist starting with item 2
        for item in inlist[2:]: # CHANGED!
            if item > largest:
                second_largest = largest
                largest = item
            elif largest > item > second_largest:
                second_largest = item
        # Return the results as a tuple
        return largest, second_largest
    
    # If we run this script, it will should find the two largest items and
    # print those
    if __name__ == "__main__":
        inlist = [-1, -2, -3]
        print two_largest(inlist)
    

    Let's run it.

    (-1, -2)
    

    Great! Let's try with another list of negative numbers.

        inlist = [-3, -2, -1] # CHANGED!
    

    Let's run it.

    (-1, -3)
    

    Wait, what?

    Initializing variables, part 3

    Let's step through our logic again.

    • largest is set to -3
    • second_largest is set to -2

    Wait right there. Already, this seems wrong. -2 is larger than -3. Is this what caused the problem? Let's continue.

    • largest is set to -1; second_largest is set to the old value of largest, which is -3

    Yes, that looks to be the problem. We need to ensure that largest and second_largest are set correctly.

    def two_largest(inlist):
        """Return the two largest items in the sequence. The sequence must
        contain at least two items."""
        if inlist[0] > inlist[1]: # NEW
            largest = inlist[0]
            second_largest = inlist[1]
        else: # NEW
            largest = inlist[1] # NEW
            second_largest = inlist[0] # NEW
        # Only look at the part of inlist starting with item 2
        for item in inlist[2:]:
            if item > largest:
                second_largest = largest
                largest = item
            elif largest > item > second_largest:
                second_largest = item
        # Return the results as a tuple
        return largest, second_largest
    
    # If we run this script, it will should find the two largest items and
    # print those
    if __name__ == "__main__":
        inlist = [-3, -2, -1]
        print two_largest(inlist)
    

    Let's run it.

    (-1, -2)
    

    Excellent.

    Conclusion

    So here's the code, nicely commented and formatted. It's also had all the bugs I could find beaten from it. Enjoy.

    However, assuming this really is a homework question, I hope you get some useful experience from seeing an imperfect piece of code slowly improved. I hope some of these techniques will be useful in future programming assignments.


    Efficiency

    Not very efficient. But for most purposes, it should be okay: on my computer (Core 2 Duo), a list of 100 000 items can be processed in 0.27 seconds (using timeit, averaged over 100 runs).

    0 讨论(0)
  • 2020-12-13 18:36

    The best time you can expect is linear, since you have to at least look through all the elements.

    Here is my pseudocode to solve the problem:

    //assume list has at least 2 elements
    (max, nextMax) = if (list[0] > list[1])
                     then (list[0], list[1])
                     else (list[1], list[0])
    
    for (2 <= i < length) {
        (max, nextMax) = if       (max < list[i])     => (list[i], max)
                         elseif   (nextMax < list[i]) => (max, list[i])
                         else     (no change)         => (max, nextMax)
    }
    
    return (max, nextMax)
    
    0 讨论(0)
提交回复
热议问题