Fastest way to create strictly increasing lists in Python

南方客 2021-02-07 09:03

I would like to find out the most efficient way to achieve the following in Python:

Suppose we have two lists a and b which are of equa

5 answers
  •  难免孤独
    2021-02-07 09:03

    Here is a vanilla Python solution that does one pass:

    >>> a = [2,1,2,3,4,5,4,6,5,7,8,9,8,10,11]
    >>> b = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
    >>> a_new, b_new = [], []
    >>> last = float('-inf')
    >>> for x, y in zip(a, b):
    ...     if x > last:
    ...         last = x
    ...         a_new.append(x)
    ...         b_new.append(y)
    ...
    >>> a_new
    [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    >>> b_new
    [1, 4, 5, 6, 8, 10, 11, 12, 14, 15]
    

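    I'm curious to see how it compares to the numpy solution, which runs in compiled code but makes a couple of passes over the data. Strictly speaking its time complexity is a bit worse, since np.unique sorts its input (O(n log n)) while the loop above is O(n).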

    Here are some timings. First, setup:

    >>> import timeit
    >>> import numpy as np
    >>> small = ([2,1,2,3,4,5,4,6,5,7,8,9,8,10,11], [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
    >>> medium = (np.random.randint(1, 10000, (10000,)), np.random.randint(1, 10000, (10000,)))
    >>> large = (np.random.randint(1, 10000000, (10000000,)), np.random.randint(1, 10000000, (10000000,)))
    

    And now the two approaches:

    >>> def monotonic(a, b):
    ...     a_new, b_new = [], []
    ...     last = float('-inf')
    ...     for x,y in zip(a,b):
    ...         if x > last:
    ...             last = x
    ...             a_new.append(x)
    ...             b_new.append(y)
    ...     return a_new, b_new
    ...
    >>> def np_monotonic(a, b):
    ...     a_new, idx = np.unique(np.maximum.accumulate(a), return_index=True)
    ...     return a_new, b[idx]
    ...
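
    To make it easier to see what np_monotonic is doing, here is a minimal sketch that walks through its intermediate arrays on the OP's example. The names running_max and idx are mine, chosen only for illustration.

```python
import numpy as np

a = np.array([2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8, 10, 11])
b = np.arange(1, 16)

# Running maximum: each position holds the largest value seen so far.
running_max = np.maximum.accumulate(a)

# np.unique returns the sorted distinct values and, with return_index=True,
# the index of the first occurrence of each value in running_max.
a_new, idx = np.unique(running_max, return_index=True)
b_new = b[idx]

print(running_max.tolist())  # [2, 2, 2, 3, 4, 5, 5, 6, 6, 7, 8, 9, 9, 10, 11]
print(idx.tolist())          # [0, 3, 4, 5, 7, 9, 10, 11, 13, 14]
print(b_new.tolist())        # [1, 4, 5, 6, 8, 10, 11, 12, 14, 15]
```

    The running maximum only changes where the loop version would have appended an element, so the first occurrence of each distinct value marks exactly the positions to keep.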
    

    Note that the approaches are not strictly equivalent: one stays in vanilla Python land, the other stays in numpy array land. We will compare performance assuming you start with the corresponding data structure (a list for the first, a numpy.array for the second).
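
    If you want to convince yourself that the two functions select the same elements once the types are bridged, something like the following sketch should do. The wrapper np_monotonic_from_lists is a hypothetical helper of mine, not part of either approach, and the random test data is arbitrary.

```python
import numpy as np

def monotonic(a, b):
    a_new, b_new = [], []
    last = float('-inf')
    for x, y in zip(a, b):
        if x > last:
            last = x
            a_new.append(x)
            b_new.append(y)
    return a_new, b_new

def np_monotonic(a, b):
    a_new, idx = np.unique(np.maximum.accumulate(a), return_index=True)
    return a_new, b[idx]

def np_monotonic_from_lists(a, b):
    """Hypothetical helper: accept lists, run the numpy version, return lists."""
    a_new, b_new = np_monotonic(np.asarray(a), np.asarray(b))
    return a_new.tolist(), b_new.tolist()

# Arbitrary integer test data with plenty of repeats and drops.
rng = np.random.default_rng(0)
a = rng.integers(1, 1000, size=5000).tolist()
b = rng.integers(1, 1000, size=5000).tolist()

assert monotonic(a, b) == np_monotonic_from_lists(a, b)
```

    The conversions in the wrapper are not free, which is why the timings here start each function from its native data structure.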

    First, a small list, the same one as in the OP's example. Here numpy is not actually faster (it is roughly 2.8x slower), which isn't surprising for small data structures:

    >>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, small; a, b = small", number=10000)
    0.039130652003223076
    >>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, small, np; a, b = np.array(small[0]), np.array(small[1])", number=10000)
    0.10779813499539159
    

    Now a "medium" list/array of 10,000 elements, where numpy starts to show an advantage (roughly 3.4x here):

    >>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, medium; a, b = medium[0].tolist(), medium[1].tolist()", number=10000)
    4.642718859016895
    >>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, medium; a, b = medium", number=10000)
    1.3776302759943064
    

    Now, interestingly, the advantage seems to narrow with "large" arrays, on the order of 1e7 elements (roughly 1.2x here):

    >>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, large; a, b = large[0].tolist(), large[1].tolist()", number=10)
    4.400254560023313
    >>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, large; a, b = large", number=10)
    3.593393853981979
    

    Note that I only ran the last pair of timings 10 times each. If someone has a better machine or more patience, please feel free to increase the number argument.
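
    For anyone rerunning these, one way to get steadier numbers is to repeat the measurement and keep the minimum, as the timeit documentation suggests. The sketch below does that for the pure-Python function; best_time is a hypothetical helper name of mine, and the input size and repeat counts are arbitrary.

```python
import random
import timeit

def monotonic(a, b):
    a_new, b_new = [], []
    last = float('-inf')
    for x, y in zip(a, b):
        if x > last:
            last = x
            a_new.append(x)
            b_new.append(y)
    return a_new, b_new

def best_time(func, *args, repeat=5, number=10):
    """Hypothetical helper: best per-call time in seconds over `repeat` runs."""
    timer = timeit.Timer(lambda: func(*args))
    # Each entry is the total time for `number` calls; the minimum is the
    # run least disturbed by other activity on the machine.
    return min(timer.repeat(repeat=repeat, number=number)) / number

random.seed(0)
a = [random.randint(1, 10000) for _ in range(10000)]
b = [random.randint(1, 10000) for _ in range(10000)]

print(f"monotonic: {best_time(monotonic, a, b):.6f} s per call")
```

    The same helper can be pointed at np_monotonic with array inputs to compare the two on your own machine.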
