Creating a list within a list in Python

挽巷 2021-01-21 05:48

I have a list named values containing a series of numbers:

values = [0, 1, 2, 3, 4, 5, ... , 351, 0, 1, 2, 3, 4, 5, 6, ... , 750, 0, 1, 2, 3, 4, 5, ... , 559]
         


        
5 answers
  •  [愿得一人]
    2021-01-21 06:31

    The problem with your code as written is that it adds an empty list at the beginning and omits the final sub-list. The minimal fix has two parts:

    1. Change the test to avoid appending the first list (when i is 0), e.g. if val == 0 and i != 0:

    2. Append the final group after the loop exits

    Combining the two fixes, you'd have:

    start = 0
    new_values = []
    for i,val in enumerate(values): 
        if val == 0 and i != 0:  # Avoid adding empty list
            new_values.append(values[start:i]) 
            start = i
    if values:  # Handle edge case of empty values, where there is nothing to add
        new_values.append(values[start:])  # Add final list
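As a quick check of the fix above, here is a minimal sketch that wraps the same loop in a function and runs it on a short list. The function name `split_at_zeros` and the sample data are my own, chosen only for illustration; they are not from the question.

```python
def split_at_zeros(values):
    """Split a list into sub-lists, starting a new sub-list at each 0."""
    start = 0
    new_values = []
    for i, val in enumerate(values):
        if val == 0 and i != 0:  # skip index 0 so no empty list is added
            new_values.append(values[start:i])
            start = i
    if values:  # nothing to add for an empty input
        new_values.append(values[start:])  # the final group
    return new_values


sample = [0, 1, 2, 3, 0, 1, 2, 0, 1]
print(split_at_zeros(sample))  # [[0, 1, 2, 3], [0, 1, 2], [0, 1]]
```

Without the `i != 0` test the result would start with an empty list, and without the final `append` the trailing `[0, 1]` would be lost.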
    

    I was going to add the cleaner groupby solution which avoids the special cases for beginning/end of list, but Chris_Rands already handled that, so I'll refer you to his answer.

    Somewhat surprisingly, this actually seems to be the fastest solution in my timings, at the expense of requiring the input to be a list (some of the other solutions can accept arbitrary iterables, including pure iterators, for which indexing is impossible).

    For comparison, the code below uses Python 3.5's additional unpacking generalizations, both for brevity and to get optimal performance on modern Python. It also relies on the implicit truthiness of int rather than comparing to 0; the two are equivalent for int input, but the implicit test is meaningfully faster:

    from itertools import groupby, zip_longest
    
    # truth is the same as bool, but unlike the bool constructor, it requires
    # exactly one positional argument, which makes a *major* difference
    # on runtime when it's in a hot code path
    from operator import truth
    
    def method1(values):
        # Corrected and optimized version of the OP's code
        # Only works on list inputs, and requires non-empty values to begin with 0,
        # but handles repeated 0s as separate groups properly
        new_values = []
        start = None
        for i, val in enumerate(values):
            if not val and i:
                new_values.append(values[start:i])
                start = i
        if values:
            new_values.append(values[start:])
        return new_values
    
    def method2(values):
        # Works with arbitrary iterables and iterators, but doesn't handle
        # repeated 0s or non-empty values that don't begin with 0
        return [[0, *g] for k, g in groupby(values, truth) if k]
    
    def method3(values):
        # Same behaviors and limitations as method1, but without verbose
        # special casing for begin and end
        start_indices = [i for i, val in enumerate(values) if not val]
    
        # End indices for all but terminal slice are previous start index
        # so make iterator and discard first value to pair properly
        end_indices = iter(start_indices)
        next(end_indices, None)
    
        # Pairing with zip_longest avoids need to explicitly pad end_indices
        return [values[s:e] for s, e in zip_longest(start_indices, end_indices)]
    
    def method4(values):
        # Requires any non-empty values to begin with 0
        # but otherwise handles runs of 0s and arbitrary iterables (including iterators)
        new_values = []
        for val in values:
            if not val:
                curlist = [val]
                new_values.append(curlist)
                # Use pre-bound method in local name for speed
                curlist_append = curlist.append
            else:
                curlist_append(val)
        return new_values
    
    def method5(values):
        # Most flexible solution; similar to method2, but handles all inputs, empty, non-empty,
        # with or without leading 0, with or without runs of repeated 0s
        new_values = []
        for nonzero, grp in groupby(values, truth):
            if nonzero:
                try:
                    new_values[-1] += grp
                except IndexError:
                    new_values.append([*grp])  # Only happens when values begins with nonzero
            else:
                new_values += [[0] for _ in grp]
        return new_values
    

    Timings on Python 3.6, Linux x64, using IPython 6.1's %timeit magic:

    >>> values = [*range(100), *range(50), *range(150)]
    >>> %timeit -r5 method1(values)
    12.5 μs ± 50.6 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
    
    >>> %timeit -r5 method2(values)
    16.9 μs ± 54.9 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
    
    >>> %timeit -r5 method3(values)
    13 μs ± 18.9 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
    
    >>> %timeit -r5 method4(values)
    16.7 μs ± 9.51 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
    
    >>> %timeit -r5 method5(values)
    18.2 μs ± 25.2 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
    

    Summary:

    Solutions that slice out the runs in bulk (method1, method3) are the fastest, but depend on the input being a sequence (and if the return type must be list, the input must be list too, or conversions must be added).
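To make the "input must be a sequence" caveat concrete, here is a small sketch of a method1-style function applied to a tuple, a list, and an iterator. The name `split_slices` and the sample data are hypothetical, used only for this illustration.

```python
def split_slices(values):
    """Slice out each run that starts at a 0 (same idea as method1)."""
    new_values = []
    start = 0
    for i, val in enumerate(values):
        if not val and i:
            new_values.append(values[start:i])
            start = i
    if values:
        new_values.append(values[start:])
    return new_values


data = (0, 1, 2, 0, 1)
print(split_slices(data))        # [(0, 1, 2), (0, 1)]  (tuples, not lists)
print(split_slices(list(data)))  # [[0, 1, 2], [0, 1]]

try:
    split_slices(iter(data))     # iterators cannot be sliced
except TypeError as exc:
    print("TypeError:", exc)
```

If the caller might pass an iterator, one option is to call `list(values)` at the top of the function, at the cost of an extra copy.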

    groupby solutions (method2, method5) are a little slower, but are typically quite succinct (handling all edge cases as in method5 requires neither extreme verbosity nor explicit test-and-check LBYL patterns). They also don't require a lot of hackery to make them go as fast as possible, aside from using operator.truth instead of bool as the key function. That swap is necessary because CPython's bool constructor is very slow thanks to some implementation details: bool must accept full varargs, including keywords, and dispatches through the object construction machinery, whereas operator.truth uses a low-overhead path that takes exactly one positional argument and bypasses that machinery. If bool were used as the key function instead of operator.truth, runtimes more than double (to 36.8 μs and 38.8 μs for method2 and method5 respectively).
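A rough way to check the bool versus operator.truth difference on your own interpreter is sketched below. The helper name `split_groupby` is hypothetical, and the sketch uses the standard `timeit` module rather than IPython's %timeit; no expected timings are shown because they depend on the Python version and machine, and the gap may be smaller on newer interpreters than the figures quoted above.

```python
from itertools import groupby
from operator import truth
from timeit import timeit

values = [*range(100), *range(50), *range(150)]


def split_groupby(values, key):
    """method2-style split, with the groupby key function passed in."""
    return [[0, *g] for k, g in groupby(values, key) if k]


# Both key functions must give identical groupings.
assert split_groupby(values, bool) == split_groupby(values, truth)

runs = 2_000
for key in (bool, truth):
    seconds = timeit(lambda: split_groupby(values, key), number=runs)
    print(f"{key.__name__}: {seconds / runs * 1e6:.1f} µs per call")
```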

    In between is the slower, but more flexible approach (handles arbitrary input iterables, including iterators, handles runs of 0s with no special casing, etc.) using item-by-item appends (method4). Problem is, getting maximum performance requires much more verbose code (because of the need to avoid repeated indexing and method binding); if the loop of method4 is changed to the much more succinct:

    for val in values:
        if not val:
            new_values.append([])
        new_values[-1].append(val)
    

    the runtime more than doubles (to ~34.4 μs), thanks to the cost of repeatedly indexing new_values and binding the append method over and over.
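For anyone who wants to reproduce that comparison, here is a sketch with both loop styles as complete functions. The names `split_succinct` and `split_prebound` are mine; both assume a non-empty input begins with 0, as method4 does. Timings are not shown because they vary by interpreter version and machine.

```python
from timeit import timeit


def split_succinct(values):
    """Short version: indexes new_values and binds append on every item."""
    new_values = []
    for val in values:
        if not val:
            new_values.append([])
        new_values[-1].append(val)
    return new_values


def split_prebound(values):
    """method4 style: keep the current list's append in a local name."""
    new_values = []
    for val in values:
        if not val:
            curlist = [val]
            new_values.append(curlist)
            curlist_append = curlist.append  # bound once per group
        else:
            curlist_append(val)
    return new_values


values = [*range(100), *range(50), *range(150)]
assert split_succinct(values) == split_prebound(values)

runs = 2_000
for func in (split_succinct, split_prebound):
    seconds = timeit(lambda: func(values), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.1f} µs per call")
```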

    In any event, personally, if performance wasn't absolutely critical, I'd use one of the groupby solutions using bool as the key just to avoid imports and uncommon APIs. If performance was more important, I'd probably still use groupby, but swap in operator.truth as the key function; sure, it's not as fast as the spelled out version, but for people who know groupby, it's easy enough to follow, and it's generally the most succinct solution for any given level of edge case handling.
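As a sketch of that last recommendation, the function below follows the method5 approach with the built-in bool as the key, so the only import is groupby. The name `split_on_zeros` is hypothetical; swapping `bool` for `operator.truth` is a one-word change if speed matters.

```python
from itertools import groupby


def split_on_zeros(values):
    """Start a new sub-list at every 0; accepts any iterable."""
    new_values = []
    for nonzero, grp in groupby(values, bool):
        if nonzero:
            try:
                new_values[-1].extend(grp)  # continue the current group
            except IndexError:
                # Only happens when the input begins with a nonzero value
                new_values.append(list(grp))
        else:
            # Each 0 in a run of 0s starts its own group
            new_values.extend([0] for _ in grp)
    return new_values


print(split_on_zeros([0, 1, 2, 0, 1]))               # [[0, 1, 2], [0, 1]]
print(split_on_zeros(iter([1, 2, 0, 3, 0, 0, 4])))   # [[1, 2], [0, 3], [0], [0, 4]]
```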
