Python: split a list based on a condition?

误落风尘 2020-11-22 06:56

What's the best way, both aesthetically and from a performance perspective, to split a list of items into multiple lists based on a conditional? The equivalent of:

30 answers
  • 2020-11-22 07:01

    Yet another solution to this problem. I needed a solution that is as fast as possible. That means only one iteration over the list and preferably O(1) for adding data to one of the resulting lists. This is very similar to the solution provided by sastanin, except much shorter:

    from collections import deque
    
    def split(iterable, function):
        dq_true = deque()
        dq_false = deque()
    
        # deque - the fastest way to consume an iterator and append items
        deque((
          (dq_true if function(item) else dq_false).append(item) for item in iterable
        ), maxlen=0)
    
        return dq_true, dq_false
    

    Then, you can use the function in the following way:

    lower, higher = split([0,1,2,3,4,5,6,7,8,9], lambda x: x < 5)
    
    selected, other = split([0,1,2,3,4,5,6,7,8,9], lambda x: x in {0,4,9})
    

    If you don't want deque objects as the result, you can easily convert them to a list, a set, or whatever you like (for example list(lower)). The conversion is much faster than constructing the lists directly.

    This method keeps the order of the items, as well as any duplicates.
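
    As a quick check of the ordering and duplicate claim, here is a minimal sketch that restates the split function from above and runs it on a small list. The names data, small and large are made up for this illustration.

```python
from collections import deque

def split(iterable, function):
    dq_true, dq_false = deque(), deque()
    # A zero-length deque consumes the generator without storing anything.
    deque(
        ((dq_true if function(item) else dq_false).append(item) for item in iterable),
        maxlen=0,
    )
    return dq_true, dq_false

data = [3, 1, 3, 8, 1, 9, 8]  # hypothetical input with duplicates
small, large = split(data, lambda x: x < 5)

print(list(small))  # [3, 1, 3, 1]
print(list(large))  # [8, 9, 8]
```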

  • 2020-11-22 07:02

    Here's the lazy iterator approach:

    from itertools import tee
    
    def split_on_condition(seq, condition):
        l1, l2 = tee((condition(item), item) for item in seq)
        return (i for p, i in l1 if p), (i for p, i in l2 if not p)
    

    It evaluates the condition once per item and returns two generators: the first yields the values from the sequence for which the condition is true, the second those for which it is false.
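
    To see the "once per item" behaviour, the sketch below records every call to the condition. The calls list and the is_even helper are hypothetical, added only for this demonstration. Note that tee buffers the items one generator has consumed until the other catches up, so draining one side completely keeps the whole input in memory.

```python
from itertools import tee

def split_on_condition(seq, condition):
    # Pair each item with its test result so the test runs only once.
    l1, l2 = tee((condition(item), item) for item in seq)
    return (i for p, i in l1 if p), (i for p, i in l2 if not p)

calls = []  # hypothetical log of the arguments the condition was called with

def is_even(n):
    calls.append(n)
    return n % 2 == 0

evens, odds = split_on_condition(range(6), is_even)
even_list = list(evens)  # drives the shared iterator; tee buffers items for odds
odd_list = list(odds)    # served from the buffer, no further calls to is_even

print(even_list)  # [0, 2, 4]
print(odd_list)   # [1, 3, 5]
print(calls)      # [0, 1, 2, 3, 4, 5]
```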

    Because it's lazy you can use it on any iterator, even an infinite one:

    from itertools import count, islice
    
    def is_prime(n):
        return n > 1 and all(n % i for i in range(2, n))
    
    primes, not_primes = split_on_condition(count(), is_prime)
    print("First 10 primes", list(islice(primes, 10)))
    print("First 10 non-primes", list(islice(not_primes, 10)))
    

    Usually though the non-lazy list returning approach is better:

    def split_on_condition(seq, condition):
        a, b = [], []
        for item in seq:
            (a if condition(item) else b).append(item)
        return a, b
    

    Edit: For your more specific use case of splitting items into different lists by some key, here's a generic function that does that:

    DROP_VALUE = lambda _:_
    def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
        """Split a sequence into lists based on a key function.
    
            seq - input sequence
            resultmapping - a dictionary that maps from target lists to keys that go to that list
            keyfunc - function to calculate the key of an input value
            default - the target where items that don't have a corresponding key go, by default they are dropped
        """
        result_lists = dict((key, []) for key in resultmapping)
        appenders = dict((key, result_lists[target].append) for target, keys in resultmapping.items() for key in keys)
    
        if default is not DROP_VALUE:
            result_lists.setdefault(default, [])
            default_action = result_lists[default].append
        else:
            default_action = DROP_VALUE
    
        for item in seq:
            appenders.get(keyfunc(item), default_action)(item)
    
        return result_lists
    

    Usage:

    def file_extension(f):
        return f[2].lower()
    
    split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
    print(split_files['images'])
    print(split_files['anims'])
    
  • 2020-11-22 07:05

    I think a generalization that splits an iterable based on N conditions is handy:

    from collections import OrderedDict
    def partition(iterable, *conditions):
        '''Return one list per condition, holding the elements that satisfy it.
           Conditions are assumed to be mutually exclusive.'''
        d = OrderedDict((i, list()) for i in range(len(conditions)))
        for e in iterable:
            for i, condition in enumerate(conditions):
                if condition(e):
                    d[i].append(e)
                    break  # stop at the first matching condition
        return list(d.values())
    

    For instance:

    ints,floats,other = partition([2, 3.14, 1, 1.69, [], None],
                                  lambda x: isinstance(x, int), 
                                  lambda x: isinstance(x, float),
                                  lambda x: True)
    
    print(" ints: {}\n floats:{}\n other:{}".format(ints, floats, other))
    
     ints: [2, 1]
     floats:[3.14, 1.69]
     other:[[], None]
    

    If the element may satisfy multiple conditions, remove the break.
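
    A minimal sketch of that variant, with the break removed so an element can land in more than one list. The name partition_overlapping and the sample conditions are made up for this illustration, and a plain list of lists stands in for the OrderedDict.

```python
def partition_overlapping(iterable, *conditions):
    """Return one list per condition; an element may appear in several lists."""
    results = [[] for _ in conditions]
    for element in iterable:
        for bucket, condition in zip(results, conditions):
            if condition(element):
                # No break here: keep testing the remaining conditions.
                bucket.append(element)
    return results

evens, multiples_of_three = partition_overlapping(
    range(1, 13),
    lambda x: x % 2 == 0,
    lambda x: x % 3 == 0,
)

print(evens)               # [2, 4, 6, 8, 10, 12]
print(multiples_of_three)  # [3, 6, 9, 12]
```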

  • 2020-11-22 07:05

    Inspired by @gnibbler's great (but terse!) answer, we can apply that approach to map to multiple partitions:

    from collections import defaultdict
    
    def splitter(l, mapper):
        """Split an iterable into multiple partitions generated by a callable mapper."""
    
        results = defaultdict(list)
    
        for x in l:
            results[mapper(x)].append(x)
    
        return results
    

    splitter can then be used as follows:

    >>> l = [1, 2, 3, 4, 2, 3, 4, 5, 6, 4, 3, 2, 3]
    >>> split = splitter(l, lambda x: x % 2 == 0)  # partition l into odds and evens
    >>> sorted(split.items())
    [(False, [1, 3, 3, 5, 3, 3]), (True, [2, 4, 2, 4, 6, 4, 2])]
    

    This works for more than two partitions with a more complicated mapping (and on iterators, too):

    >>> import math
    >>> l = range(1, 23)
    >>> split = splitter(l, lambda x: int(math.log10(x) * 5))
    >>> sorted(split.items())
    [(0, [1]),
     (1, [2]),
     (2, [3]),
     (3, [4, 5, 6]),
     (4, [7, 8, 9]),
     (5, [10, 11, 12, 13, 14, 15]),
     (6, [16, 17, 18, 19, 20, 21, 22])]
    

    Or using a dictionary to map:

    >>> mapping = {'A': 1, 'X': 2, 'B': 3, 'Y': 1, 'C': 2, 'Z': 3}  # avoid shadowing the built-in map
    >>> l = ['A', 'B', 'C', 'C', 'X', 'Y', 'Z', 'A', 'Z']
    >>> split = splitter(l, mapping.get)
    >>> sorted(split.items())
    [(1, ['A', 'Y', 'A']), (2, ['C', 'C', 'X']), (3, ['B', 'Z', 'Z'])]
    
  • 2020-11-22 07:05
    bad = []
    good = [x for x in mylist if x in goodvals or bad.append(x)]
    

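    list.append returns None, which is falsy, so the expression after "or" never lets a rejected item into good; it only records that item in bad as a side effect.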
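
    For comparison, here is a sketch that runs the one-liner next to an explicit loop doing the same thing. The contents of mylist and goodvals are hypothetical sample data.

```python
mylist = ["a.jpg", "b.avi", "c.png", "d.txt"]  # hypothetical sample data
goodvals = {"a.jpg", "c.png"}

# The one-liner: rejected items are appended to bad as a side effect.
bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]

# The same split written as an explicit loop.
good_loop, bad_loop = [], []
for x in mylist:
    (good_loop if x in goodvals else bad_loop).append(x)

print(good, bad)  # ['a.jpg', 'c.png'] ['b.avi', 'd.txt']
```

    Both versions walk the list once and preserve order; the loop avoids relying on a side effect inside a comprehension.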

  • 2020-11-22 07:05

    If you don't mind using an external library, there are two that I know of that natively implement this operation:

    >>> files = [ ('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]
    >>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
    
    • iteration_utilities.partition:

      >>> from iteration_utilities import partition
      >>> notimages, images = partition(files, lambda x: x[2].lower() in IMAGE_TYPES)
      >>> notimages
      [('file2.avi', 999, '.avi')]
      >>> images
      [('file1.jpg', 33, '.jpg')]
      
    • more_itertools.partition:

      >>> from more_itertools import partition
      >>> notimages, images = partition(lambda x: x[2].lower() in IMAGE_TYPES, files)
      >>> list(notimages)  # returns a generator so you need to explicitly convert to list.
      [('file2.avi', 999, '.avi')]
      >>> list(images)
      [('file1.jpg', 33, '.jpg')]
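
    If installing a library is not an option, a similar helper can be sketched with the standard library alone. The version below follows the long-standing partition recipe from the itertools documentation; it is not the implementation of either library above. Unlike the tee-of-pairs approach, it calls the predicate twice per item, once on each branch.

```python
from itertools import filterfalse, tee

def partition(pred, iterable):
    """Split into (items where pred is false, items where pred is true), lazily."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

files = [('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]
IMAGE_TYPES = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')

notimages, images = partition(lambda f: f[2].lower() in IMAGE_TYPES, files)
notimages, images = list(notimages), list(images)  # materialize the lazy iterators

print(notimages)  # [('file2.avi', 999, '.avi')]
print(images)     # [('file1.jpg', 33, '.jpg')]
```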
      