Python: split a list based on a condition?

前端 未结 30 1828
误落风尘
误落风尘 2020-11-22 06:56

What\'s the best way, both aesthetically and from a performance perspective, to split a list of items into multiple lists based on a conditional? The equivalent of:

相关标签:
30条回答
  • 2020-11-22 07:06

    This is the fastest way.

    It uses if else, (like dbr's answer) but creates a set first. A set reduces the number of operations from O(m * n) to O(log m) + O(n), resulting in a 45%+ boost in speed.

    good_list_set = set(good_list)  # 45% faster than a tuple.
    
    good, bad = [], []
    for item in my_origin_list:
        if item in good_list_set:
            good.append(item)
        else:
            bad.append(item)
    

    A little shorter:

    good_list_set = set(good_list)  # 45% faster than a tuple.
    
    good, bad = [], []
    for item in my_origin_list:
        out = good if item in good_list_set else bad
        out.append(item)
    

    Benchmark results:

    filter_BJHomer                  80/s       --   -3265%   -5312%   -5900%   -6262%   -7273%   -7363%   -8051%   -8162%   -8244%
    zip_Funky                       118/s    4848%       --   -3040%   -3913%   -4450%   -5951%   -6085%   -7106%   -7271%   -7393%
    two_lst_tuple_JohnLaRoy         170/s   11332%    4367%       --   -1254%   -2026%   -4182%   -4375%   -5842%   -6079%   -6254%
    if_else_DBR                     195/s   14392%    6428%    1434%       --    -882%   -3348%   -3568%   -5246%   -5516%   -5717%
    two_lst_compr_Parand            213/s   16750%    8016%    2540%     967%       --   -2705%   -2946%   -4786%   -5083%   -5303%
    if_else_1_line_DanSalmo         292/s   26668%   14696%    7189%    5033%    3707%       --    -331%   -2853%   -3260%   -3562%
    tuple_if_else                   302/s   27923%   15542%    7778%    5548%    4177%     343%       --   -2609%   -3029%   -3341%
    set_1_line                      409/s   41308%   24556%   14053%   11035%    9181%    3993%    3529%       --    -569%    -991%
    set_shorter                     434/s   44401%   26640%   15503%   12303%   10337%    4836%    4345%     603%       --    -448%
    set_if_else                     454/s   46952%   28358%   16699%   13349%   11290%    5532%    5018%    1100%     469%       --
    

    The full benchmark code for Python 3.7 (modified from FunkySayu):

    good_list = ['.jpg','.jpeg','.gif','.bmp','.png']
    
    import random
    import string
    my_origin_list = []
    for i in range(10000):
        fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
        if random.getrandbits(1):
            fext = random.choice(list(good_list))
        else:
            fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))
    
        my_origin_list.append((fname + fext, random.randrange(1000), fext))
    
    # Parand
    def two_lst_compr_Parand(*_):
        return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
    
    # dbr
    def if_else_DBR(*_):
        a, b = list(), list()
        for e in my_origin_list:
            if e[2] in good_list:
                a.append(e)
            else:
                b.append(e)
        return a, b
    
    # John La Rooy
    def two_lst_tuple_JohnLaRoy(*_):
        a, b = list(), list()
        for e in my_origin_list:
            (b, a)[e[2] in good_list].append(e)
        return a, b
    
    # # Ants Aasma
    # def f4():
    #     l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
    #     return [i for p, i in l1 if p], [i for p, i in l2 if not p]
    
    # My personal way to do
    def zip_Funky(*_):
        a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
        return list(filter(None, a)), list(filter(None, b))
    
    # BJ Homer
    def filter_BJHomer(*_):
        return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list,                                                                             my_origin_list))
    
    # ChaimG's answer; as a list.
    def if_else_1_line_DanSalmo(*_):
        good, bad = [], []
        for e in my_origin_list:
            _ = good.append(e) if e[2] in good_list else bad.append(e)
        return good, bad
    
    # ChaimG's answer; as a set.
    def set_1_line(*_):
        good_list_set = set(good_list)
        good, bad = [], []
        for e in my_origin_list:
            _ = good.append(e) if e[2] in good_list_set else bad.append(e)
        return good, bad
    
    # ChaimG set and if else list.
    def set_shorter(*_):
        good_list_set = set(good_list)
        good, bad = [], []
        for e in my_origin_list:
            out = good if e[2] in good_list_set else bad
            out.append(e)
        return good, bad
    
    # ChaimG's best answer; if else as a set.
    def set_if_else(*_):
        good_list_set = set(good_list)
        good, bad = [], []
        for e in my_origin_list:
            if e[2] in good_list_set:
                good.append(e)
            else:
                bad.append(e)
        return good, bad
    
    # ChaimG's best answer; if else as a set.
    def tuple_if_else(*_):
        good_list_tuple = tuple(good_list)
        good, bad = [], []
        for e in my_origin_list:
            if e[2] in good_list_tuple:
                good.append(e)
            else:
                bad.append(e)
        return good, bad
    
    def cmpthese(n=0, functions=None):
        results = {}
        for func_name in functions:
            args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
            t = Timer(*args)
            results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec
    
        functions_sorted = sorted(functions, key=results.__getitem__)
        for f in functions_sorted:
            diff = []
            for func in functions_sorted:
                if func == f:
                    diff.append("--")
                else:
                    diff.append(f"{results[f]/results[func]*100 - 100:5.0%}")
            diffs = " ".join(f'{x:>8s}' for x in diff)
    
            print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")
    
    
    if __name__=='__main__':
        from timeit import Timer
    cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))
    
    0 讨论(0)
  • 2020-11-22 07:06
    images = [f for f in files if f[2].lower() in IMAGE_TYPES]
    anims  = [f for f in files if f not in images]
    

    Nice when the condition is longer, such as in your example. The reader doesn't have to figure out the negative condition and whether it captures all other cases.

    0 讨论(0)
  • 2020-11-22 07:07

    Sometimes you won't need that other half of the list. For example:

    import sys
    from itertools import ifilter
    
    trustedPeople = sys.argv[1].split(',')
    newName = sys.argv[2]
    
    myFriends = ifilter(lambda x: x.startswith('Shi'), trustedPeople)
    
    print '%s is %smy friend.' % (newName, newName not in myFriends 'not ' or '')
    
    0 讨论(0)
  • 2020-11-22 07:08
    good.append(x) if x in goodvals else bad.append(x)
    

    This elegant and concise answer by @dansalmo showed up buried in the comments, so I'm just reposting it here as an answer so it can get the prominence it deserves, especially for new readers.

    Complete example:

    good, bad = [], []
    for x in my_list:
        good.append(x) if x in goodvals else bad.append(x)
    
    0 讨论(0)
  • 2020-11-22 07:08

    Personally, I like the version you cited, assuming you already have a list of goodvals hanging around. If not, something like:

    good = filter(lambda x: is_good(x), mylist)
    bad = filter(lambda x: not is_good(x), mylist)
    

    Of course, that's really very similar to using a list comprehension like you originally did, but with a function instead of a lookup:

    good = [x for x in mylist if is_good(x)]
    bad  = [x for x in mylist if not is_good(x)]
    

    In general, I find the aesthetics of list comprehensions to be very pleasing. Of course, if you don't actually need to preserve ordering and don't need duplicates, using the intersection and difference methods on sets would work well too.

    0 讨论(0)
  • 2020-11-22 07:08

    If you insist on clever, you could take Winden's solution and just a bit spurious cleverness:

    def splay(l, f, d=None):
      d = d or {}
      for x in l: d.setdefault(f(x), []).append(x)
      return d
    
    0 讨论(0)
提交回复
热议问题