python equivalent of filter() getting two output lists (i.e. partition of a list)

南笙酒味 提交于 2019-12-17 04:02:09

问题


Let's say I have a list, and a filtering function. Using something like

>>> filter(lambda x: x > 10, [1,4,12,7,42])
[12, 42]

I can get the elements matching the criterion. Is there a function I could use that would output two lists, one of elements matching, one of the remaining elements? I could call the filter() function twice, but that's kinda ugly :)

Edit: the order of elements should be conserved, and I may have identical elements multiple times.


回答1:


Try this:

def partition(pred, iterable):
    trues = []
    falses = []
    for item in iterable:
        if pred(item):
            trues.append(item)
        else:
            falses.append(item)
    return trues, falses

Usage:

>>> trues, falses = partition(lambda x: x > 10, [1,4,12,7,42])
>>> trues
[12, 42]
>>> falses
[1, 4, 7]

There is also an implementation suggestion in itertools recipes:

from itertools import filterfalse, tee

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

The recipe comes from the Python 3.x documentation. In Python 2.x filterfalse is called ifilterfalse.




回答2:


>>> def partition(l, p):
...     return reduce(lambda x, y: (x[0]+[y], x[1]) if p(y) else (x[0], x[1]+[y]), l,  ([], []))
... 
>>> partition([1, 2, 3, 4, 5], lambda x: x < 3)
([1, 2], [3, 4, 5])

and a little uglier but faster version of the above code:

def partition(l, p):
    return reduce(lambda x, y: x[0].append(y) or x if p(y) else x[1].append(y) or x, l,  ([], []))

This is second edit, but I think it matters:

 def partition(l, p):
     return reduce(lambda x, y: x[not p(y)].append(y) or x, l, ([], []))

The second and the third are as quick as the iterative one upper but are less code.




回答3:


I think groupby might be more relevant here:

http://docs.python.org/library/itertools.html#itertools.groupby

For example, splitting a list into odd and even numbers (or could be an arbitrary number of groups):

>>> l=range(6)
>>> key=lambda x: x % 2 == 0
>>> from itertools import groupby
>>> {k:list(g) for k,g in groupby(sorted(l,key=key),key=key)}
    {False: [1, 3, 5], True: [0, 2, 4]}



回答4:


TL;DR

The accepted, most voted answer [1] by Mark Byers

def partition(pred, iterable):
    trues = []
    falses = []
    for item in iterable:
        if pred(item):
            trues.append(item)
        else:
            falses.append(item)
    return trues, falses

is the simplest and the fastest.

Benchmarking the different approaches

The different approaches that had been suggested can be classified broadly in three categories,

  1. straightforward list manipulation via lis.append, returning a 2-tuple of lists,
  2. lis.append mediated by a functional approach, returning a 2-tuple of lists,
  3. using the canonical recipe given in the itertools fine documentation, returning a 2-tuple of, loosely speaking, generators.

Here follows a vanilla implementation of the three techniques, first the functional approach, then itertools and eventually two different implementations of direct list manipulation, the alternative being using the False is zero, True is one trick.

Note that this is Python3 — hence reduce comes from functools — and that OP request a tuple like (positives, negatives) but my implementations all return (negatives, positives)

$ ipython
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import functools
   ...: 
   ...: def partition_fu(p, l, r=functools.reduce):
   ...:     return r(lambda x, y: x[p(y)].append(y) or x, l, ([], []))
   ...: 

In [2]: import itertools
   ...: 
   ...: def partition_it(pred, iterable,
   ...:               filterfalse=itertools.filterfalse,
   ...:               tee=itertools.tee):
   ...:     t1, t2 = tee(iterable)
   ...:     return filterfalse(pred, t1), filter(pred, t2)
   ...: 

In [3]: def partition_li(p, l):
   ...:     a, b = [], []
   ...:     for n in l:
   ...:         if p(n):
   ...:             b.append(n)
   ...:         else:
   ...:             a.append(n)
   ...:     return a, b
   ...: 

In [4]: def partition_li_alt(p, l):
   ...:     x = [], []
   ...:     for n in l: x[p(n)].append(n)
   ...:     return x
   ...: 

We need a predicate to apply to our lists and lists (again, loosely speaking) on which to operate.

In [5]: p = lambda n:n%2

In [6]: five, ten = range(50000), range(100000)

To overcome the problem in testing the itertools approach, that was reported by joeln on Oct 31 '13 at 6:17

Nonsense. You've calculated the time taken to construct the generators in filterfalse and filter, but you've not iterated through the input or called pred once! The advantage of the itertools recipe is that it does not materialise any list, or look further ahead in the input than necessary. It calls pred twice as often and takes almost twice as long as Byers et al.

I have thought of a void loop that just instantiates all the couples of elements in the two iterables returned by the different partition functions.

First we use two fixed lists to have an idea of the overload implied (using the very convenient IPython's magic %timeit)

In [7]: %timeit for e, o in zip(five, five): pass
4.21 ms ± 39.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Next we use the different implementations, one after the other

In [8]: %timeit for e, o in zip(*partition_fu(p, ten)): pass
53.9 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %timeit for e, o in zip(*partition_it(p, ten)): pass
44.5 ms ± 3.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [10]: %timeit for e, o in zip(*partition_li(p, ten)): pass
36.3 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [11]: %timeit for e, o in zip(*partition_li_alt(p, ten)): pass
37.3 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [12]:

Comments

The plainest of the approaches is also the fastest one.

Using the x[p(n)] trick is, ehm, useless because at every step you have to index a data structure, giving you a slight penalty — it's however nice to know if you want to persuade a survivor of a declining culture at pythonizing.

The functional approach, that is operatively equivalent to the alternative append implementation, is ~50% slower, possibly due to the fact that we have an extra (w/r to predicate evaluation) function call for each list element.

The itertools approach has the (customary) advantages that ❶ no potentially large list is instantiated and ❷ the input list is not entirely processed if you break out of the consumer loop, but when we use it it is slower because of the need to apply the predicate on both ends of the tee

Aside

I've fallen in love with the object.mutate() or object idiom that was exposed by Marii in their answer showing a functional approach to the problem — I'm afraid that, sooner or later, I'm going to abuse it.

Footnotes

[1] Accepted and most voted as today, Sep 14 2017 — but of course I have the highest hopes for this answer of mine!




回答5:


If you don't have duplicate element in your list you can definitely use set:

>>> a = [1,4,12,7,42]
>>> b = filter(lambda x: x > 10, [1,4,12,7,42])
>>> no_b = set(a) - set(b)
set([1, 4, 7])

or you can do by a list comprehensible:

>>> no_b = [i for i in a if i not in b]

N.B: it's not a function but just knowing the first fitler() result you can deduce the element that didn't much your filter criterion .




回答6:


You can look at django.utils.functional.partition solution:

def partition(predicate, values):
    """
    Splits the values into two sets, based on the return value of the function
    (True/False). e.g.:

        >>> partition(lambda x: x > 3, range(5))
        [0, 1, 2, 3], [4]
    """
    results = ([], [])
    for item in values:
        results[predicate(item)].append(item)
    return results

In my opinion it's the most elegant solution presented here.

This part is not documented, only source code can be found on https://docs.djangoproject.com/en/dev/_modules/django/utils/functional/




回答7:


from itertools import ifilterfalse

def filter2(predicate, iterable):
    return filter(predicate, iterable), list(ifilterfalse(predicate, iterable))



回答8:


I just had exactly this requirement. I'm not keen on the itertools recipe since it involves two separate passes through the data. Here's my implementation:

def filter_twoway(test, data):
    "Like filter(), but returns the passes AND the fails as two separate lists"
    collected = {True: [], False: []}
    for datum in data:
        collected[test(datum)].append(datum)
    return (collected[True], collected[False])



回答9:


Everyone seems to think that their solution is the best, so I decided to use timeit to test all of them. I used "def is_odd(x): return x & 1" as my predicate function, and "xrange(1000)" as the iterable. Here is my version of Python:

Python 2.7.3 (v2.7.3:70274d53c1dd, Apr  9 2012, 20:52:43) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

And here are the results of my testing:

Mark Byers
1000 loops, best of 3: 325 usec per loop

cldy
1000 loops, best of 3: 1.96 msec per loop

Dan S
1000 loops, best of 3: 412 usec per loop

TTimo
1000 loops, best of 3: 503 usec per loop

Those are all comparable to each other. Now, let's try using the example given in the Python documentation.

import itertools

def partition(pred, iterable,
              # Optimized by replacing global lookups with local variables
              # defined as default values.
              filter=itertools.ifilter,
              filterfalse=itertools.ifilterfalse,
              tee=itertools.tee):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

This seems to be a bit faster.

100000 loops, best of 3: 2.58 usec per loop

The itertools example code beats all comers by a factor of at least 100! The moral is, don't keep re-inventing the wheel.




回答10:


Plenty of good answers already. I like to use this:

def partition( pred, iterable ):
    def _dispatch( ret, v ):
        if ( pred( v ) ):
            ret[0].append( v )
        else:
            ret[1].append( v )
        return ret
    return reduce( _dispatch, iterable, ( [], [] ) )

if ( __name__ == '__main__' ):
    import random
    seq = range( 20 )
    random.shuffle( seq )
    print( seq )
    print( partition( lambda v : v > 10, seq ) )



回答11:


Concise code for appending to target list

    def partition(cond,inputList):
        a,b= [],[]
        for item in inputList:
            target = a if cond(item) else b
            target.append(item)
        return a, b


    >>> a, b= partition(lambda x: x > 10,[1,4,12,7,42])
    >>> a
    [12, 42]
    >>> b
    [1, 4, 7]


来源:https://stackoverflow.com/questions/4578590/python-equivalent-of-filter-getting-two-output-lists-i-e-partition-of-a-list

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!