Python: split a list based on a condition?

前端未结


What's the best way, both aesthetically and from a performance perspective, to split a list of items into multiple lists based on a conditional? The equivalent of:

30 answers

轮回少年

2020-11-22 07:15
For example, to split a list into even and odd numbers:
```
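from functools import reduce  # reduce is not a builtin in Python 3
arr = range(20)
# list.append returns None, so "or res" passes the accumulator along
even, odd = reduce(lambda res, n: res[n % 2].append(n) or res, arr, ([], []))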
```
Or in general:
```
def split(predicate, iterable):
    # predicate must return a bool (or 0/1): False items go to [0], True items to [1]
    return reduce(lambda res, e: res[predicate(e)].append(e) or res, iterable, ([], []))
```
Advantages:
- Shortest possible way
- The predicate is applied only once for each element
Disadvantages:
- Requires knowledge of the functional programming paradigm
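
A short usage sketch of the general version, assuming Python 3 (where `reduce` lives in `functools`). Because the tuple is indexed by the predicate's result, items for which the predicate is false land in the first list and items for which it is true land in the second. The `bool(...)` wrapper is an addition here, not part of the answer, so that truthy non-boolean results cannot index out of range.

```python
from functools import reduce

def split(predicate, iterable):
    # res[0] collects items where predicate is false, res[1] where it is true.
    # bool(...) is an added safeguard: it maps any truthy/falsy result to 1/0.
    return reduce(
        lambda res, e: res[bool(predicate(e))].append(e) or res,
        iterable,
        ([], []),
    )

short, long_ = split(lambda word: len(word) > 3, ["a", "list", "of", "words"])
print(short)   # ['a', 'of']
print(long_)   # ['list', 'words']
```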

悲&欢浪女

2020-11-22 07:17
I basically like Anders' approach as it is very general. Here's a version that puts the categorizer first (to match filter syntax) and uses a defaultdict (assumed imported).
```
from collections import defaultdict

def categorize(func, seq):
    """Return mapping from categories to lists
    of categorized items.
    """
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d
```
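
As a rough illustration of how `categorize` might be called, here is a sketch that groups some made-up file names by lowercased extension. The file names and the `by_ext` variable are hypothetical. Note that looking up a missing key on the returned `defaultdict` creates an empty list for it, so `by_ext.get(key, [])` may be preferable for read-only checks.

```python
from collections import defaultdict
import os.path

def categorize(func, seq):
    """Return a mapping from categories to lists of categorized items."""
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d

# Hypothetical file names, used only for illustration.
names = ["a.jpg", "b.avi", "c.JPG", "d.txt"]
by_ext = categorize(lambda n: os.path.splitext(n)[1].lower(), names)
print(by_ext[".jpg"])  # ['a.jpg', 'c.JPG']
print(by_ext[".avi"])  # ['b.avi']
```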

日久生厌

2020-11-22 07:20
The problem with all the proposed solutions is that they scan the list and apply the filtering function twice. I'd make a simple small function like this:
```
def split_into_two_lists(lst, f):
    a = []
    b = []
    for elem in lst:
        if f(elem):
            a.append(elem)
        else:
            b.append(elem)
    return a, b
```
That way you are not processing anything twice and also are not repeating code.
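
To check the single-pass claim, here is a small sketch that records every predicate call. The `is_even` helper and the `seen` list are made up for the demonstration.

```python
def split_into_two_lists(lst, f):
    a, b = [], []
    for elem in lst:
        if f(elem):
            a.append(elem)
        else:
            b.append(elem)
    return a, b

seen = []  # every argument the predicate was called with

def is_even(n):
    seen.append(n)  # record each invocation
    return n % 2 == 0

evens, odds = split_into_two_lists(range(6), is_even)
print(evens, odds)  # [0, 2, 4] [1, 3, 5]
print(len(seen))    # 6: one call per element
```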

日久生厌

2020-11-22 07:21
```
good = [x for x in mylist if x in goodvals]
bad  = [x for x in mylist if x not in goodvals]
```
The question asks, "Is there a more elegant way to do this?"
That code is perfectly readable, and extremely clear!
```
# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims  = [f for f in files if f[2].lower() not in IMAGE_TYPES]
```
Again, this is fine!

There might be slight performance improvements using sets, but it's a trivial difference, and I find the list comprehension far easier to read, and you don't have to worry about the order being messed up, duplicates being removed and so on.
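
If the lookup cost ever did matter, one option (a sketch, not something the original code does) is to make only the lookup table a set and keep the list comprehensions. The results stay lists, so order and duplicates are untouched. The sample `files` data below is hypothetical.

```python
# Hypothetical sample data in the same shape as `files` above: (name, size, extension).
files = [("file1.jpg", 33, ".jpg"), ("file2.avi", 999, ".avi"), ("file3.PNG", 0, ".PNG")]

# A frozenset gives constant-time membership tests; only the lookup table is a set.
IMAGE_TYPES = frozenset((".jpg", ".jpeg", ".gif", ".bmp", ".png"))

images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims = [f for f in files if f[2].lower() not in IMAGE_TYPES]

print([f[0] for f in images])  # ['file1.jpg', 'file3.PNG']
print([f[0] for f in anims])   # ['file2.avi']
```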

In fact, I may go another step "backward", and just use a simple for loop:
```
images, anims = [], []

for f in files:
    # f is a (name, size, extension) tuple, so test the extension field
    if f[2].lower() in IMAGE_TYPES:
        images.append(f)
    else:
        anims.append(f)
```
A list comprehension or set() is fine until you need to add some other check or another bit of logic. Say you want to remove all 0-byte JPEGs: you just add something like this at the top of the loop body:
```
if f[1] == 0:
    continue
```
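
Put together, the loop with the extra check might look like the sketch below. The sample `files` list is made up. As written, the check skips every zero-byte file, not only JPEGs; narrowing it to one extension would need an extra condition.

```python
# Hypothetical sample data: (name, size in bytes, extension).
files = [("file1.jpg", 33, ".jpg"), ("empty.jpg", 0, ".jpg"), ("file2.avi", 999, ".avi")]
IMAGE_TYPES = (".jpg", ".jpeg", ".gif", ".bmp", ".png")

images, anims = [], []
for f in files:
    if f[1] == 0:
        continue  # skip zero-byte files entirely
    if f[2].lower() in IMAGE_TYPES:
        images.append(f)
    else:
        anims.append(f)

print(images)  # [('file1.jpg', 33, '.jpg')]
print(anims)   # [('file2.avi', 999, '.avi')]
```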


粉色の甜心

2020-11-22 07:21

solution

```
from itertools import tee

def unpack_args(fn):
    return lambda t: fn(*t)

def separate(fn, lx):
    return map(
        unpack_args(
            lambda i, ly: filter(
                lambda el: bool(i) == fn(el),
                ly)),
        enumerate(tee(lx, 2)))
```

test

```
[even, odd] = separate(
    lambda x: bool(x % 2),
    [1, 2, 3, 4, 5])
print(list(even) == [2, 4])
print(list(odd) == [1, 3, 5])
```
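
The same `tee`-plus-`filter` idea can be written more directly, along the lines of the `partition` recipe that has appeared in the `itertools` documentation. This is a sketch of that alternative, not the code from this answer. Like `separate`, it calls the predicate twice per element, once in each filter.

```python
from itertools import filterfalse, tee

def partition(pred, iterable):
    # Duplicate the input so each filter gets its own iterator.
    t1, t2 = tee(iterable)
    # First result: items where pred is false; second: items where it is true.
    return filterfalse(pred, t1), filter(pred, t2)

even, odd = partition(lambda x: x % 2, [1, 2, 3, 4, 5])
even, odd = list(even), list(odd)
print(even)  # [2, 4]
print(odd)   # [1, 3, 5]
```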


谎友^

2020-11-22 07:22
My take on it. I propose a lazy, single-pass, partition function, which preserves relative order in the output subsequences.

1. Requirements

I assume that the requirements are:
- maintain elements' relative order (hence, no sets and dictionaries)
- evaluate condition only once for every element (hence not using (i)filter or groupby)
- allow for lazy consumption of either sequence (if we can afford to precompute them, then the naïve implementation is likely to be acceptable too)
2. split library

My partition function (introduced below) and other similar functions have made it into a small library:
- python-split
It's installable normally via PyPI:
```
pip install --user split
```
To split a list based on a condition, use the partition function:
```
>>> from split import partition
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi') ]
>>> image_types = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> images, other = partition(lambda f: f[-1] in image_types, files)
>>> list(images)
[('file1.jpg', 33L, '.jpg')]
>>> list(other)
[('file2.avi', 999L, '.avi')]
```
3. partition function explained

Internally we need to build two subsequences at once, so consuming only one output sequence will force the other one to be computed too. And we need to keep state between user requests (store processed but not yet requested elements). To keep state, I use two double-ended queues (deques):
```
from collections import deque
```
The SplitSeq class takes care of the housekeeping:
```
class SplitSeq:
    def __init__(self, condition, sequence):
        self.cond = condition
        self.goods = deque([])
        self.bads = deque([])
        self.seq = iter(sequence)
```
The magic happens in its .getNext() method. It is almost like the .next() method of iterators, but lets us specify which kind of element we want this time. Behind the scenes it doesn't discard the rejected elements, but instead puts them in one of the two queues:
```
    def getNext(self, getGood=True):
        if getGood:
            these, those, cond = self.goods, self.bads, self.cond
        else:
            these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
        if these:
            return these.popleft()
        else:
            while 1: # exit on StopIteration
                n = next(self.seq)  # built-in next() works on Python 2.6+ and 3
                if cond(n):
                    return n
                else:
                    those.append(n)
```
The end user is supposed to use the partition function. It takes a condition function and a sequence (just like map or filter), and returns two generators. The first generator builds a subsequence of elements for which the condition holds, the second one builds the complementary subsequence. Iterators and generators allow for lazy splitting of even long or infinite sequences.
```
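def partition(condition, sequence):
    cond = condition if condition else bool  # evaluate as bool if condition is None
    ss = SplitSeq(cond, sequence)
    def goods():
        while 1:
            try:
                yield ss.getNext(getGood=True)
            except StopIteration:  # source exhausted: end the generator (PEP 479)
                return
    def bads():
        while 1:
            try:
                yield ss.getNext(getGood=False)
            except StopIteration:  # source exhausted: end the generator (PEP 479)
                return
    return goods(), bads()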
```
I chose the test function to be the first argument to facilitate partial application in the future (similar to how map and filter have the test function as the first argument).
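
For anyone who wants to try the pieces above together, here is a self-contained sketch assembled from them, with two adjustments that are assumptions for Python 3 rather than the library's published code: it uses the built-in `next()`, and it catches `StopIteration` inside the generators (since Python 3.7, letting it escape a generator raises `RuntimeError`). The shared `pull` helper is a hypothetical name that replaces the two near-identical inner generators. The usage splits the infinite sequence `itertools.count()` lazily.

```python
from collections import deque
from itertools import count, islice

class SplitSeq:
    def __init__(self, condition, sequence):
        self.cond = condition
        self.goods = deque()
        self.bads = deque()
        self.seq = iter(sequence)

    def getNext(self, getGood=True):
        if getGood:
            these, those, cond = self.goods, self.bads, self.cond
        else:
            these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
        if these:
            return these.popleft()
        while True:  # leaves the loop via StopIteration from next()
            n = next(self.seq)
            if cond(n):
                return n
            those.append(n)  # keep the rejected element for the other generator

def partition(condition, sequence):
    cond = condition if condition else bool
    ss = SplitSeq(cond, sequence)

    def pull(get_good):
        # Assumption for Python 3.7+: StopIteration must not escape a generator
        # (PEP 479), so it is caught and turned into a plain return.
        while True:
            try:
                yield ss.getNext(getGood=get_good)
            except StopIteration:
                return

    return pull(True), pull(False)

evens, odds = partition(lambda n: n % 2 == 0, count())
first_odds = list(islice(odds, 3))
first_evens = list(islice(evens, 3))
print(first_odds)   # [1, 3, 5]
print(first_evens)  # [0, 2, 4]
```

Asking for odd numbers first queues the even ones seen along the way, so the later request for evens is served from the deque without calling the condition again.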