Simple way to group items into buckets


I often want to bucket an unordered collection in Python. itertools.groupby does the right sort of thing but almost always requires massaging to sort the items first and cat
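For context, the "massaging" the question refers to looks roughly like the sketch below. The helper name `bucket_with_groupby` and the `x % 10` key are assumptions chosen for illustration, not part of the original question.

```python
from itertools import groupby

def bucket_with_groupby(items, key):
    """Group items by key(item) using itertools.groupby.

    groupby only merges *adjacent* items with equal keys, so the input
    has to be sorted by the same key first.
    """
    ordered = sorted(items, key=key)  # stable sort keeps input order inside each bucket
    return {k: list(group) for k, group in groupby(ordered, key=key)}

buckets = bucket_with_groupby(range(21), key=lambda x: x % 10)
print(buckets[0])  # [0, 10, 20]
```

The extra sort costs O(n log n) and requires the keys to be orderable, which is why the answers here tend to reach for a plain dict instead.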

5 answers

旧巷少年郎

2020-12-29 09:32

Here is a simple two-liner:

d = {}
for x in "thequickbrownfoxjumpsoverthelazydog": d.setdefault(x in 'aeiou', []).append(x)

Edit:

Just adding your other case for completeness.

d={}
for x in range(21): d.setdefault(x%10, []).append(x)  # range() replaces Python 2's xrange()


忘掉有多难

2020-12-29 09:33
This has come up several times before -- (1), (2), (3) -- and there's a partition recipe in the itertools recipes, but to my knowledge there's nothing in the standard library. Then again, I was surprised a few weeks ago by accumulate, so who knows what's lurking there these days? :^)
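The itertools recipe mentioned above is, roughly, the following two-way split. This is a sketch from memory of how the recipe has appeared in the itertools documentation; it is renamed `partition_recipe` here (a hypothetical name) to avoid confusion with the `partition` function in this answer.

```python
from itertools import filterfalse, tee

def partition_recipe(pred, iterable):
    """Split iterable into (items failing pred, items passing pred), lazily."""
    t1, t2 = tee(iterable)  # two independent iterators over the same data
    return filterfalse(pred, t1), filter(pred, t2)

falses, trues = partition_recipe(lambda c: c in "aeiou", "thequickbrownfox")
vowels = "".join(trues)
consonants = "".join(falses)
print(vowels)      # euioo
print(consonants)  # thqckbrwnfx
```

Note that it only handles a true/false predicate and calls the predicate twice per item, so it does not cover bucketing by an arbitrary key.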

When I need this behaviour, I use
```
from collections import defaultdict

def partition(seq, key):
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d
```
and get on with my day.
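As a usage sketch, here is that function applied to the two cases discussed elsewhere in this thread (vowels vs. consonants, and numbers by last digit). The function is repeated so the snippet runs on its own; the variable names `by_vowel` and `by_digit` are made up for the example.

```python
from collections import defaultdict

def partition(seq, key):
    # Same function as in the answer, repeated so this snippet is self-contained.
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d

by_vowel = partition("thequickbrownfoxjumpsoverthelazydog", lambda c: c in "aeiou")
print("".join(by_vowel[True]))  # euioouoeeao

by_digit = partition(range(21), lambda x: x % 10)
print(by_digit[0])  # [0, 10, 20]
```

One thing to keep in mind: the result is a defaultdict, so looking up a key that never occurred silently creates an empty list. Wrapping the return value in `dict(...)` avoids that if it matters.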

攒了一身酷

2020-12-29 09:34

Edit:

Using DSM's answer as a start, here is a slightly more concise, general answer:

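from collections import defaultdict

# In Python 3, map() is lazy, so list() is needed to force the appends to run.
d = defaultdict(list)
list(map(lambda x: d[x in 'aeiou'].append(x), 'thequickbrownfoxjumpsoverthelazydog'))

d = defaultdict(list)
list(map(lambda x: d[x % 10].append(x), range(21)))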

Here is a two-liner:

d = {False: [], True: []}
# filter() is lazy in Python 3 too, hence the list() wrapper.
list(filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x), "thequickbrownfoxjumpedoverthelazydogs"))

Which can of course be made a one-liner:

d = {False: [], True: []}; list(filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x), "thequickbrownfoxjumpedoverthelazydogs"))


执笔经年

2020-12-29 09:38

If it's a pandas.DataFrame, the following also works, using pd.cut():

from sklearn import datasets
import pandas as pd

# import some data to play with
iris = datasets.load_iris()
df_data = pd.DataFrame(iris.data[:,0])  # we'll just take the first feature

# bucketize
n_bins = 5
feature_name = iris.feature_names[0].replace(" ", "_")
my_labels = [str(feature_name) + "_" + str(num) for num in range(0,n_bins)]
pd.cut(df_data[0], bins=n_bins, labels=my_labels)

yielding

0      0_1
1      0_0
2      0_0
[...]

If you don't set the labels, the output is going to look like this:

0       (5.02, 5.74]
1      (4.296, 5.02]
2      (4.296, 5.02]
[...]
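Without pandas, the same kind of right-closed binning can be sketched with the standard library's `bisect`. Everything here is an assumption for illustration: the helper name `bucket_index`, the sample values, and the edge list, which assumes five equal-width bins (only the first three edges appear in the output above).

```python
from bisect import bisect_left
from collections import defaultdict

# Assumed bin edges: five equal-width, right-closed bins.
edges = [4.296, 5.02, 5.74, 6.46, 7.18, 7.9]

def bucket_index(value, edges):
    """Return i such that edges[i] < value <= edges[i + 1]."""
    i = bisect_left(edges, value) - 1
    if not 0 <= i < len(edges) - 1:
        raise ValueError(f"{value} is outside the bin range")
    return i

buckets = defaultdict(list)
for v in [5.1, 4.9, 4.7]:  # hypothetical sample values
    buckets[bucket_index(v, edges)].append(v)

print(dict(buckets))  # {1: [5.1], 0: [4.9, 4.7]}
```

`bisect_left` is what makes the intervals closed on the right: a value equal to an edge lands in the bin that ends at that edge.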


臣服心动

2020-12-29 09:40

Here's a variant of partition() from above when the predicate is boolean, avoiding the cost of a dict/defaultdict:

def boolpartition(seq, pred):
    passing, failing = [], []
    for item in seq:
        (passing if pred(item) else failing).append(item)
    return passing, failing

Example usage:

>>> even, odd = boolpartition([1, 2, 3, 4, 5], lambda x: x % 2 == 0)
>>> even
[2, 4]
>>> odd
[1, 3, 5]
