I often want to bucket an unordered collection in Python. itertools.groupby does the right sort of thing but almost always requires massaging to sort the items first and cat
Here is a simple two-liner:
d = {}
for x in "thequickbrownfoxjumpsoverthelazydog": d.setdefault(x in 'aeiou', []).append(x)
Edit:
Just adding your other case for completeness.
d={}
for x in xrange(21): d.setdefault(x%10, []).append(x)
This has come up several times before -- (1), (2), (3) -- and there's a partition recipe in the itertools recipes, but to my knowledge there's nothing in the standard library... although I was surprised a few weeks ago by accumulate, so who knows what's lurking there these days? :^)
When I need this behaviour, I use
from collections import defaultdict
def partition(seq, key):
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d
and get on with my day.
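As a quick check of the helper above, here is a minimal sketch that applies it to the two cases from this thread and, for comparison, shows the sort-then-groupby route. It assumes Python 3, and the function is repeated only so the snippet runs on its own.

```python
from collections import defaultdict
from itertools import groupby


def partition(seq, key):
    """Bucket the items of seq by key(item), keeping input order per bucket."""
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d


# Case 1: vowels vs. consonants.
by_vowel = partition("thequickbrownfox", lambda c: c in "aeiou")
print(by_vowel[True])   # ['e', 'u', 'i', 'o', 'o']

# Case 2: numbers bucketed by last digit.
by_digit = partition(range(21), lambda n: n % 10)
print(by_digit[0])      # [0, 10, 20]

# The groupby route only works if the input is sorted on the same key first.
key = lambda n: n % 10
grouped = {k: list(g) for k, g in groupby(sorted(range(21), key=key), key=key)}
print(grouped == dict(by_digit))  # True
```

The dict-based version makes one pass and needs no sort, which is why it tends to be the more convenient choice for unordered input.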
Edit:
Using DSM's answer as a start, here is a slightly more concise, general answer:
from collections import defaultdict
d = defaultdict(list)
map(lambda x: d[x in 'aeiou'].append(x),'thequickbrownfoxjumpsoverthelazydog')
or
d = defaultdict(list)
map(lambda x: d[x %10].append(x),xrange(21))
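One caveat, assuming Python 3: map returns a lazy iterator there, so the lambda never runs unless the result is consumed, and xrange no longer exists (range replaces it). A plain loop sidesteps both. A minimal sketch of the difference:

```python
from collections import defaultdict

# Explicit loop: works the same in Python 2 and Python 3.
d = defaultdict(list)
for x in range(21):
    d[x % 10].append(x)

print(d[0])  # [0, 10, 20]

# Python 3's map is lazy: nothing is appended until the iterator is consumed.
lazy = defaultdict(list)
m = map(lambda x: lazy[x % 10].append(x), range(21))
print(len(lazy))  # 0
list(m)           # consuming the iterator runs the lambda for each item
print(len(lazy))  # 10
```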
Here is a two-liner:
d = {False:[],True:[]}
filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x),"thequickbrownfoxjumpedoverthelazydogs")
Which can of course be made a one-liner:
d = {False:[],True:[]};filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x),"thequickbrownfoxjumpedoverthelazydogs")
If it's a pandas.DataFrame, the following also works, using pd.cut():
from sklearn import datasets
import pandas as pd
# import some data to play with
iris = datasets.load_iris()
df_data = pd.DataFrame(iris.data[:,0]) # we'll just take the first feature
# bucketize
n_bins = 5
feature_name = iris.feature_names[0].replace(" ", "_")
my_labels = [str(feature_name) + "_" + str(num) for num in range(0,n_bins)]
pd.cut(df_data[0], bins=n_bins, labels=my_labels)
yielding
0 0_1
1 0_0
2 0_0
[...]
If you don't set the labels, the output is going to look like this:
0 (5.02, 5.74]
1 (4.296, 5.02]
2 (4.296, 5.02]
[...]
Here's a variant of partition() from above for when the predicate is boolean, avoiding the cost of a dict/defaultdict:
def boolpartition(seq, pred):
    passing, failing = [], []
    for item in seq:
        (passing if pred(item) else failing).append(item)
    return passing, failing
Example usage:
>>> even, odd = boolpartition([1, 2, 3, 4, 5], lambda x: x % 2 == 0)
>>> even
[2, 4]
>>> odd
[1, 3, 5]