Simple way to group items into buckets

耶瑟儿~ 2020-12-29 08:48

I often want to bucket an unordered collection in Python. itertools.groupby does the right sort of thing but almost always requires massaging to sort the items first and cat

5 answers
  • 2020-12-29 09:32

    Here is a simple two-liner:

    d = {}
    for x in "thequickbrownfoxjumpsoverthelazydog": d.setdefault(x in 'aeiou', []).append(x)
    

    Edit:

    Just adding your other case for completeness.

    d={}
    for x in range(21): d.setdefault(x%10, []).append(x)  # range replaces Python 2's xrange
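
    Wrapped as a reusable function, the same setdefault pattern might look like the sketch below. The name `bucket` and its `key` parameter are made up for illustration and do not come from the answer itself.

```python
def bucket(items, key):
    """Group items into a plain dict of lists, keyed by key(item)."""
    groups = {}
    for item in items:
        # setdefault returns the list already stored for this key,
        # or stores and returns a new empty list on first sight
        groups.setdefault(key(item), []).append(item)
    return groups

vowels = bucket("thequickbrownfoxjumpsoverthelazydog", lambda c: c in "aeiou")
by_last_digit = bucket(range(21), lambda n: n % 10)
```

    Here `vowels[True]` holds the vowels in their original order and `by_last_digit[0]` is `[0, 10, 20]`. Since the result is a plain dict, looking up a bucket that never occurred raises KeyError.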
    
  • 2020-12-29 09:33

    This has come up several times before -- (1), (2), (3) -- and there's a partition recipe in the itertools recipes, but to my knowledge there's nothing in the standard library, although I was surprised a few weeks ago by accumulate, so who knows what's lurking there these days? :^)
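
    For reference, the partition recipe mentioned above is roughly the following, as it has long appeared in the itertools documentation (the exact wording there may differ between Python versions). It is renamed `recipe_partition` here, an assumed name, to keep it apart from other helpers on this page. It only handles a two-way boolean split and returns lazy iterators rather than lists.

```python
from itertools import filterfalse, tee

def recipe_partition(pred, iterable):
    """Split an iterable into (items failing pred, items passing pred)."""
    t1, t2 = tee(iterable)  # two independent iterators over the same data
    return filterfalse(pred, t1), filter(pred, t2)

odds, evens = recipe_partition(lambda n: n % 2 == 0, range(10))
odds, evens = list(odds), list(evens)
```

    Note the order of the results: the failing items come first.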

    When I need this behaviour, I use

    from collections import defaultdict
    
    def partition(seq, key):
        d = defaultdict(list)
        for x in seq:
            d[key(x)].append(x)
        return d
    

    and get on with my day.
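
    A short usage sketch of the function above; the word list is made-up sample data.

```python
from collections import defaultdict

def partition(seq, key):
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d

words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_initial = partition(words, key=lambda w: w[0])

# A defaultdict creates missing keys on lookup, so test membership
# before indexing, or convert with dict() to get KeyError instead.
has_z_before = "z" in by_initial
empty = by_initial["z"]        # silently inserts an empty list under "z"
has_z_after = "z" in by_initial
```

    Each bucket keeps its items in the order they were seen, and no sorting is needed beforehand.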

  • 2020-12-29 09:34

    Edit:

    Using DSM's answer as a start, here is a slightly more concise, general answer:

    from collections import defaultdict
    d = defaultdict(list)
    list(map(lambda x: d[x in 'aeiou'].append(x), 'thequickbrownfoxjumpsoverthelazydog'))  # list() forces the lazy Python 3 map to run
    

    or

    d = defaultdict(list)
    list(map(lambda x: d[x % 10].append(x), range(21)))  # list() forces the lazy Python 3 map to run
    

    Here is a two-liner:

    d = {False:[],True:[]}
    list(filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x),"thequickbrownfoxjumpedoverthelazydogs"))  # list() forces the lazy Python 3 filter to run
    

    Which can of course be made a one-liner:

    d = {False:[],True:[]};list(filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x),"thequickbrownfoxjumpedoverthelazydogs"))
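
    One caveat when using map or filter purely for side effects: in Python 3 both return lazy iterators, so the lambda does not run until the result is consumed. The sketch below illustrates this and shows the equivalent explicit loop; the variable names are chosen only for this example.

```python
from collections import defaultdict

d = defaultdict(list)
pending = map(lambda x: d[x % 10].append(x), range(21))
before = dict(d)   # still empty: the map object has not been consumed yet
list(pending)      # consuming the iterator runs the appends
after = dict(d)

# The explicit loop needs no consumption step and builds no throwaway list
loop_d = defaultdict(list)
for x in range(21):
    loop_d[x % 10].append(x)
```

    Both `after` and `loop_d` end up with the same buckets; the loop just makes the side effect explicit.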
    
  • 2020-12-29 09:38

    If it's a pandas.DataFrame, the following also works, using pd.cut():

    from sklearn import datasets
    import pandas as pd
    
    # import some data to play with
    iris = datasets.load_iris()
    df_data = pd.DataFrame(iris.data[:,0])  # we'll just take the first feature
    
    # bucketize
    n_bins = 5
    feature_name = iris.feature_names[0].replace(" ", "_")
    my_labels = [str(feature_name) + "_" + str(num) for num in range(0,n_bins)]
    pd.cut(df_data[0], bins=n_bins, labels=my_labels)
    

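    yielding output of this form (with feature_name as defined above, each label carries the prefix sepal_length_(cm) rather than the 0 shown here):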

    0      0_1
    1      0_0
    2      0_0
    [...]
    

    If you don't set the labels, the output is going to look like this:

    0       (5.02, 5.74]
    1      (4.296, 5.02]
    2      (4.296, 5.02]
    [...]
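
    Without pandas, a rough equal-width counterpart can be sketched in pure Python. This is not pd.cut's algorithm: pd.cut uses right-closed intervals and slightly widens the range, whereas this sketch uses left-closed bins and clamps the maximum into the last bin. The function name `equal_width_buckets` is an assumption for illustration.

```python
from collections import defaultdict

def equal_width_buckets(values, n_bins):
    """Group values into n_bins equal-width bins, keyed by bin index."""
    values = list(values)
    lo, hi = min(values), max(values)
    # fall back to width 1 when all values are equal, to avoid dividing by zero
    width = (hi - lo) / n_bins or 1
    buckets = defaultdict(list)
    for v in values:
        # clamp so the maximum lands in the last bin instead of a new one
        idx = min(int((v - lo) / width), n_bins - 1)
        buckets[idx].append(v)
    return dict(buckets)

bins = equal_width_buckets(range(11), 5)
```

    With the values 0 through 10 and five bins, each bin spans a width of 2, and the maximum value 10 joins the last bin.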
    
  • 2020-12-29 09:40

    Here's a variant of partition() from above when the predicate is boolean, avoiding the cost of a dict/defaultdict:

    def boolpartition(seq, pred):
        passing, failing = [], []
        for item in seq:
            (passing if pred(item) else failing).append(item)
        return passing, failing
    

    Example usage:

    >>> even, odd = boolpartition([1, 2, 3, 4, 5], lambda x: x % 2 == 0)
    >>> even
    [2, 4]
    >>> odd
    [1, 3, 5]
    