How do I use itertools.groupby()?

失恋的感觉 2020-11-22 02:14

I haven't been able to find an understandable explanation of how to actually use Python's itertools.groupby() function. What I'm trying to do is this:

13 Answers
  • 2020-11-22 02:35

    IMPORTANT NOTE: If you want exactly one group per distinct key, you have to sort your data by that key first.


    The part I didn't get is that in the example construction

    groups = []
    uniquekeys = []
    for k, g in groupby(data, keyfunc):
        groups.append(list(g))    # Store group iterator as a list
        uniquekeys.append(k)
    

    k is the current grouping key, and g is an iterator that you can use to iterate over the group defined by that grouping key. In other words, the groupby iterator itself returns iterators.

    Here's an example of that, using clearer variable names:

    from itertools import groupby
    
    things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
    
    for key, group in groupby(things, lambda x: x[0]):
        for thing in group:
            print("A %s is a %s." % (thing[1], key))
        print("")
        
    

    This will give you the output:

    A bear is a animal.
    A duck is a animal.

    A cactus is a plant.

    A speed boat is a vehicle.
    A school bus is a vehicle.

    In this example, things is a list of tuples where the first item in each tuple is the group the second item belongs to.

    The groupby() function takes two arguments: (1) the data to group and (2) the function to group it with.

    Here, lambda x: x[0] tells groupby() to use the first item in each tuple as the grouping key.

    In the above for statement, groupby returns three (key, group iterator) pairs, one for each unique key. You can use the returned iterator to iterate over each individual item in that group.
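
As a minimal sketch of the same idea, the (key, group iterator) pairs can be collected into a dictionary. Here operator.itemgetter(0) stands in for lambda x: x[0], and the name by_kind is only illustrative. This assumes the data is already ordered by key, as things is; with unsorted data a later run of the same key would overwrite the earlier one.

```python
from itertools import groupby
from operator import itemgetter

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]

# itemgetter(0) picks the first tuple item, just like lambda x: x[0].
by_kind = {key: [name for _, name in group]
           for key, group in groupby(things, key=itemgetter(0))}

print(by_kind)
# {'animal': ['bear', 'duck'], 'plant': ['cactus'], 'vehicle': ['speed boat', 'school bus']}
```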

    Here's a slightly different example with the same data, using a list comprehension:

    for key, group in groupby(things, lambda x: x[0]):
        listOfThings = " and ".join([thing[1] for thing in group])
        print(key + "s: " + listOfThings + ".")
    

    This will give you the output:

    animals: bear and duck.
    plants: cactus.
    vehicles: speed boat and school bus.
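
One consequence of the groups being iterators over the same underlying data: a group should be consumed (or copied with list()) before moving on to the next key. The sketch below, with illustrative names fresh and stale, shows what can happen otherwise. Only the first stale group is checked here, since the exact behaviour of the final group has differed between Python versions.

```python
from itertools import groupby

data = "AAABBC"

# Consume each group while it is the current one: every group has its items.
fresh = [(key, list(group)) for key, group in groupby(data)]
print(fresh)  # [('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('C', ['C'])]

# Advance the outer iterator to the end first, then read the groups afterwards.
pairs = list(groupby(data))
stale = [(key, list(group)) for key, group in pairs]
print(stale[0])  # ('A', []): the group was skipped past before it was read
```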

  • 2020-11-22 02:35

    You can write your own groupby function:

    def groupby(data):
        kv = {}
        for k, v in data:
            if k not in kv:
                kv[k] = [v]
            else:
                kv[k].append(v)
        return kv

    Run in IPython:

    In [10]: data = [('a', 1), ('b', 2), ('a', 2)]

    In [11]: groupby(data)
    Out[11]: {'a': [1, 2], 'b': [2]}
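
A hedged variant of this dict-based approach, using collections.defaultdict from the standard library (the name group_pairs is made up for this sketch). It also shows how the result differs from itertools.groupby on the same unsorted input: the dict collects every occurrence of a key, while itertools.groupby only groups adjacent items.

```python
from collections import defaultdict
from itertools import groupby

def group_pairs(pairs):
    """Collect the values of (key, value) pairs under each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

data = [('a', 1), ('b', 2), ('a', 2)]

print(group_pairs(data))  # {'a': [1, 2], 'b': [2]}

# itertools.groupby on the same unsorted data yields 'a' twice.
runs = [(key, [value for _, value in group])
        for key, group in groupby(data, key=lambda pair: pair[0])]
print(runs)  # [('a', [1]), ('b', [2]), ('a', [2])]
```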
    
  • 2020-11-22 02:37

    itertools.groupby is a tool for grouping items.

    From the docs, we glean further what it might do:

    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B

    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D

    groupby objects yield key-group pairs where the group is an iterator.

    Features

    • A. Group consecutive items together
    • B. Group all occurrences of an item, given a sorted iterable
    • C. Specify how to group items with a key function *

    Comparisons

    # Define a printer for comparing outputs
    >>> import itertools as it
    >>> def print_groupby(iterable, keyfunc=None):
    ...     for k, g in it.groupby(iterable, keyfunc):
    ...         print("key: '{}'--> group: {}".format(k, list(g)))
    
    # Feature A: group consecutive occurrences
    >>> print_groupby("BCAACACAADBBB")
    key: 'B'--> group: ['B']
    key: 'C'--> group: ['C']
    key: 'A'--> group: ['A', 'A']
    key: 'C'--> group: ['C']
    key: 'A'--> group: ['A']
    key: 'C'--> group: ['C']
    key: 'A'--> group: ['A', 'A']
    key: 'D'--> group: ['D']
    key: 'B'--> group: ['B', 'B', 'B']
    
    # Feature B: group all occurrences
    >>> print_groupby(sorted("BCAACACAADBBB"))
    key: 'A'--> group: ['A', 'A', 'A', 'A', 'A']
    key: 'B'--> group: ['B', 'B', 'B', 'B']
    key: 'C'--> group: ['C', 'C', 'C']
    key: 'D'--> group: ['D']
    
    # Feature C: group by a key function
    >>> # islower = lambda s: s.islower()                      # equivalent
    >>> def islower(s):
    ...     """Return True if a string is lowercase, else False."""   
    ...     return s.islower()
    >>> print_groupby(sorted("bCAaCacAADBbB"), keyfunc=islower)
    key: 'False'--> group: ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D']
    key: 'True'--> group: ['a', 'a', 'b', 'b', 'c']
    

    Uses

    • Anagrams (see notebook)
    • Binning
    • Group odd and even numbers
    • Group a list by values
    • Remove duplicate elements
    • Find indices of repeated elements in an array
    • Split an array into n-sized chunks
    • Find corresponding elements between two lists
    • Compression algorithm (see notebook)/Run Length Encoding
    • Grouping letters by length, key function (see notebook)
    • Consecutive values over a threshold (see notebook)
    • Find ranges of numbers in a list or continuous items (see docs)
    • Find all related longest sequences
    • Take consecutive sequences that meet a condition (see related post)

    Note: Several of the latter examples derive from Víctor Terrón's PyCon (talk) (Spanish), "Kung Fu at Dawn with Itertools". See also the groupby source code written in C.

    * A function where all items are passed through and compared, influencing the result. Other objects with key functions include sorted(), max() and min().
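
Two of the uses listed above can be sketched in a few lines. These are illustrative implementations, not the ones from the linked notebook or talk; the function names are invented here, and consecutive_ranges assumes its input is a list of distinct integers in ascending order.

```python
from itertools import groupby

def run_length_encode(text):
    """Return (character, run length) pairs for consecutive repeats."""
    return [(char, sum(1 for _ in run)) for char, run in groupby(text)]

def consecutive_ranges(numbers):
    """Return (first, last) for each run of consecutive integers."""
    ranges = []
    # Within a consecutive run, value - index stays constant.
    for _, run in groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0]):
        values = [value for _, value in run]
        ranges.append((values[0], values[-1]))
    return ranges

print(run_length_encode("AAAABBBCCD"))          # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(consecutive_ranges([1, 2, 3, 7, 8, 10]))  # [(1, 3), (7, 8), (10, 10)]
```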


    Response

    # OP: Yes, you can use `groupby`, e.g. 
    [do_something(list(g)) for _, g in groupby(lxml_elements, criteria_func)]
    
  • 2020-11-22 02:37

    How do I use Python's itertools.groupby()?

    You can use groupby to group things to iterate over. You give groupby an iterable and an optional key function (any callable), which is applied to each item as it comes out of the iterable. It returns an iterator that yields two-tuples: the result of the key callable, and another iterable over the actual items in that group. From the help:

    groupby(iterable[, keyfunc]) -> create an iterator which returns
    (key, sub-iterator) grouped by each value of key(value).
    

    Here's an example of groupby using a coroutine to group by a count. It uses a key callable (in this case, coroutine.send) to just spit out the count for however many iterations, together with a grouped sub-iterator of elements:

    import itertools
    
    
    def grouper(iterable, n):
        def coroutine(n):
            yield # queue up coroutine
            for i in itertools.count():
                for j in range(n):
                    yield i
        groups = coroutine(n)
        next(groups) # queue up coroutine
    
        for c, objs in itertools.groupby(iterable, groups.send):
            yield c, list(objs)
        # or instead of materializing a list of objs, just:
        # return itertools.groupby(iterable, groups.send)
    
    print(list(grouper(range(10), 3)))
    

    prints

    [(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9])]
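
For comparison, the same chunking can be sketched without a coroutine by pairing each item with its index and grouping on index // n. This is an alternative technique rather than the approach above, and the name chunked is hypothetical.

```python
from itertools import groupby

def chunked(iterable, n):
    """Yield (chunk number, list of items) for chunks of up to n items."""
    # Items whose index shares the same quotient index // n form one chunk.
    for index, group in groupby(enumerate(iterable), key=lambda pair: pair[0] // n):
        yield index, [item for _, item in group]

print(list(chunked(range(10), 3)))
# [(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9])]
```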
    
  • 2020-11-22 02:39

    Sorting and groupby

    from itertools import groupby
    
    val = [{'name': 'satyajit', 'address': 'btm', 'pin': 560076}, 
           {'name': 'Mukul', 'address': 'Silk board', 'pin': 560078},
           {'name': 'Preetam', 'address': 'btm', 'pin': 560076}]
    
    
    # Sort by the same key that groupby uses, so equal pins are adjacent.
    for pin, list_data in groupby(sorted(val, key=lambda k: k['pin']), lambda x: x['pin']):
        print(pin)
        for rec in list_data:
            print(rec)

    Output:

    560076
    {'name': 'satyajit', 'address': 'btm', 'pin': 560076}
    {'name': 'Preetam', 'address': 'btm', 'pin': 560076}
    560078
    {'name': 'Mukul', 'address': 'Silk board', 'pin': 560078}
    
  • 2020-11-22 02:40

    The example in the Python docs is quite straightforward:

    groups = []
    uniquekeys = []
    for k, g in groupby(data, keyfunc):
        groups.append(list(g))      # Store group iterator as a list
        uniquekeys.append(k)
    

    So in your case, data is a list of nodes, keyfunc is where the logic of your criteria function goes and then groupby() groups the data.

    You must sort the data by the same criteria before you call groupby, or equal keys that are not adjacent will end up in separate groups. groupby just iterates through the data and starts a new group whenever the key changes.
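
A small sketch of that behaviour, using made-up sample data and helper names (keys_seen, first_letter): the same key shows up more than once when the input is not sorted by it.

```python
from itertools import groupby

def first_letter(word):
    return word[0]

def keys_seen(data, keyfunc):
    """Return the sequence of keys that groupby produces for data."""
    return [key for key, _ in groupby(data, keyfunc)]

words = ["apple", "bob", "avocado", "banana", "cherry"]

# Unsorted: a new group starts every time the key changes.
print(keys_seen(words, first_letter))  # ['a', 'b', 'a', 'b', 'c']

# Sorted by the same key function: one group per distinct key.
print(keys_seen(sorted(words, key=first_letter), first_letter))  # ['a', 'b', 'c']
```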
