How do I use itertools.groupby()?

失恋的感觉 2020-11-22 02:14

I haven't been able to find an understandable explanation of how to actually use Python's itertools.groupby() function. What I'm trying to do is this:

<
13 Answers
  • 2020-11-22 02:19

    Here is another example showing that groupby does not give the expected result when the input is not sorted. It is adapted from James Sulak's example.

    from itertools import groupby
    
    things = [("vehicle", "bear"), ("animal", "duck"), ("animal", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
    
    for key, group in groupby(things, lambda x: x[0]):
        for thing in group:
            print("A %s is a %s." % (thing[1], key))
        print(" ")
    

    Output:

    A bear is a vehicle.
    
    A duck is a animal.
    A cactus is a animal.
    
    A speed boat is a vehicle.
    A school bus is a vehicle.
    

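    There are two groups with the key "vehicle", whereas one might expect only one. groupby starts a new group every time the key changes, so equal keys that are not adjacent end up in separate groups.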
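
If one group per key is what you want, a common approach is to sort by the same key function before grouping. A minimal sketch using the data above (the `by_kind` helper and the `grouped` dict are names chosen only for illustration):

```python
from itertools import groupby

things = [("vehicle", "bear"), ("animal", "duck"), ("animal", "cactus"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]

def by_kind(item):
    # Key function: the first element of each tuple.
    return item[0]

# sorted() is stable, so items keep their original order within each key.
grouped = {key: [name for _, name in group]
           for key, group in groupby(sorted(things, key=by_kind), key=by_kind)}

for key, names in grouped.items():
    print(key, names)
```

Using the same function for sorting and grouping keeps the two steps consistent.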

  • 2020-11-22 02:23

    @CaptSolo, I tried your example, but it didn't work.

    from itertools import groupby 
    [(c,len(list(cs))) for c,cs in groupby('Pedro Manoel')]
    

    Output:

    [('P', 1), ('e', 1), ('d', 1), ('r', 1), ('o', 1), (' ', 1), ('M', 1), ('a', 1), ('n', 1), ('o', 1), ('e', 1), ('l', 1)]
    

    As you can see, there are two o's and two e's, but they ended up in separate groups because they are not adjacent: groupby only groups consecutive equal items. To get one group per character, sort the input first:

    name = list('Pedro Manoel')
    name.sort()
    [(c,len(list(cs))) for c,cs in groupby(name)]
    

    Output:

    [(' ', 1), ('M', 1), ('P', 1), ('a', 1), ('d', 1), ('e', 2), ('l', 1), ('n', 1), ('o', 2), ('r', 1)]
    

    Remember: if the input is not sorted by the grouping key, groupby returns a separate group for each run of equal items rather than one group per key.
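
If the goal is only to count characters, `collections.Counter` (a different standard-library tool, not part of groupby) is an alternative that needs no sort. A small sketch comparing it with the sorted groupby approach:

```python
from collections import Counter
from itertools import groupby

name = "Pedro Manoel"

# Sort first so equal characters become adjacent, then count each run.
counts_groupby = [(c, len(list(cs))) for c, cs in groupby(sorted(name))]

# Counter counts occurrences directly, regardless of order.
counts_counter = Counter(name)

print(counts_groupby)
print(counts_counter["o"], counts_counter["e"])
```

Both give the same counts; groupby is the better fit when the runs themselves matter, not just the totals.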

  • 2020-11-22 02:25

    This basic implementation helped me understand this function. Hope it helps others as well:

    from itertools import groupby
    
    arr = [(1, "A"), (1, "B"), (1, "C"), (2, "D"), (2, "E"), (3, "F")]
    
    for k,g in groupby(arr, lambda x: x[0]):
        print("--", k, "--")
        for tup in g:
            print(tup[1])  # tup[0] == k
    
    Output:
    
    -- 1 --
    A
    B
    C
    -- 2 --
    D
    E
    -- 3 --
    F
    
  • 2020-11-22 02:26

    WARNING:

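    The syntax list(groupby(...)) won't work the way you intend. Each group is a sub-iterator that shares the underlying iterable with the groupby object, so advancing groupby to the next group leaves the previous group with nothing left to yield. As a result,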

    for x in list(groupby(range(10))):
        print(list(x[1]))
    

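    will produce the following (as observed on the Python version this was run with; on recent CPython releases the last line may be [] as well):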

    []
    []
    []
    []
    []
    []
    []
    []
    []
    [9]
    

    Instead of list(groupby(...)), try [(k, list(g)) for k, g in groupby(...)], or, if you use that syntax often,

    def groupbylist(*args, **kwargs):
        return [(k, list(g)) for k, g in groupby(*args, **kwargs)]
    

    and get access to the groupby functionality while avoiding those pesky (for small data) iterators altogether.
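
The difference between the two patterns can be seen side by side in a short sketch (the `stale` and `fresh` names are illustrative only):

```python
from itertools import groupby

data = [1, 1, 2, 2, 2, 3]

# Exhausting groupby first leaves the earlier group iterators with nothing to yield.
stale = [list(g) for _, g in list(groupby(data))]

# Materializing each group while it is still the current one keeps its items.
fresh = [(k, list(g)) for k, g in groupby(data)]

print(stale[:-1])
print(fresh)
```

Only the groups before the last one are printed for `stale`, since the state of the final group has differed between Python versions.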

  • 2020-11-22 02:29

    Another example:

    import itertools
    
    for key, igroup in itertools.groupby(range(12), lambda x: x // 5):
        print(key, list(igroup))
    

    results in

    0 [0, 1, 2, 3, 4]
    1 [5, 6, 7, 8, 9]
    2 [10, 11]
    

    Note that igroup is an iterator (a sub-iterator as the documentation calls it).

    This is useful for chunking a generator:

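    def chunker(items, chunk_size):
        '''Group items in chunks of chunk_size'''
        for _key, group in itertools.groupby(enumerate(items), lambda x: x[0] // chunk_size):
            # Each chunk is lazy: consume it before asking for the next one.
            yield (g[1] for g in group)
    
    with open('file.txt') as fobj:
        # The original call omitted chunk_size; 100 is an arbitrary example value.
        for chunk in chunker(fobj, 100):
            process(chunk)  # process() is a placeholder for your own function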
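
Because each chunk yielded above is a lazy generator tied to the groupby object, it has to be consumed before the next chunk is requested. A variant that yields lists instead avoids that constraint, at the cost of holding one chunk in memory. A sketch, not a drop-in replacement:

```python
import itertools

def chunker(items, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    indexed = enumerate(items)
    for _key, group in itertools.groupby(indexed, lambda pair: pair[0] // chunk_size):
        # Build the list right away, while this group is still the current one.
        yield [item for _index, item in group]

chunks = list(chunker(iter("abcdefg"), 3))
print(chunks)
```

With lists, calling `list(chunker(...))` is safe, which it would not be with the generator-yielding version.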
    

    Another example of groupby - when the keys are not sorted. In the following example, items in xx are grouped by values in yy. In this case, one set of zeros is output first, followed by a set of ones, followed again by a set of zeros.

    xx = range(10)
    yy = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
    for key, group in itertools.groupby(xx, lambda x: yy[x]):
        print(key, list(group))
    

    Produces:

    0 [0, 1, 2]
    1 [3, 4, 5]
    0 [6, 7, 8, 9]
    
  • 2020-11-22 02:34

    One useful example that I came across may be helpful:

    from itertools import groupby
    
    # User input, e.g. 14445221
    myinput = input()
    
    # List of (run length, digit) pairs
    myoutput = []
    
    for k, g in groupby(myinput):
        myoutput.append((len(list(g)), int(k)))
    
    print(*myoutput)
    

    Sample input: 14445221

    Sample output: (1, 1) (3, 4) (1, 5) (2, 2) (1, 1)
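
This is essentially run-length encoding, one case where grouping unsorted input is exactly what is wanted. A sketch that wraps the idea in a function and adds the reverse step (the names `rle_encode` and `rle_decode` are hypothetical, and the characters are kept as strings rather than converted with int()):

```python
from itertools import groupby

def rle_encode(text):
    # One (run length, character) pair per run of identical characters.
    return [(len(list(group)), char) for char, group in groupby(text)]

def rle_decode(pairs):
    # Repeat each character by its run length and join the pieces back together.
    return "".join(char * count for count, char in pairs)

encoded = rle_encode("14445221")
print(encoded)
print(rle_decode(encoded))
```

Decoding the encoded pairs gives back the original string, which is a quick way to check the grouping.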
