I haven\'t been able to find an understandable explanation of how to actually use Python\'s itertools.groupby()
function. What I\'m trying to do is this:
I would like to give another example where groupby without sort is not working. Adapted from example by James Sulak
from itertools import groupby
things = [("vehicle", "bear"), ("animal", "duck"), ("animal", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
for key, group in groupby(things, lambda x: x[0]):
for thing in group:
print "A %s is a %s." % (thing[1], key)
print " "
output is
A bear is a vehicle.
A duck is a animal.
A cactus is a animal.
A speed boat is a vehicle.
A school bus is a vehicle.
there are two groups with vehicule, whereas one could expect only one group
@CaptSolo, I tried your example, but it didn't work.
from itertools import groupby
[(c,len(list(cs))) for c,cs in groupby('Pedro Manoel')]
Output:
[('P', 1), ('e', 1), ('d', 1), ('r', 1), ('o', 1), (' ', 1), ('M', 1), ('a', 1), ('n', 1), ('o', 1), ('e', 1), ('l', 1)]
As you can see, there are two o's and two e's, but they got into separate groups. That's when I realized you need to sort the list passed to the groupby function. So, the correct usage would be:
name = list('Pedro Manoel')
name.sort()
[(c,len(list(cs))) for c,cs in groupby(name)]
Output:
[(' ', 1), ('M', 1), ('P', 1), ('a', 1), ('d', 1), ('e', 2), ('l', 1), ('n', 1), ('o', 2), ('r', 1)]
Just remembering, if the list is not sorted, the groupby function will not work!
This basic implementation helped me understand this function. Hope it helps others as well:
arr = [(1, "A"), (1, "B"), (1, "C"), (2, "D"), (2, "E"), (3, "F")]
for k,g in groupby(arr, lambda x: x[0]):
print("--", k, "--")
for tup in g:
print(tup[1]) # tup[0] == k
-- 1 --
A
B
C
-- 2 --
D
E
-- 3 --
F
WARNING:
The syntax list(groupby(...)) won't work the way that you intend. It seems to destroy the internal iterator objects, so using
for x in list(groupby(range(10))):
print(list(x[1]))
will produce:
[]
[]
[]
[]
[]
[]
[]
[]
[]
[9]
Instead, of list(groupby(...)), try [(k, list(g)) for k,g in groupby(...)], or if you use that syntax often,
def groupbylist(*args, **kwargs):
return [(k, list(g)) for k, g in groupby(*args, **kwargs)]
and get access to the groupby functionality while avoiding those pesky (for small data) iterators all together.
Another example:
for key, igroup in itertools.groupby(xrange(12), lambda x: x // 5):
print key, list(igroup)
results in
0 [0, 1, 2, 3, 4]
1 [5, 6, 7, 8, 9]
2 [10, 11]
Note that igroup is an iterator (a sub-iterator as the documentation calls it).
This is useful for chunking a generator:
def chunker(items, chunk_size):
'''Group items in chunks of chunk_size'''
for _key, group in itertools.groupby(enumerate(items), lambda x: x[0] // chunk_size):
yield (g[1] for g in group)
with open('file.txt') as fobj:
for chunk in chunker(fobj):
process(chunk)
Another example of groupby - when the keys are not sorted. In the following example, items in xx are grouped by values in yy. In this case, one set of zeros is output first, followed by a set of ones, followed again by a set of zeros.
xx = range(10)
yy = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
for group in itertools.groupby(iter(xx), lambda x: yy[x]):
print group[0], list(group[1])
Produces:
0 [0, 1, 2]
1 [3, 4, 5]
0 [6, 7, 8, 9]
One useful example that I came across may be helpful:
from itertools import groupby
#user input
myinput = input()
#creating empty list to store output
myoutput = []
for k,g in groupby(myinput):
myoutput.append((len(list(g)),int(k)))
print(*myoutput)
Sample input: 14445221
Sample output: (1,1) (3,4) (1,5) (2,2) (1,1)