I am using itertools.groupby to parse a short tab-delimited text file. The text file has several columns and all I want to do is group all the entries that have a pa
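I don't know what your data looks like, but my guess is that it's not sorted. groupby only groups consecutive items that share the same key, so in practice it needs data that is already sorted by that key.
You'll want to change your code so the data is sorted into key order before grouping: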
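import csv, itertools, operator
data = csv.DictReader(open(f, newline=""), delimiter="\t", fieldnames=fieldnames)
sorted_data = sorted(data, key=operator.itemgetter(col_name))
# Iterate over the sorted rows, not the original (unsorted) reader
for name, entries in itertools.groupby(sorted_data, key=operator.itemgetter(col_name)):
    pass  # whatever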
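Here is a minimal, self-contained sketch of the sort-then-groupby approach that you can run as-is. It reads from an in-memory string instead of a file. The sample data, the column names `name` and `score`, and the helper `group_rows` are all hypothetical stand-ins for your real file and columns.

```python
import csv
import io
import itertools
import operator

# Hypothetical sample: tab-delimited rows with no header line,
# so column names are supplied via fieldnames.
SAMPLE = "alice\t3\nbob\t5\nalice\t7\ncarol\t1\nbob\t2\n"
FIELDNAMES = ["name", "score"]


def group_rows(text, col_name, fieldnames):
    """Sort rows by col_name, then group them with itertools.groupby."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", fieldnames=fieldnames)
    key = operator.itemgetter(col_name)
    sorted_rows = sorted(reader, key=key)  # groupby needs key-ordered input
    # Materialize each group right away: the group iterators share the
    # underlying iterator and are invalidated once groupby advances.
    return {name: list(entries) for name, entries in itertools.groupby(sorted_rows, key=key)}


groups = group_rows(SAMPLE, "name", FIELDNAMES)
for name, rows in groups.items():
    print(name, [row["score"] for row in rows])
```

Because sorted() is stable, rows that share a key keep their original file order within each group (alice gets scores '3' then '7').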
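The main use case for groupby, though, is large datasets that are already in key order, because then it can group them in a single streaming pass without holding everything in memory. When you have to sort anyway, a defaultdict is more efficient: it groups in one O(n) pass instead of paying O(n log n) for the sort.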
from collections import defaultdict
name_entries = defaultdict(list)
for row in data:
    name_entries[row[col_name]].append(row)  # a new key starts as an empty list
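A runnable sketch of the defaultdict approach follows, again reading from an in-memory string. The deliberately unsorted sample data, the `name`/`score` column names, and the helper `group_rows_unsorted` are hypothetical. It shows that no sorting step is needed.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample; note it is NOT sorted by name.
SAMPLE = "alice\t3\nbob\t5\nalice\t7\ncarol\t1\nbob\t2\n"
FIELDNAMES = ["name", "score"]


def group_rows_unsorted(text, col_name, fieldnames):
    """Group rows by col_name in a single pass, without sorting."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", fieldnames=fieldnames)
    name_entries = defaultdict(list)
    for row in reader:
        name_entries[row[col_name]].append(row)  # missing keys start as []
    return name_entries


groups = group_rows_unsorted(SAMPLE, "name", FIELDNAMES)
print({name: [row["score"] for row in rows] for name, rows in groups.items()})
```

Keys appear in the order they are first seen in the input rather than in sorted order. If you need sorted output, sort just the keys afterwards, which is cheaper than sorting every row.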
According to the documentation, groupby() groups only consecutive occurrences of the same key.
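To see what "consecutive" means in practice, here is a small sketch on a hypothetical list of keys. On unsorted input the same key can show up as several separate groups. Sorting first merges them.

```python
import itertools

# Hypothetical unsorted keys: "a" appears in two separate runs.
keys = ["a", "a", "b", "a", "c", "c"]

# Count the members of each group; len(list(g)) consumes g before groupby advances.
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(keys)]
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(keys))]

print(unsorted_groups)  # [('a', 2), ('b', 1), ('a', 1), ('c', 2)]
print(sorted_groups)    # [('a', 3), ('b', 1), ('c', 2)]
```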