Disturbing odd behavior/bug in Python itertools groupby?

感情败类 2021-01-23 07:38

I am using itertools.groupby to parse a short tab-delimited text file. The file has several columns, and all I want to do is group all the entries that have a pa

3 answers
  • 2021-01-23 08:17

    I don't know what your data looks like, but my guess is that it's not sorted. groupby only groups consecutive items that share a key, so it expects the data to be sorted by that key.

  • 2021-01-23 08:33

    You're going to want to change your code to force the data to be in key order...

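    import csv
    import itertools
    import operator
    data = csv.DictReader(open(f), delimiter="\t", fieldnames=fieldnames)
    sorted_data = sorted(data, key=operator.itemgetter(col_name))
    # Group the sorted rows; the reader itself is already exhausted by sorted().
    for name, entries in itertools.groupby(sorted_data, key=operator.itemgetter(col_name)):
        pass # whatever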
    

    The main use of groupby, though, is when the datasets are large and the data is already in key order. When you would have to sort anyway, a defaultdict is more efficient, since it groups the rows in a single pass with no sort:

    from collections import defaultdict
    name_entries = defaultdict(list)
    for row in data:
        name_entries[row[col_name]].append(row)
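
For something that runs on its own, here is a minimal sketch of the same defaultdict approach. The sample text, the column names in `fieldnames`, and the choice of `col_name` are all made up for illustration and stand in for the asker's file, which isn't shown.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited input; "alice" appears in two non-adjacent rows.
text = "alice\t1\nbob\t2\nalice\t3\n"
fieldnames = ["name", "value"]  # assumed column names
col_name = "name"               # assumed grouping column

data = csv.DictReader(io.StringIO(text), delimiter="\t", fieldnames=fieldnames)

# One pass over the rows; row order does not matter.
name_entries = defaultdict(list)
for row in data:
    name_entries[row[col_name]].append(row["value"])

print(dict(name_entries))  # {'alice': ['1', '3'], 'bob': ['2']}
```

Both "alice" rows end up in one list even though they are not adjacent in the input.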
    
  • 2021-01-23 08:34

    According to the documentation, groupby() groups only consecutive occurrences of the same key.
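
A small sketch of what that means in practice. The `rows` list is invented sample data, not the asker's file:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (name, value). "a" and "b" each occur in two separate runs.
rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]
by_name = itemgetter(0)

# Unsorted input: every change of key starts a new group.
unsorted_groups = [(key, [v for _, v in grp]) for key, grp in groupby(rows, key=by_name)]

# Sorted input: equal keys are adjacent, so each key yields exactly one group.
sorted_groups = [
    (key, [v for _, v in grp])
    for key, grp in groupby(sorted(rows, key=by_name), key=by_name)
]

print(unsorted_groups)  # [('a', [1]), ('b', [2]), ('a', [3]), ('b', [4])]
print(sorted_groups)    # [('a', [1, 3]), ('b', [2, 4])]
```

With unsorted input the same key shows up in several groups, which looks like a bug but is the documented behavior.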
