Aggregate sets according to keys with defaultdict python

Submitted by ε祈祈猫儿з on 2019-12-02 20:31:39

Question


I have a text file in which each line holds a team and a player name in this format:

Team (year)|Surname1, Name1

e.g.

Yankees (1993)|Abbot, Jim
Yankees (1994)|Abbot, Jim
Yankees (1993)|Assenmacher, Paul
Yankees (2000)|Buddies, Mike
Yankees (2000)|Canseco, Jose

and so on, for several years and several teams. I would like to aggregate the player names for each team (year) combination, removing any duplicate names (the original database may contain some redundant entries). For the example above, the output should be:

Yankees (1993)|Abbot, Jim|Assenmacher, Paul
Yankees (1994)|Abbot, Jim
Yankees (2000)|Buddies, Mike|Canseco, Jose
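As an illustration of the line format above, a single line can be split into team, year and player with a regular expression instead of plain splitting. This is a minimal sketch, not part of the original question: the pattern `LINE_RE`, the helper `parse_line` and the "Red Sox (2004)|Doe, John" line are hypothetical, and the pattern assumes the year is always four digits in parentheses.

```python
import re

# Hypothetical pattern for lines like "Yankees (1993)|Abbot, Jim":
# group 1 = team (may contain spaces), group 2 = 4-digit year, group 3 = player
LINE_RE = re.compile(r"^\s*(.+?)\s*\((\d{4})\)\s*\|\s*(.+?)\s*$")


def parse_line(line):
    """Return (team, year, player), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    team, year, player = match.groups()
    return team, year, player


print(parse_line("Yankees (1993)|Abbot, Jim\n"))  # -> ('Yankees', '1993', 'Abbot, Jim')
print(parse_line("Red Sox (2004)|Doe, John"))     # -> ('Red Sox', '2004', 'Doe, John')
print(parse_line(""))                             # -> None
```

Keeping the year as a separate group makes it easy to build tuple keys later, and a team name with spaces stays in one piece.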

I've written this code so far:

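from collections import defaultdict

# Map each "Team (year)" string to a set of player names;
# a set silently drops duplicate names.
teams = defaultdict(set)

with open('filein.txt') as file_in:
    for line in file_in:
        items = [entry.strip() for entry in line.split('|') if entry.strip()]
        if len(items) < 2:
            continue  # skip blank or malformed lines
        team, name = items[0], items[1]
        teams[team].add(name)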

This leaves me with a large dictionary whose keys are the team-and-year strings and whose values are sets of player names. But I don't know how to go on from there and write out the aggregated lines.
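One possible next step from the `teams` dictionary is to sort each set and join it with `|`. This is a minimal sketch, not from the original thread: it keeps the "Team (year)" strings as keys, rebuilds the sample data inline (with one extra duplicate line added to show deduplication) instead of reading `filein.txt`, and the helper names `build_teams` and `format_teams` are hypothetical.

```python
from collections import defaultdict

# Sample input; the last line is an added duplicate to show that sets drop it
SAMPLE = """Yankees (1993)|Abbot, Jim
Yankees (1994)|Abbot, Jim
Yankees (1993)|Assenmacher, Paul
Yankees (2000)|Buddies, Mike
Yankees (2000)|Canseco, Jose
Yankees (1993)|Abbot, Jim
"""


def build_teams(lines):
    """Group player names into a set per 'Team (year)' key."""
    teams = defaultdict(set)
    for line in lines:
        items = [entry.strip() for entry in line.split('|') if entry.strip()]
        if len(items) < 2:
            continue  # skip blank or malformed lines
        teams[items[0]].add(items[1])
    return teams


def format_teams(teams):
    """Return one 'Team (year)|Name1|Name2' line per key.

    Keys and names are sorted so the output is stable; string sorting
    orders by team, then year, as long as years have four digits.
    """
    return ["|".join([team] + sorted(names)) for team, names in sorted(teams.items())]


teams = build_teams(SAMPLE.splitlines())
output_lines = format_teams(teams)
print("\n".join(output_lines))

# To write the result to a file instead, e.g. 'fileout.txt':
# with open('fileout.txt', 'w') as file_out:
#     file_out.write("\n".join(output_lines) + "\n")
```

With the sample data this prints the three lines from the expected output, and the duplicated "Abbot, Jim" appears only once under 1993.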

I would also like to compare the resulting sets (e.g. how many players do the 1993 and 1994 Yankees have in common?). How can I do this?

Any help is appreciated.
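Because each value is a Python `set`, the comparison asked about here maps onto the built-in set operators. A minimal sketch, assuming the aggregated data has the same shape as the `teams` defaultdict(set) in the question; the dictionary is filled in by hand here for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Aggregated data shaped like the `teams` defaultdict(set) in the question
teams = defaultdict(set, {
    "Yankees (1993)": {"Abbot, Jim", "Assenmacher, Paul"},
    "Yankees (1994)": {"Abbot, Jim"},
    "Yankees (2000)": {"Buddies, Mike", "Canseco, Jose"},
})

a, b = teams["Yankees (1993)"], teams["Yankees (1994)"]

common = a & b      # on both rosters (same as a.intersection(b))
only_1993 = a - b   # only on the 1993 roster
either = a | b      # on at least one of the two rosters

print(len(common), sorted(common))  # -> 1 ['Abbot, Jim']

# Overlap size for every pair of team/year keys
for (k1, s1), (k2, s2) in combinations(sorted(teams.items()), 2):
    print(k1, "&", k2, "->", len(s1 & s2))
```

One caveat with `defaultdict`: looking up a misspelled key such as `teams["Yankees (1933)"]` silently inserts an empty set, so `teams.get(key, set())` is safer for read-only comparisons.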


Answer 1:


You can use a tuple as the key here, e.g. ('Yankees', '1994'):

from collections import defaultdict

dic = defaultdict(list)
with open('abc') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        key, val = line.split('|')
        # 'Yankees (1994)' -> ('Yankees', '1994')
        keys = tuple(x.strip('()') for x in key.split())
        # 'Abbot, Jim\n' -> ['Abbot', 'Jim']
        vals = [x.strip() for x in val.split(', ')]
        dic[keys].append(vals)
print(dic)
for k, v in dic.items():
    print("{} ({})|{}".format(k[0], k[1], "|".join(", ".join(x) for x in v)))

Output:

defaultdict(<class 'list'>,
{('Yankees', '1993'): [['Abbot', 'Jim'], ['Assenmacher', 'Paul']],
 ('Yankees', '1994'): [['Abbot', 'Jim']],
 ('Yankees', '2000'): [['Buddies', 'Mike'], ['Canseco', 'Jose']]})

Yankees (1993)|Abbot, Jim|Assenmacher, Paul
Yankees (1994)|Abbot, Jim
Yankees (2000)|Buddies, Mike|Canseco, Jose
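The answer above keeps a list per key, so an exact duplicate line would be repeated in the output, which the question wanted to avoid. Also, `key.split()` would break a multi-word team name such as a hypothetical "Red Sox (2004)" into three parts, so `k[0]` and `k[1]` would no longer be team and year. The following is a minimal set-based variant of the same tuple-key idea, not the answerer's code; the helper name `aggregate` and the sample lines are hypothetical, and it assumes the year is the text inside the last pair of parentheses.

```python
from collections import defaultdict


def aggregate(lines):
    """Group unique player names under a (team, year) tuple key."""
    dic = defaultdict(set)
    for line in lines:
        if '|' not in line:
            continue  # skip blank or malformed lines
        key, val = line.split('|', 1)
        # rpartition splits on the LAST '(' so 'Red Sox (2004)' -> 'Red Sox ', '(', '2004)'
        team, _, year = key.rpartition('(')
        dic[(team.strip(), year.strip().rstrip(')'))].add(val.strip())
    return dic


lines = [
    "Yankees (1993)|Abbot, Jim",
    "Yankees (1993)|Abbot, Jim",   # exact duplicate, dropped by the set
    "Yankees (1994)|Abbot, Jim",
    "Red Sox (2004)|Doe, John",    # hypothetical multi-word team name
]
dic = aggregate(lines)
for (team, year), names in sorted(dic.items()):
    print("{} ({})|{}".format(team, year, "|".join(sorted(names))))
```

Sorting the names inside each set keeps the output deterministic, since sets have no stable iteration order.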


Source: https://stackoverflow.com/questions/17405541/aggregate-sets-according-to-keys-with-defaultdict-python
