I have a bunch of lines in text with names and teams in this format:
Team (year)|Surname1, Name1
e.g.
Yankees (1993)|Abbot, Jim
Yankees (1994)|Abbot, Jim
Yankees (1993)|Assenmacher, Paul
Yankees (2000)|Buddies, Mike
Yankees (2000)|Canseco, Jose
and so on for several years and several teams. I would like to aggregate names of players according to team (year) combination deleting any duplicated names (it may happen that in the original database there is some redundant information). In the example, my output should be:
Yankees (1993)|Abbot, Jim|Assenmacher, Paul
Yankees (1994)|Abbot, Jim
Yankees (2000)|Buddies, Mike|Canseco, Jose
I've written this code so far:
file_in = open('filein.txt')
file_out = open('fileout.txt', 'w+')
from collections import defaultdict
teams = defaultdict(set)
for line in file_in:
items = [entry.strip() for entry in line.split('|') if entry]
team = items[0]
name = items[1]
teams[team].add(name)
I end up with a big dictionary made up by keys (the name of the team and the year) and sets of values. But I don't know exactly how to go on to aggregate things up.
I would also be able to compare my final sets of values (e.g. how many players have Yankee's team of 1993 and 1994 in common?). How can I do this?
Any help is appreciated
You can use a tuple as a key here, for eg. ('Yankees', '1994')
:
from collections import defaultdict
dic = defaultdict(list)
with open('abc') as f:
for line in f:
key,val = line.split('|')
keys = tuple(x.strip('()') for x in key.split())
vals = [x.strip() for x in val.split(', ')]
dic[keys].append(vals)
print dic
for k,v in dic.iteritems():
print "{}({})|{}".format(k[0],k[1],"|".join([", ".join(x) for x in v]))
Output:
defaultdict(<type 'list'>,
{('Yankees', '1994'): [['Abbot', 'Jim']],
('Yankees', '2000'): [['Buddies', 'Mike'], ['Canseco', 'Jose']],
('Yankees', '1993'): [['Abbot', 'Jim'], ['Assenmacher', 'Paul']]})
Yankees(1994)|Abbot, Jim
Yankees(2000)|Buddies, Mike|Canseco, Jose
Yankees(1993)|Abbot, Jim|Assenmacher, Paul
来源:https://stackoverflow.com/questions/17405541/aggregate-sets-according-to-keys-with-defaultdict-python