问题
This question is very similar to this one Group Python list of lists into groups based on overlapping items, in fact it could be called a duplicate.
Basically, I have a list of sub-lists where each sub-list contains some number of integers (this number is not the same among sub-lists). I need to group all sub-lists that share one integer or more.
The reason I'm asking a new separate question is because I'm attempting to adapt Martijn Pieters' great answer with no luck.
Here's the MWE:
def grouper(sequence):
result = [] # will hold (members, group) tuples
for item in sequence:
for members, group in result:
if members.intersection(item): # overlap
members.update(item)
group.append(item)
break
else: # no group found, add new
result.append((set(item), [item]))
return [group for members, group in result]
gr = [[29, 27, 26, 28], [31, 11, 10, 3, 30], [71, 51, 52, 69],
[78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81],
[86, 68, 67, 84]]
for i, group in enumerate(grouper(gr)):
print 'g{}:'.format(i), group
and the output I get is:
g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [84, 67, 78, 77, 81], [86, 68, 67, 84]]
g4: [[86, 84, 81, 82, 83, 85]]
The last group g4
should have been merged with g3
, since the lists inside them share the items 81
, 83
and 84
, and even a single repeated element should be enough for them to be merged.
I'm not sure if I'm applying the code wrong, or if there's something wrong with the code.
回答1:
You can describe the merge you want to do as a set consolidation or as a connected-components problem. I tend to use an off-the-shelf set consolidation algorithm and then adapt it to the particular situation. For example, IIUC, you could use something like
def consolidate(sets):
# http://rosettacode.org/wiki/Set_consolidation#Python:_Iterative
setlist = [s for s in sets if s]
for i, s1 in enumerate(setlist):
if s1:
for s2 in setlist[i+1:]:
intersection = s1.intersection(s2)
if intersection:
s2.update(s1)
s1.clear()
s1 = s2
return [s for s in setlist if s]
def wrapper(seqs):
consolidated = consolidate(map(set, seqs))
groupmap = {x: i for i,seq in enumerate(consolidated) for x in seq}
output = {}
for seq in seqs:
target = output.setdefault(groupmap[seq[0]], [])
target.append(seq)
return list(output.values())
which gives
>>> for i, group in enumerate(wrapper(gr)):
... print('g{}:'.format(i), group)
...
g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81], [86, 68, 67, 84]]
(Order not guaranteed because of the use of the dictionaries.)
回答2:
Sounds like set consolidation if you turn each sub list into a set instead as you are interested in the contents not the order so sets are the best data-structure choice. See this: http://rosettacode.org/wiki/Set_consolidation
来源:https://stackoverflow.com/questions/32057777/group-python-lists-based-on-repeated-items