Question
(In relation to this question I posed a few days ago)
I have a dictionary whose keys are strings, and whose values are sets of integers, for example:
db = {"a":{1,2,3}, "b":{5,6,7}, "c":{2,5,4}, "d":{8,11,10,18}, "e":{0,3,2}}
I would like a procedure that joins the keys whose values satisfy a generic condition supplied by an external function. The new item's key will therefore be the union of both keys (the order is not important), and its value will be determined by the condition itself.
For example: given this condition function:
def condition(kv1: tuple, kv2: tuple):
    key1, val1 = kv1
    key2, val2 = kv2
    union = val1 | val2  # just needed for the following line
    maxDif = max(union) - min(union)
    newVal = set()
    for i in range(maxDif):
        auxVal1 = {pos - i for pos in val2}
        auxVal2 = {pos + i for pos in val2}
        intersection1 = val1.intersection(auxVal1)
        intersection2 = val1.intersection(auxVal2)
        print(intersection1, intersection2)
        if (len(intersection1) >= 3):
            newVal.update(intersection1)
        if (len(intersection2) >= 3):
            newVal.update({pos - i for pos in intersection2})
    if len(newVal)==0:
        return False
    else:
        newKey = "".join(sorted(key1+key2))
        return newKey, newVal
That is, a pair of items satisfies the condition when at least 3 numbers in one value set lie at the same distance (difference) from numbers in the other. As said, if the condition is satisfied, the resulting key is the union of the two keys. For this particular example, the resulting value is the set of matching numbers, taking the smaller number of each matched pair from the original value sets.
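To make the "same distance" idea concrete, here is a minimal sketch for a single pair of value sets. The helper name `aligned_at_shift` is made up for this illustration and is not part of the code above; it only mirrors the `auxVal1` step of `condition`.

```python
def aligned_at_shift(val1, val2, shift):
    """Numbers in val1 that equal some number of val2 moved down by `shift`."""
    return val1 & {pos - shift for pos in val2}

a = {1, 2, 3}   # db["a"]
b = {5, 6, 7}   # db["b"]

# Try every shift up to the full spread of both sets (here 7 - 1 = 6).
matches = {shift: aligned_at_shift(a, b, shift) for shift in range(7)}

# Keep only the shifts at which at least 3 numbers line up.
qualifying = {shift: hits for shift, hits in matches.items() if len(hits) >= 3}
print(qualifying)  # {4: {1, 2, 3}}
```

Only a shift of 4 aligns three numbers, which is why "a" and "b" are expected to merge into "ab" with the value {1, 2, 3}.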
How can I smartly apply a function like this to a dictionary like db? Given the aforementioned dictionary, the expected result would be:
result = {"ab":{1,2,3}, "cde":{0,3,2}, "d":{18}}
Answer 1:
Your "condition" in this case is more than a mere condition. It is actually a merging rule that identifies which values to keep and which to drop. This may or may not allow a generalized approach, depending on how the patterns and merge rules vary.
Given this, each merge operation could leave values in the original keys that may be merged with some of the remaining keys. Multiple merges can also occur (e.g. key "cde"). In theory the merging process would need to cover a power set of all keys which may be impractical. Alternatively, this can be performed by successive refinements using pairings of (original and/or merged) keys.
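As a rough illustration of why the power-set route becomes impractical, the sketch below compares the number of key subsets with the number of ordered key pairs examined in one pass of a pairwise approach. The function names are hypothetical, and a pairwise approach may need several passes, so the pair count is a per-pass figure.

```python
def subset_count(n):
    # Subsets with at least two keys: all 2**n subsets
    # minus the empty set and the n single-key subsets.
    return 2**n - n - 1

def ordered_pair_count(n):
    # Ordered pairs (a, b) with a != b.
    return n * (n - 1)

for n in (5, 10, 20):
    print(n, subset_count(n), ordered_pair_count(n))
# 5 26 20
# 10 1013 90
# 20 1048555 380
```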
The merge condition/function:
db = {"a":{1,2,3}, "b":{5,6,7}, "c":{2,5,4}, "d":{8,11,10,18}, "e":{0,3,2}}

from itertools import product
from collections import Counter

# Apply condition and return a keep-set and a remove-set
# both are None if the matching condition is not met
def merge(A,B,inverted=False):
    minMatch  = 3
    # count how often each non-negative difference b-a occurs
    distances = Counter(b-a for a,b in product(A,B) if b>=a)
    # differences shared by at least minMatch pairs
    delta     = [d for d,count in distances.items() if count>=minMatch]
    keep      = {a for a in A if any(a+d in B for d in delta)}
    remove    = {b for b in B if any(b-d in A for d in delta)}
    if len(keep)>=minMatch: return keep,remove
    return None,None

print( merge(db["a"],db["b"]) )  # ({1, 2, 3}, {5, 6, 7})
print( merge(db["e"],db["d"]) )  # ({0, 2, 3}, {8, 10, 11})
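The intermediate values inside merge can be inspected for one pair. The sketch below repeats the same steps inline for db["e"] and db["d"]; the names `min_match` and `leftover` are introduced only for this illustration.

```python
from collections import Counter
from itertools import product

A = {0, 2, 3}          # db["e"]
B = {8, 10, 11, 18}    # db["d"]
min_match = 3

# How often does each non-negative difference b - a occur?
distances = Counter(b - a for a, b in product(A, B) if b >= a)

# Differences shared by at least min_match pairs.
delta = [d for d, count in distances.items() if count >= min_match]

keep = {a for a in A if any(a + d in B for d in delta)}    # kept from A
remove = {b for b in B if any(b - d in A for d in delta)}  # consumed from B
leftover = B - remove                                      # stays under the old key

print(delta)     # [8]
print(keep)      # {0, 2, 3}
print(remove)    # {8, 10, 11}
print(leftover)  # {18}
```

The difference 8 occurs three times (0→8, 2→10, 3→11), so the three matching numbers are kept from the first set, their partners are removed from the second, and 18 is left behind. That leftover is what ends up as "d": {18} in the expected result.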
Merge Process:
# combine dictionary keys using a merging function/condition
def combine(D,mergeFunction):
    result  = { k:set(v) for k,v in D.items() }  # start with copy of input
    merging = True
    while merging:    # keep merging until no more merges are performed
        merging = False
        for a,b in product(*2*[list(result.keys())]): # all key pairs
            if a==b: continue
            if a not in result or b not in result: continue # keys still there?
            m,n = mergeFunction(result[a],result[b])        # call merge function
            if not m : continue                             # if merged ...
            mergedKey = "".join(sorted(set(a+b)))             # combine keys
            result[mergedKey] = m                             # add merged set
            if mergedKey != a: result[a] -= m; merging = True # clean/clear
            if not result[a]: del result[a]                   # original sets,
            if mergedKey != b: result[b] -= n; merging = True # do more merges
            if not result[b]: del result[b]
    return result
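To check the whole process against the expected result from the question, here is a standalone sketch. It restates merge and combine with the same logic so the snippet runs on its own; the snake_case names and the `min_match` parameter are choices made for this illustration, not part of the original answer.

```python
from collections import Counter
from itertools import product

def merge(A, B, min_match=3):
    # Same logic as the merge function above: find differences shared by
    # at least min_match pairs, then split into keep-set and remove-set.
    distances = Counter(b - a for a, b in product(A, B) if b >= a)
    delta = [d for d, count in distances.items() if count >= min_match]
    keep = {a for a in A if any(a + d in B for d in delta)}
    remove = {b for b in B if any(b - d in A for d in delta)}
    if len(keep) >= min_match:
        return keep, remove
    return None, None

def combine(D, merge_function):
    result = {k: set(v) for k, v in D.items()}   # work on a copy
    merging = True
    while merging:                                # repeat until a pass merges nothing
        merging = False
        for a, b in product(list(result), repeat=2):
            if a == b or a not in result or b not in result:
                continue
            m, n = merge_function(result[a], result[b])
            if not m:
                continue
            merged_key = "".join(sorted(set(a + b)))
            result[merged_key] = m
            if merged_key != a:
                result[a] -= m                    # drop merged values from a
                merging = True
            if not result[a]:
                del result[a]
            if merged_key != b:
                result[b] -= n                    # drop merged values from b
                merging = True
            if not result[b]:
                del result[b]
    return result

db = {"a": {1, 2, 3}, "b": {5, 6, 7}, "c": {2, 5, 4},
      "d": {8, 11, 10, 18}, "e": {0, 3, 2}}

combined = combine(db, merge)
print(combined == {"ab": {1, 2, 3}, "cde": {0, 3, 2}, "d": {18}})  # True
```

The first pass merges "a" with "b" and "c" with "d" (leaving 18 under "d"); the second pass merges "e" with "cd" into "cde"; a third pass finds nothing left to merge and the loop stops.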
Source: https://stackoverflow.com/questions/65716614/rearranging-a-dictionary-based-on-a-function-condition-over-its-items