Rearranging a dictionary based on a function-condition over its items

Question

(In relation to this question I posed a few days ago)

I have a dictionary whose keys are strings, and whose values are sets of integers, for example:

db = {"a":{1,2,3}, "b":{5,6,7}, "c":{2,5,4}, "d":{8,11,10,18}, "e":{0,3,2}}

I would like a procedure that joins the items whose values satisfy a generic condition supplied by an external function. The new item's key is the union of both keys (the order is not important), and its value is determined by the condition itself.

For example: given this condition function:

def condition(kv1: tuple, kv2: tuple):
  key1, val1 = kv1
  key2, val2 = kv2

  union = val1 | val2 #just needed for the following line
  maxDif = max(union) - min(union)

  newVal = set()
  for i in range(maxDif):
    auxVal1 = {pos - i for pos in val2}
    auxVal2 = {pos + i for pos in val2}
    intersection1 = val1.intersection(auxVal1)
    intersection2 = val1.intersection(auxVal2)
    print(intersection1, intersection2)
    if (len(intersection1) >= 3):
      newVal.update(intersection1)
    if (len(intersection2) >= 3):
      newVal.update({pos - i for pos in intersection2})

  if len(newVal)==0:
    return False
  else:
    newKey = "".join(sorted(key1+key2))
    return newKey, newVal

That is, a pair of items satisfies the condition when at least 3 numbers in one value set line up with numbers in the other after shifting by a common distance (difference). As said, when the condition is satisfied, the resulting key is the union of the two keys. For this particular example, the value is made up of the smaller number of each matching pair.
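As a quick check of the description above, here is a small sketch that calls condition on two pairs taken from db. The function body is the one from the question with the debugging print left out. The choice of pairs ("a" with "b", and "a" with "e") is only for illustration.

```python
def condition(kv1: tuple, kv2: tuple):
    key1, val1 = kv1
    key2, val2 = kv2

    union = val1 | val2
    maxDif = max(union) - min(union)

    newVal = set()
    for i in range(maxDif):
        # shift val2 down and up by i and look for overlaps with val1
        auxVal1 = {pos - i for pos in val2}
        auxVal2 = {pos + i for pos in val2}
        intersection1 = val1.intersection(auxVal1)
        intersection2 = val1.intersection(auxVal2)
        if len(intersection1) >= 3:
            newVal.update(intersection1)
        if len(intersection2) >= 3:
            newVal.update({pos - i for pos in intersection2})

    if len(newVal) == 0:
        return False
    newKey = "".join(sorted(key1 + key2))
    return newKey, newVal


db = {"a": {1, 2, 3}, "b": {5, 6, 7}, "c": {2, 5, 4},
      "d": {8, 11, 10, 18}, "e": {0, 3, 2}}

matched = condition(("a", db["a"]), ("b", db["b"]))
unmatched = condition(("a", db["a"]), ("e", db["e"]))
print(matched)    # ('ab', {1, 2, 3})
print(unmatched)  # False
```

Shifting {5,6,7} down by 4 gives {1,2,3}, so "a" and "b" match on three numbers. No single shift aligns three numbers of "a" and "e", so that call returns False.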

How can I smartly apply a function like this to a dictionary like db? Given the aforementioned dictionary, the expected result would be:

result = {"ab":{1,2,3}, "cde":{0,3,2}, "d":{18}}

Answer 1:

Your "condition" in this case is more than a mere condition. It is actually a merging rule that identifies which values to keep and which to drop. Whether a generalized approach is possible depends on how much the patterns and merge rules vary.

Given this, each merge operation could leave values in the original keys that may still be merged with some of the remaining keys. Multiple merges can also occur (e.g. key "cde"). In theory the merging process would need to consider the power set of all keys, which may be impractical. Alternatively, it can be performed by successive refinement over pairs of (original and/or merged) keys.
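To give a rough sense of the two search spaces mentioned above, the following sketch counts the subsets of the five keys of db against the ordered key pairs examined in one pairwise pass. It only illustrates the sizes and is not part of the merging code. The names power_set and ordered_pairs are made up for this example.

```python
from itertools import chain, combinations, permutations

keys = ["a", "b", "c", "d", "e"]

# every subset of the keys (the power set), including the empty set
power_set = list(chain.from_iterable(
    combinations(keys, r) for r in range(len(keys) + 1)))

# ordered pairs of distinct keys: the unit of work for pairwise refinement
ordered_pairs = list(permutations(keys, 2))

print(len(power_set))      # 32
print(len(ordered_pairs))  # 20
```

The power set grows as 2**n, while one pass over ordered pairs grows as n*(n-1). The pairwise approach may need several passes, though, since merged keys become candidates for further merges.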

The merge condition/function:

from itertools import product
from collections import Counter

db = {"a":{1,2,3}, "b":{5,6,7}, "c":{2,5,4}, "d":{8,11,10,18}, "e":{0,3,2}}

# Apply the condition and return a keep-set (from A) and a remove-set (from B).
# Both are None if the matching condition is not met.
def merge(A,B,inverted=False):  # note: 'inverted' is not used in the body
    minMatch = 3
    # tally every non-negative distance b-a between members of A and B
    distances = Counter(b-a for a,b in product(A,B) if b>=a)
    # distances shared by at least minMatch pairs
    delta     = [d for d,count in distances.items() if count>=minMatch]
    keep      = {a for a in A if any(a+d in B for d in delta)}
    remove    = {b for b in B if any(b-d in A for d in delta)}
    if len(keep)>=minMatch: return keep,remove
    return None,None
    
    
print( merge(db["a"],db["b"]) )  # ({1, 2, 3}, {5, 6, 7})
print( merge(db["e"],db["d"]) )  # ({0, 2, 3}, {8, 10, 11})
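To see why the first call above matches, the distance tally that merge builds can be inspected on its own. This is a small sketch reusing the same Counter expression on the value sets of "a" and "b".

```python
from collections import Counter
from itertools import product

A, B = {1, 2, 3}, {5, 6, 7}

# count how often each non-negative distance b - a occurs across all pairs
distances = Counter(b - a for a, b in product(A, B) if b >= a)
tally = sorted(distances.items())
print(tally)  # [(2, 1), (3, 2), (4, 3), (5, 2), (6, 1)]

# only distances seen at least 3 times qualify
delta = [d for d, count in distances.items() if count >= 3]
print(delta)  # [4]
```

Distance 4 is the only one shared by three pairs (1→5, 2→6, 3→7), which is why all of {1,2,3} is kept and all of {5,6,7} is removed.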

Merge Process:

# combine dictionary keys using a merging function/condition
def combine(D,mergeFunction):
    result  = { k:set(v) for k,v in D.items() }  # start with copy of input
    merging = True    
    while merging:    # keep merging until no more merges are performed
        merging = False   
        for a,b in product(list(result), repeat=2):   # all ordered key pairs
            if a==b: continue
            if a not in result or b not in result: continue # keys still there?
            m,n = mergeFunction(result[a],result[b])        # call merge function
            if not m : continue                             # if merged ...
            mergedKey = "".join(sorted(set(a+b)))             # combine keys
            result[mergedKey] = m                             # add merged set
            if mergedKey != a: result[a] -= m; merging = True # clean/clear
            if not result[a]: del result[a]                   # original sets,
            if mergedKey != b: result[b] -= n; merging = True # do more merges
            if not result[b]: del result[b]
    return result
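Putting the two pieces together, the following self-contained sketch runs combine on the db from the question and compares the outcome with the result the question expects. The merge and combine bodies follow the answer's code, with the unused inverted parameter dropped. The names expected and result are introduced here only for the check.

```python
from collections import Counter
from itertools import product

def merge(A, B):
    minMatch = 3
    distances = Counter(b - a for a, b in product(A, B) if b >= a)
    delta = [d for d, count in distances.items() if count >= minMatch]
    keep = {a for a in A if any(a + d in B for d in delta)}
    remove = {b for b in B if any(b - d in A for d in delta)}
    if len(keep) >= minMatch:
        return keep, remove
    return None, None

def combine(D, mergeFunction):
    result = {k: set(v) for k, v in D.items()}   # start with a copy of the input
    merging = True
    while merging:                               # repeat until a pass merges nothing
        merging = False
        for a, b in product(list(result), repeat=2):
            if a == b or a not in result or b not in result:
                continue
            m, n = mergeFunction(result[a], result[b])
            if not m:
                continue
            mergedKey = "".join(sorted(set(a + b)))
            result[mergedKey] = m
            if mergedKey != a:
                result[a] -= m
                merging = True
            if not result[a]:
                del result[a]
            if mergedKey != b:
                result[b] -= n
                merging = True
            if not result[b]:
                del result[b]
    return result

db = {"a": {1, 2, 3}, "b": {5, 6, 7}, "c": {2, 5, 4},
      "d": {8, 11, 10, 18}, "e": {0, 3, 2}}
expected = {"ab": {1, 2, 3}, "cde": {0, 3, 2}, "d": {18}}

result = combine(db, merge)
print(result == expected)  # True
```

On this input the first pass produces "ab" and "cd", the second pass merges "e" with "cd" into "cde", and the third pass finds nothing left to merge. Note that result[mergedKey] = m replaces any entry already stored under that key. This does not arise with this db, but other inputs may need a check for it.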

Source: https://stackoverflow.com/questions/65716614/rearranging-a-dictionary-based-on-a-function-condition-over-its-items

Tags

python

dictionary

nested

set

python-3.9