How to merge dictionaries of dictionaries?

渐次进展 2020-11-22 05:13

I need to merge multiple dictionaries. Here's what I have, for instance:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}


        
29 Answers
  • 2020-11-22 06:12

    Dictionaries of dictionaries merge

    As this is the canonical question (in spite of certain non-generalities) I'm providing the canonical Pythonic approach to solving this issue.

    Simplest Case: "leaves are nested dicts that end in empty dicts":

    d1 = {'a': {1: {'foo': {}}, 2: {}}}
    d2 = {'a': {1: {}, 2: {'bar': {}}}}
    d3 = {'b': {3: {'baz': {}}}}
    d4 = {'a': {1: {'quux': {}}}}
    

    This is the simplest case for recursion, and I would recommend two naive approaches:

    def rec_merge1(d1, d2):
        '''return new merged dict of dicts'''
        d2 = d2.copy() # shallow copy, so the caller's d2 is left unmodified
        for k, v in d1.items(): # in Python 2, use .iteritems()!
            if k in d2:
                d2[k] = rec_merge1(v, d2[k])
        d3 = d1.copy()
        d3.update(d2)
        return d3
    
    def rec_merge2(d1, d2):
        '''update first dict with second recursively'''
        for k, v in d1.items(): # in Python 2, use .iteritems()!
            if k in d2:
                d2[k] = rec_merge2(v, d2[k])
        d1.update(d2)
        return d1
    

    I believe I would prefer the second to the first, but keep in mind that it modifies the first dict in place, so the original state of the first would have to be rebuilt from its source. Here's the usage:

    >>> from functools import reduce # only required for Python 3.
    >>> reduce(rec_merge1, (d1, d2, d3, d4))
    {'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
    >>> reduce(rec_merge2, (d1, d2, d3, d4))
    {'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
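
The caveat about the first dict's original state can be made concrete. The following is only a sketch: it repeats rec_merge2 so it runs on its own, and the use of copy.deepcopy to protect the inputs is my assumption, not part of the answer above.

```python
import copy
from functools import reduce

def rec_merge2(d1, d2):
    '''update first dict with second recursively (repeated from above)'''
    for k, v in d1.items():
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}

# Merging deep copies leaves the originals as they were.
merged = reduce(rec_merge2, (copy.deepcopy(d1), copy.deepcopy(d2)))
untouched = (d1 == {'a': {1: {'foo': {}}, 2: {}}})

# Merging the originals directly returns d1 itself, now modified.
result = rec_merge2(d1, d2)
print(merged)
print(untouched, result is d1)
```

If the inputs are needed again later, passing copies as above (or using rec_merge1) avoids having to rebuild them.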
    

    Complex Case: "leaves are of any other type:"

    So if they end in dicts, it's a simple case of merging the end empty dicts. If not, it's not so trivial. If strings, how do you merge them? Sets can be updated similarly, so we could give that treatment, but we lose the order in which they were merged. So does order matter?

    Absent more information, the simplest approach is to give the values the standard update treatment unless both are dicts: that is, the second dict's value will overwrite the first's, even if the second dict's value is None and the first's value is a dict with a lot of info.

    d1 = {'a': {1: 'foo', 2: None}}
    d2 = {'a': {1: None, 2: 'bar'}}
    d3 = {'b': {3: 'baz'}}
    d4 = {'a': {1: 'quux'}}
    
    from collections.abc import MutableMapping
    
    def rec_merge(d1, d2):
        '''
        Return a new dict that merges two dicts of dicts recursively;
        if either mapping has leaves that are non-dicts,
        the second's leaf overwrites the first's.
        '''
        d2 = d2.copy() # shallow copy, so the caller's d2 is left unmodified
        for k, v in d1.items():
            if k in d2:
                # this next check is the only difference!
                if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                    d2[k] = rec_merge(v, d2[k])
                # we could further check types and merge as appropriate here.
        d3 = d1.copy()
        d3.update(d2)
        return d3
    

    And now

    from functools import reduce
    reduce(rec_merge, (d1, d2, d3, d4))
    

    returns

    {'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}
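
To illustrate the overwrite caveat mentioned earlier (a None in the second dict replacing a whole dict in the first), here is a small sketch. It carries its own copy of rec_merge, including a defensive shallow copy of d2 that is an addition of this sketch; the names first and second and their contents are made up for the example.

```python
from collections.abc import MutableMapping

def rec_merge(d1, d2):
    '''Return a new merged dict; non-dict leaves from d2 win.'''
    d2 = d2.copy()  # assumption of this sketch: do not touch the caller's d2
    for k, v in d1.items():
        if k in d2:
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

# Hypothetical data: the first dict holds details, the second only a None.
first = {'config': {'host': 'localhost', 'port': 8080}}
second = {'config': None}

result = rec_merge(first, second)
print(result)  # the nested dict under 'config' is replaced by None
```

If losing data this way is not acceptable, the type check inside the loop is the place to add a rule such as "skip None values".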
    

    Application to the original question:

    I've had to remove the curly braces around the letters and put them in single quotes for this to be legit Python (else they would be set literals in Python 2.7+) as well as append a missing brace:

    dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
    dict2 = {2:{"c":'C'}, 3:{"d":'D'}}
    

    and rec_merge(dict1, dict2) now returns:

    {1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
    

    Which matches the desired outcome of the original question (after changing, e.g. the {A} to 'A'.)
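
If the braces in the question were really meant as sets, one possible treatment is to union set leaves instead of overwriting them. This is only a sketch of that idea: merge_with_sets is a hypothetical name, and the string values 'A', 'B', and so on stand in for the undefined names in the question.

```python
from collections.abc import MutableMapping, Set

def merge_with_sets(d1, d2):
    """Return a new dict: nested dicts are merged, set leaves are unioned,
    and anything else from d2 overwrites the value from d1."""
    merged = dict(d1)
    for k, v2 in d2.items():
        v1 = merged.get(k)
        if isinstance(v1, MutableMapping) and isinstance(v2, MutableMapping):
            merged[k] = merge_with_sets(v1, v2)
        elif isinstance(v1, Set) and isinstance(v2, Set):
            merged[k] = v1 | v2  # order of merging is lost, as noted above
        else:
            merged[k] = v2
    return merged

dict1 = {1: {"a": {"A"}}, 2: {"b": {"B"}}}
dict2 = {2: {"b": {"X"}, "c": {"C"}}, 3: {"d": {"D"}}}

merged = merge_with_sets(dict1, dict2)
print(merged)
```

Here the two sets under key "b" are combined, while keys present on only one side are carried over unchanged.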

  • 2020-11-22 06:12

    Based on the answer from @andrew cooke, but this version handles nested lists better.

    def deep_merge_lists(original, incoming):
        """
        Deep merge two lists. Modifies original.
        Recursively deep merge each pair of elements at the same index.
        If both elements are:
         a. dict: Call deep_merge_dicts on both values.
         b. list: Recursively call deep_merge_lists on both values.
         c. any other type: Value is overridden.
         d. conflicting types: Value is overridden.
    
        If the incoming list is longer than the original, the extra values are appended.
        """
        common_length = min(len(original), len(incoming))
        for idx in range(common_length):
            if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
                deep_merge_dicts(original[idx], incoming[idx])
    
            elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
                deep_merge_lists(original[idx], incoming[idx])
    
            else:
                original[idx] = incoming[idx]
    
        for idx in range(common_length, len(incoming)):
            original.append(incoming[idx])
    
    
    def deep_merge_dicts(original, incoming):
        """
        Deep merge two dictionaries. Modifies original.
        For key conflicts if both values are:
         a. dict: Recursively call deep_merge_dicts on both values.
         b. list: Call deep_merge_lists on both values.
         c. any other type: Value is overridden.
         d. conflicting types: Value is overridden.
    
        """
        for key in incoming:
            if key in original:
                if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                    deep_merge_dicts(original[key], incoming[key])
    
                elif isinstance(original[key], list) and isinstance(incoming[key], list):
                    deep_merge_lists(original[key], incoming[key])
    
                else:
                    original[key] = incoming[key]
            else:
                original[key] = incoming[key]
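
A short usage sketch may help show what "merging lists element by element" means in practice. The two functions are repeated in condensed form so the snippet runs on its own; the sample data (servers, debug) is invented for illustration.

```python
def deep_merge_lists(original, incoming):
    """Merge incoming into original by position; extra items are appended."""
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])
        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])
        else:
            original[idx] = incoming[idx]
    original.extend(incoming[common_length:])

def deep_merge_dicts(original, incoming):
    """Merge incoming into original in place."""
    for key in incoming:
        both_present = key in original
        if both_present and isinstance(original[key], dict) and isinstance(incoming[key], dict):
            deep_merge_dicts(original[key], incoming[key])
        elif both_present and isinstance(original[key], list) and isinstance(incoming[key], list):
            deep_merge_lists(original[key], incoming[key])
        else:
            original[key] = incoming[key]

# Hypothetical data: list items are matched by index, not concatenated.
original = {"servers": [{"name": "a", "port": 80}, {"name": "b"}], "debug": False}
incoming = {"servers": [{"port": 8080}, {"port": 81}, {"name": "c"}], "debug": True}

deep_merge_dicts(original, incoming)
print(original)
```

Note that the first two entries of "servers" are merged with their counterparts at the same index, and only the third incoming entry is appended.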
    
  • 2020-11-22 06:12

    Since dictviews support set operations, I was able to greatly simplify jterrace's answer.

    def merge(dict1, dict2):
        for k in dict1.keys() - dict2.keys():
            yield (k, dict1[k])
    
        for k in dict2.keys() - dict1.keys():
            yield (k, dict2[k])
    
        for k in dict1.keys() & dict2.keys():
            yield (k, dict(merge(dict1[k], dict2[k])))
    

    Any attempt to combine a dict with a non-dict (technically, an object with a 'keys' method and an object without a 'keys' method) will raise an AttributeError. This includes both the initial call to the function and recursive calls. This is exactly what I wanted, so I left it. You could easily catch the AttributeError raised by the recursive call and then yield any value you please.
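
As a sketch of that last suggestion: the variant below catches the AttributeError from the recursive call and falls back to the value from the second dict. Choosing the second value as the fallback is my assumption; any other policy could go in the except branch.

```python
def merge(dict1, dict2):
    for k in dict1.keys() - dict2.keys():
        yield (k, dict1[k])

    for k in dict2.keys() - dict1.keys():
        yield (k, dict2[k])

    for k in dict1.keys() & dict2.keys():
        try:
            # dict() consumes the inner generator here, so an AttributeError
            # from a non-dict value surfaces inside this try block.
            yield (k, dict(merge(dict1[k], dict2[k])))
        except AttributeError:
            # Assumed policy: when the values cannot be merged, the second wins.
            yield (k, dict2[k])

a = {'a': {'x': 1}, 'b': 1}
b = {'a': {'y': 2}, 'b': 2, 'c': 3}
merged = dict(merge(a, b))
print(merged)
```

The initial call still raises AttributeError if either top-level argument is not a dict, since only the recursive call is wrapped.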

  • 2020-11-22 06:12
    from collections import defaultdict
    from itertools import chain
    
    class DictHelper:
    
        @staticmethod
        def merge_dictionaries(*dictionaries, override=True):
            merged_dict = defaultdict(set)
            all_unique_keys = set(chain(*[list(dictionary.keys()) for dictionary in dictionaries]))  # Build a set using all dict keys
            for key in all_unique_keys:
                keys_value_type = list(set(filter(lambda obj_type: obj_type != type(None), [type(dictionary.get(key, None)) for dictionary in dictionaries])))
                # Establish the object type for each key, return None if key is not present in dict and remove None from final result
                if len(keys_value_type) != 1:
                    raise Exception("Different objects type for same key: {keys_value_type}".format(keys_value_type=keys_value_type))
    
                if keys_value_type[0] == list:
                    values = list(chain(*[dictionary.get(key, []) for dictionary in dictionaries]))  # Extract the value for each key
                    merged_dict[key].update(values)
    
                elif keys_value_type[0] == dict:
                    # Extract all dictionaries by key and enter in recursion
                    dicts_to_merge = list(filter(lambda obj: obj is not None, [dictionary.get(key, None) for dictionary in dictionaries]))
                    merged_dict[key] = DictHelper.merge_dictionaries(*dicts_to_merge, override=override)
    
                else:
                    # if override => get value from last dictionary else make a list of all values
                    values = list(filter(lambda obj: obj is not None, [dictionary.get(key, None) for dictionary in dictionaries]))
                    merged_dict[key] = values[-1] if override else values
    
            return dict(merged_dict)
    
    
    
    if __name__ == '__main__':
      d1 = {'aaaaaaaaa': ['to short', 'to long'], 'bbbbb': ['to short', 'to long'], "cccccc": ["the is a test"]}
      d2 = {'aaaaaaaaa': ['field is not a bool'], 'bbbbb': ['field is not a bool']}
      d3 = {'aaaaaaaaa': ['filed is not a string', "to short"], 'bbbbb': ['field is not an integer']}
      print(DictHelper.merge_dictionaries(d1, d2, d3))
    
      d4 = {"a": {"x": 1, "y": 2, "z": 3, "d": {"x1": 10}}}
      d5 = {"a": {"x": 10, "y": 20, "d": {"x2": 20}}}
      print(DictHelper.merge_dictionaries(d4, d5))
    

    Output:

    {'bbbbb': {'to long', 'field is not an integer', 'to short', 'field is not a bool'}, 
    'aaaaaaaaa': {'to long', 'to short', 'filed is not a string', 'field is not a bool'}, 
    'cccccc': {'the is a test'}}
    
    {'a': {'y': 20, 'd': {'x1': 10, 'x2': 20}, 'z': 3, 'x': 10}}
    
  • 2020-11-22 06:13

    I've been testing your solutions and decided to use this one in my project:

    def mergedicts(dict1, dict2, conflict, no_conflict):
        for k in set(dict1.keys()).union(dict2.keys()):
            if k in dict1 and k in dict2:
                yield (k, conflict(dict1[k], dict2[k]))
            elif k in dict1:
                yield (k, no_conflict(dict1[k]))
            else:
                yield (k, no_conflict(dict2[k]))
    
    dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
    dict2 = {2:{"c":"C"}, 3:{"d":"D"}}
    
    from functools import reduce  # reduce is not a builtin in Python 3
    
    # this helper function allows for recursion and the use of reduce
    def f2(x, y):
        return dict(mergedicts(x, y, f2, lambda x: x))
    
    print(dict(mergedicts(dict1, dict2, f2, lambda x: x)))
    print(dict(reduce(f2, [dict1, dict2])))
    

    Passing functions as parameters is the key to extending jterrace's solution so that it behaves like all the other recursive solutions.
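
To show what passing the conflict function buys you, here is a sketch with two interchangeable handlers. The names identity, prefer_second, and collect are hypothetical, as is the extra "c": "old" entry added to the sample data to force a leaf conflict.

```python
def mergedicts(dict1, dict2, conflict, no_conflict):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))

def identity(x):
    return x

def prefer_second(x, y):
    """Hypothetical handler: recurse into dicts, otherwise take y."""
    if isinstance(x, dict) and isinstance(y, dict):
        return dict(mergedicts(x, y, prefer_second, identity))
    return y

def collect(x, y):
    """Hypothetical handler: recurse into dicts, otherwise keep both values."""
    if isinstance(x, dict) and isinstance(y, dict):
        return dict(mergedicts(x, y, collect, identity))
    return [x, y]

dict1 = {1: {"a": "A"}, 2: {"b": "B", "c": "old"}}
dict2 = {2: {"c": "C"}, 3: {"d": "D"}}

overwritten = dict(mergedicts(dict1, dict2, prefer_second, identity))
collected = dict(mergedicts(dict1, dict2, collect, identity))
print(overwritten)
print(collected)
```

The merge loop stays the same in both calls; only the policy for conflicting leaves changes.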

  • 2020-11-22 06:14

    One issue with this question is that the values of the dict can be arbitrarily complex pieces of data. Based upon these and other answers I came up with this code:

    class YamlReaderError(Exception):
        pass
    
    def data_merge(a, b):
        """merges b into a and return merged result
    
        NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
        key = None
        # ## debug output
        # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
        try:
            # Python 3: str covers unicode and int covers long
            if a is None or isinstance(a, (str, int, float)):
                # border case for first run or if a is a primitive
                a = b
            elif isinstance(a, list):
                # lists can be only appended
                if isinstance(b, list):
                    # merge lists
                    a.extend(b)
                else:
                    # append to list
                    a.append(b)
            elif isinstance(a, dict):
                # dicts must be merged
                if isinstance(b, dict):
                    for key in b:
                        if key in a:
                            a[key] = data_merge(a[key], b[key])
                        else:
                            a[key] = b[key]
                else:
                    raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
            else:
                raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
        except TypeError as e:
            raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
        return a
    

    My use case is merging YAML files where I only have to deal with a subset of possible data types. Hence I can ignore tuples and other objects. For me a sensible merge logic means

    • replace scalars
    • append lists
    • merge dicts by adding missing keys and updating existing keys

    Everything else, including anything unforeseen, results in an error.
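
The three rules above can be seen in a small run. This is a sketch only: data_merge is restated in condensed Python 3 form (without the TypeError wrapper) so the snippet is self-contained, and the base and override dicts are invented stand-ins for two parsed YAML files.

```python
class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """Merge b into a and return the result (condensed restatement)."""
    if a is None or isinstance(a, (str, int, float)):
        return b                      # replace scalars
    if isinstance(a, list):
        if isinstance(b, list):
            a.extend(b)               # append lists
        else:
            a.append(b)
        return a
    if isinstance(a, dict):
        if not isinstance(b, dict):
            raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        for key in b:                 # add missing keys, update existing ones
            a[key] = data_merge(a[key], b[key]) if key in a else b[key]
        return a
    raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))

# Hypothetical contents of two YAML files after parsing.
base = {"name": "app", "ports": [80], "env": {"DEBUG": "0"}}
override = {"name": "app-dev", "ports": [8080], "env": {"DEBUG": "1", "LOG": "verbose"}}

merged = data_merge(base, override)
print(merged)

try:
    data_merge({"env": {}}, {"env": "not a dict"})
except YamlReaderError as exc:
    print("error:", exc)
```

The scalar under "name" is replaced, the list under "ports" grows, the dict under "env" is merged key by key, and merging a string into a dict is rejected.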
