Intersecting two dictionaries in Python

借酒劲吻你 2020-11-27 18:26

I am working on a search program over an inverted index. The index itself is a dictionary whose keys are terms and whose values are themselves dictionaries of short document

8 answers
  • 2020-11-27 18:36

    In Python, you use the & operator to calculate the intersection of sets, and dictionary keys are set-like objects (in Python 3):

    dict_a = {"a": 1, "b": 2}
    dict_b = {"a": 2, "c": 3} 
    
    intersection = dict_a.keys() & dict_b.keys()  # {'a'}
    

    On Python 2 you have to convert the dictionary keys to sets yourself:

    keys_a = set(dict_a.keys())
    keys_b = set(dict_b.keys())
    intersection = keys_a & keys_b
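
    Applied to an inverted index like the one the question describes, a minimal sketch might look like the following. The index contents, the term names, and the helper docs_with_all_terms are hypothetical, since the question does not show its actual data.

```python
# Hypothetical inverted index: term -> {document id: term frequency}.
index = {
    "python": {"doc1": 3, "doc2": 1, "doc4": 2},
    "dict": {"doc2": 5, "doc3": 1, "doc4": 1},
}


def docs_with_all_terms(index, terms):
    """Return the set of document ids that contain every term in `terms`."""
    postings = [index.get(term, {}) for term in terms]
    if not postings:
        return set()
    # Start from the smallest postings dict so the running set stays small.
    postings.sort(key=len)
    common = set(postings[0])
    for posting in postings[1:]:
        # Iterating a dict yields its keys, so no explicit set is needed here.
        common.intersection_update(posting)
    return common


common_docs = docs_with_all_terms(index, ["python", "dict"])
```

    A term that is missing from the index yields an empty result, because its postings are treated as an empty dict.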
    
  • 2020-11-27 18:38

    A little-known fact is that you don't need to construct sets to do this:

    In Python 2:

    In [78]: d1 = {'a': 1, 'b': 2}
    
    In [79]: d2 = {'b': 2, 'c': 3}
    
    In [80]: d1.viewkeys() & d2.viewkeys()
    Out[80]: {'b'}
    

    In Python 3 replace viewkeys with keys; the same applies to viewvalues and viewitems.

    From the documentation of viewitems:

    In [113]: d1.viewitems??
    Type:       builtin_function_or_method
    String Form:<built-in method viewitems of dict object at 0x64a61b0>
    Docstring:  D.viewitems() -> a set-like object providing a view on D's items
    

    For larger dicts this is also slightly faster than constructing sets and then intersecting them:

    In [122]: d1 = {i: rand() for i in range(10000)}
    
    In [123]: d2 = {i: rand() for i in range(10000)}
    
    In [124]: timeit d1.viewkeys() & d2.viewkeys()
    1000 loops, best of 3: 714 µs per loop
    
    In [125]: %%timeit
    s1 = set(d1)
    s2 = set(d2)
    res = s1 & s2
    
    1000 loops, best of 3: 805 µs per loop
    
    For smaller `dict`s, `set` construction is faster:
    
    In [126]: d1 = {'a': 1, 'b': 2}
    
    In [127]: d2 = {'b': 2, 'c': 3}
    
    In [128]: timeit d1.viewkeys() & d2.viewkeys()
    1000000 loops, best of 3: 591 ns per loop
    
    In [129]: %%timeit
    s1 = set(d1)
    s2 = set(d2)
    res = s1 & s2
    
    1000000 loops, best of 3: 477 ns per loop
    

    We're comparing nanoseconds here, which may or may not matter to you. In any case, you get back a set, so using viewkeys/keys eliminates a bit of clutter.
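
    The timings above come from a Python 2 IPython session. To repeat the comparison on Python 3, a sketch along these lines could be used. It assumes the standard timeit module, and it uses random.random as a stand-in for the rand() call, whose import the session does not show. Absolute numbers will differ by machine and interpreter version.

```python
import random
import timeit

d1 = {i: random.random() for i in range(10000)}
d2 = {i: random.random() for i in range(10000)}


def via_views():
    # Intersect the set-like key views directly.
    return d1.keys() & d2.keys()


def via_sets():
    # Build explicit sets first, then intersect them.
    return set(d1) & set(d2)


# Both approaches must agree before their timings are worth comparing.
assert via_views() == via_sets()

views_time = timeit.timeit(via_views, number=200)
sets_time = timeit.timeit(via_sets, number=200)
print("views: %.6f s, sets: %.6f s" % (views_time, sets_time))
```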

  • 2020-11-27 18:42
    In [1]: d1 = {'a':1, 'b':4, 'f':3}
    
    In [2]: d2 = {'a':1, 'b':4, 'd':2}
    
    In [3]: d = {x:d1[x] for x in d1 if x in d2}
    
    In [4]: d
    Out[4]: {'a': 1, 'b': 4}
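
    The comprehension above keeps the value from d1 for every key that is present in both dictionaries. When the two dictionaries differ a lot in size, iterating over the smaller one does less work. A minimal sketch, with intersect_by_key as a hypothetical helper name:

```python
def intersect_by_key(d1, d2):
    """Return {key: d1[key]} for every key present in both dictionaries."""
    # Iterate over the smaller dict; membership tests on the other are O(1) on average.
    small, large = (d1, d2) if len(d1) <= len(d2) else (d2, d1)
    return {key: d1[key] for key in small if key in large}


d1 = {'a': 1, 'b': 4, 'f': 3}
d2 = {'a': 1, 'b': 4, 'd': 2}
d = intersect_by_key(d1, d2)
```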
    
  • 2020-11-27 18:48

    Just wrap the dictionary instances in a simple class that returns both of the values you want:

    class DictionaryIntersection(object):
        """Look up a key in two dictionaries at once."""

        def __init__(self, dictA, dictB):
            self.dictA = dictA
            self.dictB = dictB

        def __getitem__(self, key):
            # Only keys present in both dictionaries are valid.
            if key not in self.dictA or key not in self.dictB:
                raise KeyError('Not in both dictionaries, key: %s' % key)

            return self.dictA[key], self.dictB[key]

    x = {'foo': 5, 'bar': 6}
    y = {'bar': 'meow', 'qux': 8}

    z = DictionaryIntersection(x, y)

    print(z['bar'])  # (6, 'meow')
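
    If the wrapper should also support iteration, len() and the in operator, one option in Python 3 is to build it on collections.abc.Mapping. This is a sketch of that variation rather than the class from the answer; the name IntersectionView is hypothetical.

```python
from collections.abc import Mapping


class IntersectionView(Mapping):
    """Read-only view of the keys two dicts share, mapping each to a value pair."""

    def __init__(self, dict_a, dict_b):
        self.dict_a = dict_a
        self.dict_b = dict_b

    def __getitem__(self, key):
        if key not in self.dict_a or key not in self.dict_b:
            raise KeyError(key)
        return self.dict_a[key], self.dict_b[key]

    def __iter__(self):
        # Yield only the keys present in both dictionaries.
        return (key for key in self.dict_a if key in self.dict_b)

    def __len__(self):
        return sum(1 for _ in self)


x = {'foo': 5, 'bar': 6}
y = {'bar': 'meow', 'qux': 8}
z = IntersectionView(x, y)
pairs = dict(z)
```

    Mapping supplies keys(), items(), get() and __contains__ on top of the three methods defined here, and the view reflects later changes to x and y because nothing is copied.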
    
  • 2020-11-27 18:49

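    In Python 3, you can apply the set operators to the items() views, provided all values are hashable. They compare whole (key, value) pairs, so a key counts as common only if its value is the same in both dictionaries:

    intersection = dict(dict1.items() & dict2.items())
    # If a key has different values in the two dicts, dict() keeps an unspecified one of them.
    union = dict(dict1.items() | dict2.items())
    difference = dict(dict1.items() - dict2.items())
    symmetric_difference = dict(dict1.items() ^ dict2.items())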
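
    Two details of the items() views are easy to miss: the set operators compare whole (key, value) pairs, and every value must be hashable. The sketch below, using made-up dictionaries, shows what each operator returns. Note that ^ is the symmetric difference, while - gives the one-sided difference.

```python
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'a': 1, 'b': 20, 'd': 4}

# Only pairs whose key AND value match survive the intersection.
common_items = dict(dict1.items() & dict2.items())

# One-sided difference: pairs in dict1 that are not in dict2.
only_in_dict1 = dict(dict1.items() - dict2.items())

# Symmetric difference as a set of pairs; 'b' appears twice, once per value.
either_not_both = dict1.items() ^ dict2.items()

# Intersecting on keys alone ignores the values entirely.
common_keys = dict1.keys() & dict2.keys()
```

    Here 'b' is a common key but not a common item, which is why intersecting keys() is usually the better fit when the values are allowed to differ.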
    
  • 2020-11-27 18:50

    Here is a generalized version of the code above for Python 3. It relies on comprehensions and set-like dict views, which keeps it reasonably fast.

    The function intersects arbitrarily many dicts and returns a dict that maps each common key to the set of values that key has across the inputs:

    def dict_intersect(*dicts):
        comm_keys = dicts[0].keys()
        for d in dicts[1:]:
            # intersect keys first
            comm_keys &= d.keys()
        # then build a result dict with nested comprehension
        result = {key:{d[key] for d in dicts} for key in comm_keys}
        return result
    

    Usage example:

    a = {1: 'ba', 2: 'boon', 3: 'spam', 4:'eggs'}
    b = {1: 'ham', 2:'baboon', 3: 'sausages'}
    c = {1: 'more eggs', 3: 'cabbage'}
    
    res = dict_intersect(a, b, c)
    # Here is res (the order of values may vary) :
    # {1: {'ham', 'more eggs', 'ba'}, 3: {'spam', 'sausages', 'cabbage'}}
    

    The dict values must be hashable for this to work. If they aren't, you can change the set braces { } to list brackets [ ]:

    result = {key:[d[key] for d in dicts] for key in comm_keys}
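
    Put together, the list-based variant could look like the sketch below. The name dict_intersect_lists is hypothetical, and the guard for an empty argument list is an addition that the original function does not have.

```python
def dict_intersect_lists(*dicts):
    """Map each key shared by all dicts to the list of its values, in argument order."""
    if not dicts:
        # Assumption: with no input there is nothing in common, so return an empty dict.
        return {}
    common = set(dicts[0])
    for d in dicts[1:]:
        # Iterating a dict yields its keys.
        common.intersection_update(d)
    return {key: [d[key] for d in dicts] for key in common}


# Lists are unhashable, so the set-based version would fail on these values.
a = {1: ['ba'], 2: ['boon'], 3: ['spam']}
b = {1: ['ham'], 3: ['sausages']}
res = dict_intersect_lists(a, b)
```

    Unlike the set version, the lists keep duplicates and preserve the order in which the dicts were passed.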
    