Finding matching keys in two large dictionaries and doing it fast

前端 未结 11 1470
鱼传尺愫
鱼传尺愫 2020-12-15 17:35

I am trying to find corresponding keys in two different dictionaries. Each has about 600k entries.

Say for example:

    myRDP = { \'Actinobacter\':          


        
相关标签:
11条回答
  • 2020-12-15 18:17

    Copy both dictionaries into one dictionary/array. This makes sense as you have 1:1 related values. Then you need only one search, no comparison loop, and can access the related value directly.

    Example Resulting Dictionary/Array:

    [Name][Value1][Value2]
    
    [Actinobacter][GATCGA...TCA][8924342]
    
    [XYZbacter][BCABCA...ABC][43594344]
    

    ...

    0 讨论(0)
  • 2020-12-15 18:19

    Use the get method instead:

     for key in myRDP:
        value = myNames.get(key)
        if value != None:
          print key, "=", value
    
    0 讨论(0)
  • 2020-12-15 18:21

    You could do this:

    for key in myRDP:
        if key in myNames:
            print key, myNames[key]
    

    Your first attempt was slow because you were comparing every key in myRDP with every key in myNames. In algorithmic jargon, if myRDP has n elements and myNames has m elements, then that algorithm would take O(n×m) operations. For 600k elements each, this is 360,000,000,000 comparisons!

    But testing whether a particular element is a key of a dictionary is fast -- in fact, this is one of the defining characteristics of dictionaries. In algorithmic terms, the key in dict test is O(1), or constant-time. So my algorithm will take O(n) time, which is one 600,000th of the time.

    0 讨论(0)
  • 2020-12-15 18:21

    Best and easiest way would be simply perform common set operations(Python 3).

    a = {"a": 1, "b":2, "c":3, "d":4}
    b = {"t1": 1, "b":2, "e":5, "c":3}
    res = a.items() & b.items() # {('b', 2), ('c', 3)} For common Key and Value
    res = {i[0]:i[1] for i in res}  # In dict format
    common_keys = a.keys() & b.keys()  # {'b', 'c'}
    

    Cheers!

    0 讨论(0)
  • 2020-12-15 18:21

    Here is my code for doing intersections, unions, differences, and other set operations on dictionaries:

    class DictDiffer(object):
        """
        Calculate the difference between two dictionaries as:
        (1) items added
        (2) items removed
        (3) keys same in both but changed values
        (4) keys same in both and unchanged values
        """
        def __init__(self, current_dict, past_dict):
            self.current_dict, self.past_dict = current_dict, past_dict
            self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
            self.intersect = self.set_current.intersection(self.set_past)
        def added(self):
            return self.set_current - self.intersect 
        def removed(self):
            return self.set_past - self.intersect 
        def changed(self):
            return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
        def unchanged(self):
            return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])
    
    if __name__ == '__main__':
        import unittest
        class TestDictDifferNoChanged(unittest.TestCase):
            def setUp(self):
                self.past = dict((k, 2*k) for k in range(5))
                self.current = dict((k, 2*k) for k in range(3,8))
                self.d = DictDiffer(self.current, self.past)
            def testAdded(self):
                self.assertEqual(self.d.added(), set((5,6,7)))
            def testRemoved(self):      
                self.assertEqual(self.d.removed(), set((0,1,2)))
            def testChanged(self):
                self.assertEqual(self.d.changed(), set())
            def testUnchanged(self):
                self.assertEqual(self.d.unchanged(), set((3,4)))
        class TestDictDifferNoCUnchanged(unittest.TestCase):
            def setUp(self):
                self.past = dict((k, 2*k) for k in range(5))
                self.current = dict((k, 2*k+1) for k in range(3,8))
                self.d = DictDiffer(self.current, self.past)
            def testAdded(self):
                self.assertEqual(self.d.added(), set((5,6,7)))
            def testRemoved(self):      
                self.assertEqual(self.d.removed(), set((0,1,2)))
            def testChanged(self):
                self.assertEqual(self.d.changed(), set((3,4)))
            def testUnchanged(self):
                self.assertEqual(self.d.unchanged(), set())
        unittest.main()
    
    0 讨论(0)
提交回复
热议问题