Python key in dict.keys() performance for large dictionaries

执念已碎 2020-12-09 20:22

I was wondering if you guys might be able to give me some advice on improving the performance of my code.

I have a set of for loops which look to

5 answers
  • 2020-12-09 20:53
    your_dict.setdefault(value.key, []).append(value.val)
    
  • 2020-12-09 20:58

    Step 1: we transform the code using the temp_list into a single expression (I assume temp_list isn't needed outside this code), by using addition instead of the append method. Also, we don't need to use dict.keys() explicitly, as others mentioned (and in fact it wastes a huge amount of time).

    for value in value_list:
       if value.key in dict:
          dict[value.key] = dict[value.key] + [value.val]
       else:
          dict[value.key] = [value.val]
    

    Step 2: Transform the assignments-to-the-same-location by using the conditional expression syntax.

    for value in value_list:
       dict[value.key] = dict[value.key] + [value.val] if value.key in dict else [value.val]
    

    Step 3: Appending or prepending an empty list has no effect on the value of a list, so we can insert that, and then factor out the common 'addition' of the value.

    for value in value_list:
       dict[value.key] = (dict[value.key] if value.key in dict else []) + [value.val]
    

    Step 4: Recognize that the dict has built-in functionality for providing a 'default' value when the key is absent:

    for value in value_list:
       dict[value.key] = dict.get(value.key, []) + [value.val]
    

    Step 5: Instead of getting a value, modifying it and setting it back, we can use .setdefault to give us the current contents (or set them up if not already there), and then switch back to using .append to modify the list:

    for value in value_list:
       dict.setdefault(value.key, []).append(value.val)
    

    (I mean... I could have just looked at it and thought for a bit and arrived at this, but seeing each step makes it clearer where we're going...)
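
As a rough check on the steps above, the sketch below runs the Step 4 and Step 5 forms on the same input. The Record type and the sample data are hypothetical stand-ins for the question's objects, which only need .key and .val attributes. The difference worth noticing is that the .get() form builds a new list on every iteration, while .setdefault() appends to the existing list in place.

```python
from collections import namedtuple

# Hypothetical record type standing in for the objects in the question.
Record = namedtuple("Record", ["key", "val"])
value_list = [Record("a", 1), Record("b", 2), Record("a", 3)]


def group_with_get(values):
    # Step 4: concatenation creates a fresh list each time a value is added.
    groups = {}
    for value in values:
        groups[value.key] = groups.get(value.key, []) + [value.val]
    return groups


def group_with_setdefault(values):
    # Step 5: setdefault returns the stored list, which is modified in place.
    groups = {}
    for value in values:
        groups.setdefault(value.key, []).append(value.val)
    return groups


print(group_with_get(value_list))         # {'a': [1, 3], 'b': [2]}
print(group_with_setdefault(value_list))  # {'a': [1, 3], 'b': [2]}
```

The result dictionary is named groups here rather than dict, so that the built-in dict type is not shadowed.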

  • 2020-12-09 20:59
    if value.key in dict.keys():
    

    This is very expensive in Python 2, because dict.keys() builds a list of every key and the list is then searched linearly. Just replacing that with:

    if value.key in dict:
    

    should turn the search into a hash table lookup, which takes roughly constant time on average (EDIT: I originally said ~log N and stand corrected by Glenn; Python dictionaries use a hash table). Then, when the key is already present, simply:

    dict[value.key].append(value.val)
    

    should speed things up a bit. Using a temporary list is not required and just eats some CPU cycles.

    If you can give more details about what you're trying to do someone may be able to suggest a better algorithm.
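
Put together, the suggestion in this answer might look like the sketch below. The Item type and the sample input are hypothetical, since the question's loop is not shown in full. The else branch is an assumption about what the original code did for a key seen for the first time.

```python
from collections import namedtuple

# Hypothetical stand-in for the question's objects with .key and .val.
Item = namedtuple("Item", ["key", "val"])


def group_values(value_list):
    groups = {}
    for value in value_list:
        if value.key in groups:
            # Hash lookup on the dict itself; no list of keys is built.
            groups[value.key].append(value.val)
        else:
            # First value seen for this key: start a new list.
            groups[value.key] = [value.val]
    return groups


grouped = group_values([Item("x", 10), Item("y", 20), Item("x", 30)])
print(grouped)  # {'x': [10, 30], 'y': [20]}
```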

  • 2020-12-09 21:00

    Don't do this:

    value.key in dict.keys()
    

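    In Python 2, at least, that creates a list containing every key, which gets more and more expensive as the dictionary gets larger, and then performs an O(n) search on the list to find the key. That defeats the purpose of using a dict. (In Python 3, keys() returns a view object and the membership test is a hash lookup, but the call is still unnecessary.)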

    Instead, just do:

    value.key in dict
    

    which doesn't create a temporary list, and does a hash table lookup for the key rather than a linear search.

    setdefault, as mentioned elsewhere, is the cleaner way to do this, but it's very important to understand the above.
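
One way to see the difference is to time the variants, as in the rough sketch below. It assumes Python 3, where list(d) is used to imitate the key list that Python 2's dict.keys() built. The dictionary size, the missing key and the repeat count are arbitrary choices for illustration, and the absolute timings will vary by machine.

```python
import timeit

d = {i: None for i in range(100_000)}
missing = -1  # absent key: the worst case for a linear scan


def via_dict():
    # Hash table lookup directly on the dict.
    return missing in d


def via_keys_view():
    # Python 3: keys() is a view, so this is also a hash lookup.
    return missing in d.keys()


def via_key_list():
    # Imitates Python 2's dict.keys(): copy every key, then scan the list.
    return missing in list(d)


for fn in (via_dict, via_keys_view, via_key_list):
    elapsed = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {elapsed:.4f} s for 100 lookups")
```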

  • 2020-12-09 21:03

    Using collections.defaultdict, this can be simplified to

    import collections

    d = collections.defaultdict(list)
    for value in value_list:
        d[value.key].append(value.val)
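
One behaviour worth knowing when choosing defaultdict: reading a missing key with square brackets inserts an empty list as a side effect, while in and .get() do not. A small sketch with made-up keys:

```python
import collections

d = collections.defaultdict(list)
d["a"].append(1)

# Indexing a missing key calls list() and stores the result.
_ = d["b"]
print(dict(d))     # {'a': [1], 'b': []}

# Membership tests and .get() leave the dictionary unchanged.
print("c" in d)    # False
print(d.get("c"))  # None
print(len(d))      # 2
```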
    