Short answer:

- In Python 2 your assumption is correct: `key in dict.keys()` slows down.
- In Python 3 your assumption is not correct: `key in dict.keys()` performs like `key in dict`.

Details for py2 and py3 follow below.
Python 2.7 answer:

Your assumption is correct, for the following reasons:

- `dict.keys()` involves an additional function call (stack overhead).
- `dict.keys()` returns a list containing all keys in memory (as opposed to a lazy view object), so it needs to allocate that memory.
- `key in dict` is a hash-table lookup, roughly O(1) on average, whereas `key in dict.keys()` is a linear search through a list, O(n).
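A quick way to see why the list matters, runnable on any modern Python, is to force the keys into a list (which is effectively what Python 2's `dict.keys()` does) and compare membership tests with `timeit`. The sizes and iteration counts here are arbitrary, chosen just to make the gap visible:

```python
import timeit

d = {i: i for i in range(100000)}
keys_as_list = list(d)  # Python 2's d.keys() materializes a list like this

# Hash-table lookup: O(1) on average
t_hash = timeit.timeit(lambda: 99999 in d, number=200)

# Linear search through a list (worst case: key is last): O(n)
t_list = timeit.timeit(lambda: 99999 in keys_as_list, number=200)

print("hash lookup: %1.6f s" % t_hash)
print("list search: %1.6f s" % t_list)
```

On my runs the list search is several orders of magnitude slower, and the gap widens with the dict size, just as in the benchmark below.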
I created a small benchmark script to show my point:
```python
#! /usr/bin/python2.7

import datetime as dt
import random

dict_size = 1000000
num_iterations = 100

d = {i: i for i in xrange(dict_size)}

def f():
    k = random.randint(0, dict_size - 1)
    return (k in d)

def g():
    k = random.randint(0, dict_size - 1)
    return (k in d.keys())

def test(func):
    t = dt.datetime.utcnow()
    for i in xrange(num_iterations):
        func()
    print "%s --> %1.6f s" % (func, (dt.datetime.utcnow() - t).total_seconds())

test(f)
test(g)
```
Output (Python 2.7.6, Ubuntu 14.04):

```
<function f at 0x7ff2e0126d70> --> 0.000598 s
<function g at 0x7ff2e0126de8> --> 5.191553 s
```
I also tested with the number of iterations and the dict size swapped (a dict of only 100 items, 1M iterations):

```
<function f at 0x7f94cb5e6d70> --> 3.614162 s
<function g at 0x7f94cb5e6de8> --> 7.007922 s
```
Here the results are much closer together.
So the performance difference really grows with the size of the dict.
Python 3 answer:

I adapted the script for Python 3:

- `xrange` is gone; use `range` instead. (It is not in the performance-critical inner loop of the test functions, so the performance impact is limited.)
- Use parentheses with `print`.
- Change the shebang line to `#!/usr/bin/python3`.

And tested with Python 3.4.3 on the same machine.
`dict_size = 1000000; num_iterations = 100`:

```
f --> 0.000590 s
g --> 0.000565 s
```

`dict_size = 100; num_iterations = 1000000`:

```
f --> 4.525487 s
g --> 4.837232 s
```
So in Python 3, the performance difference is gone.
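The reason: in Python 3, `dict.keys()` no longer builds a list. It returns a lightweight, set-like view whose membership test goes straight to the dict's hash table. A minimal demonstration:

```python
from collections.abc import KeysView, Set

d = {"a": 1, "b": 2}
keys = d.keys()

# keys() returns a view object, not a list
print(type(keys))                  # <class 'dict_keys'>
print(isinstance(keys, KeysView))  # True
print(isinstance(keys, Set))       # True: set-like, O(1) membership

# Membership goes straight to the dict's hash table
print("a" in keys)                 # True

# The view even supports set operations without copying the keys
print(keys & {"a", "c"})           # {'a'}
```

This is why `key in d.keys()` and `key in d` end up with essentially the same timings in the benchmark above.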