How do I create a dict of dict of lists using defaultdict? I am getting the following error.
>>> from collections import defaultdict
>>> a=
Slightly faster than using a lambda
:
defaultdict(defaultdict(list).copy)
This has the same observable behavior as wim's answer, but avoids a lambda
in favor of a (in CPython) bound built-in method implemented in C, which means default value generation doesn't have to execute any Python byte code or look up any names, and it runs a small amount faster. In microbenchmarks on CPython 3.5, it looks like the cost paid when a key did not exist at access time is about 5-10% lower this way than with the otherwise equivalent lambda
.
Really, the reason I prefer it is because I hate lambda
due to people overusing it when it's a bad idea (e.g. map
/filter
with lambda
is always more verbose and slower than an equivalent listcomp/genexpr, but people keep doing it anyway for no discernible reason), even though in this case it hardly matters.
Update: As of 3.8, this performance improvement is gone, and the lambda
is faster (~3% reduced runtime using lambda
on 3.8, ~7% on 3.9), for simple microbenchmarks with ipython
. If you wish to reproduce my tests, I tested:
>>> from collections import defaultdict
>>> %%timeit dd = defaultdict(lambda: defaultdict(list)); o = object
... dd[o()]
>>> %%timeit dd = defaultdict(defaultdict(list).copy); o = object
... dd[o()]
where caching o = object
minimized lookup expenses and allowed us to make extremely cheap, guaranteed unique keys that we accessed (forcing auto-vivification of a list
) while doing no other work.
The performance improvement in 3.8 is likely largely due to the introduction of per opcode cache for the LOAD_GLOBAL instruction, reducing the cost of looking up defaultdict
and list
within the lambda
from a full dict
lookup (two in the case of list
, in built-ins) to a quick check of the version tag on the dict
followed by a cheap load from the cache, reducing the cost by ~40%. The 3.9 improvement likely (not sure on this) relates to CPython's internals moving to optimize and favor vectorcall code paths more, at the expense of non-vectorcall code paths (which the defaultdict(list).copy
path uses more of, relatively speaking), and even before these improvements, defaultdict(list).copy
had some inefficiencies that the lambda
lacked, providing some margin for improving on it.