I have not worked with threading in Python at all, so I am asking this question as a complete newcomer.
I am wondering whether defaultdict is thread-safe. Let me explain:
I have
d = defaultdict(list)
which creates a list for missing keys by default. Let's say I have two threads that both do this at the same time:
d['key'].append('value')
At the end, I'm supposed to end up with ['value', 'value']. However, if defaultdict is not thread-safe, and thread 1 yields to thread 2 after checking if 'key' in dict and before d['key'] = default_factory(), the two will interleave: the other thread will create a list in d['key'] and maybe append 'value'. Then, when thread 1 runs again, it will continue from d['key'] = default_factory(), which will destroy the existing list and its value, and we will end up with ['value'].
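Spelled out as a deterministic, single-threaded replay, the interleaving I am worried about looks roughly like this. It is only a sketch of the hypothetical check-then-set logic, not what defaultdict actually does internally, and the function name simulate_lost_update is made up for illustration.

```python
def simulate_lost_update():
    """Replay the feared interleaving by hand, without real threads."""
    d = {}

    # Thread 1: checks for the key, sees it is missing, then is suspended.
    t1_saw_missing = 'key' not in d

    # Thread 2: runs the whole check-create-append sequence uninterrupted.
    if 'key' not in d:
        d['key'] = list()
    d['key'].append('value')

    # Thread 1: resumes and acts on its stale check, replacing thread 2's list.
    if t1_saw_missing:
        d['key'] = list()
    d['key'].append('value')

    return d['key']


print(simulate_lost_update())  # ['value'], not ['value', 'value']
```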
I looked at the CPython source code for defaultdict, but I could not find any locks or mutexes. I guess it is not thread-safe unless it is documented as such.
Some people on IRC last night said that Python has the GIL, so it is conceptually thread-safe. Others said threading should not be done in Python at all. I'm pretty confused. Ideas?
It is thread safe, in this specific case.
To see why, it is important to understand when Python switches threads. CPython only allows a switch between threads between Python bytecode steps. This is where the GIL comes in: periodically (every N bytecode instructions in Python 2, or after a fixed time interval since Python 3.2) the lock is released and a thread switch can take place.
The d['key'] code is handled by one bytecode (BINARY_SUBSCR) that triggers the .__getitem__() method to be called on the dictionary.
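One way to check this on your own interpreter is to disassemble a tiny function that does the lookup. This is a minimal sketch; the function name lookup is made up for illustration, and the exact opcode name depends on the CPython version, so a newer release may show the subscript under a different name than BINARY_SUBSCR.

```python
import dis


def lookup(d):
    # The subscript expression whose bytecode we want to inspect.
    return d['key']


# Collect the opcode names that make up the function body.
opnames = [ins.opname for ins in dis.get_instructions(lookup)]
print(opnames)
```

Whatever the name, the point to look for is that the subscript itself is a single instruction rather than a sequence of separate check and store steps.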
A defaultdict configured with list as the default value factory, and using strings as keys, handles the dict.__getitem__() method entirely in C, and the GIL is never unlocked, making dict[key] lookups thread safe.
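A quick way to convince yourself is to hammer a defaultdict(list) from several threads and count what ends up in it. This is a rough sketch, assuming a standard CPython build with the GIL enabled; the constants and the worker function are made up for illustration, and a passing run is evidence rather than proof.

```python
import threading
from collections import defaultdict

N_THREADS = 8
N_KEYS = 2000

d = defaultdict(list)
# The barrier makes all threads start their loops at about the same time,
# so they race to create the same missing keys.
barrier = threading.Barrier(N_THREADS)


def worker():
    barrier.wait()
    for i in range(N_KEYS):
        # String key, C-level factory (list): the case discussed above.
        d['k%d' % i].append('value')


threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# If no list was ever replaced, every key holds one entry per thread.
lengths = {len(values) for values in d.values()}
print(len(d), lengths)
```

If a freshly created list had been overwritten by another thread, at least one key would end up with fewer than N_THREADS entries.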
Note the qualification there: if you create a defaultdict instance with a different default-value factory, one that uses Python code (lambda: [1, 2, 3] for example), all bets are off, as that means the C code calls back into Python code and the GIL can be released again while executing the bytecode for the lambda function. The same applies to the keys: when using an object that implements either __hash__ or __eq__ in Python code, a thread switch can take place there. Finally, if the factory is written in C code that explicitly releases the GIL, a thread switch can take place and thread safety is out the window.
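If you do need a Python-level factory, one conservative option is to guard every access with an explicit lock so that the lookup, the possible creation, and the append happen as one unit. This is a minimal sketch rather than the only way to do it; the names lock, append_value and worker are made up for illustration.

```python
import threading
from collections import defaultdict

N_THREADS = 8
N_APPENDS = 1000

# A factory implemented in Python code, as in the lambda example above.
d = defaultdict(lambda: [1, 2, 3])
lock = threading.Lock()


def append_value(key, value):
    # Only one thread at a time may look up, create, and append.
    with lock:
        d[key].append(value)


def worker():
    for _ in range(N_APPENDS):
        append_value('key', 'value')


threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Three default items plus one item per append call.
print(len(d['key']))
```

The lock costs some throughput, but it removes the dependence on which parts of the lookup happen to run in C.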
Source: https://stackoverflow.com/questions/17682484/is-collections-defaultdict-thread-safe