python threadsafe object cache

前端未结


I have implemented a Python web server. Each HTTP request spawns a new thread. I need to cache objects in memory, and since it's a web server, I want the cache t

6 Answers

旧巷少年郎

2020-12-29 09:56

Thread-per-request is often a bad idea. If your server experiences huge spikes in load, it will bring the box to its knees. Consider using a thread pool that can grow to a limited size during peak usage and shrink to a smaller size when load is light.
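
As a rough sketch of the bounded-pool idea, the standard library's `concurrent.futures.ThreadPoolExecutor` caps the number of worker threads and starts them lazily. Note that it does not shrink the pool again when load drops, so it only covers the "limited size" half of the suggestion. `handle_request` and the pool size of 4 are hypothetical placeholders, not anything from the question.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    # Hypothetical stand-in for the real per-request work.
    return "handled %d" % request_id


# At most 4 worker threads exist, no matter how many requests are queued.
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(handle_request, range(10)))

print(len(responses))
```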

借酒劲吻你

2020-12-29 09:56

You probably want to use memcached instead. It's very fast, very stable, very popular, has good Python libraries, and will let you grow to a distributed cache should you need to:

http://www.danga.com/memcached/

失恋的感觉

2020-12-29 10:01

Well, a lot of operations in Python are thread-safe by default, so a standard dictionary should be OK (at least in certain respects). This is mostly due to the GIL, which helps avoid some of the more serious threading issues.

There's a list here: http://coreygoldberg.blogspot.com/2008/09/python-thread-synchronization-and.html that might be useful.

The atomic nature of those operations, though, only means that you won't end up with an entirely inconsistent state if two threads access a dictionary at the same time, so you won't get a corrupted value. However, as with most multi-threaded programming, you cannot rely on the specific order of those atomic operations.

So to cut a long story short...

If you have fairly simple requirements and aren't too bothered about the ordering of what gets written into the cache, then you can use a dictionary and know that you'll always get a consistent, uncorrupted value (it just might be out of date).
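
A small sketch of what that looks like in practice, assuming CPython with the GIL: several threads write to one plain dict without a lock. Every single-key assignment lands intact, but which thread wrote `"latest"` last is undefined. The names `writer`, `cache`, and the thread and key counts are illustrative only.

```python
import threading

cache = {}


def writer(thread_id):
    for i in range(1000):
        cache[(thread_id, i)] = i      # each single-key store is atomic
        cache["latest"] = thread_id    # last writer wins; order is not defined


threads = [threading.Thread(target=writer, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# No entries are lost or corrupted: 4 * 1000 keys plus "latest".
print(len(cache))
```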

If you want to ensure that things are a bit more consistent with regard to reading and writing then you might want to look at Django's local memory cache:

http://code.djangoproject.com/browser/django/trunk/django/core/cache/backends/locmem.py

It uses a read/write lock, so many readers can proceed together while writers get exclusive access.
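
To illustrate the read/write locking idea, here is a minimal sketch built on `threading.Condition`. It is not Django's implementation; `ReadWriteLock` and `LockedCache` are names made up for this example, and this simple version lets a steady stream of readers starve writers.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Allows many concurrent readers, or one exclusive writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    @contextmanager
    def reading(self):
        with self._cond:
            while self._writing:            # wait for the writer to finish
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:      # last reader out wakes writers
                    self._cond.notify_all()

    @contextmanager
    def writing(self):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


class LockedCache:
    def __init__(self):
        self._data = {}
        self._lock = ReadWriteLock()

    def get(self, key, default=None):
        with self._lock.reading():
            return self._data.get(key, default)

    def set(self, key, value):
        with self._lock.writing():
            self._data[key] = value


cache = LockedCache()
threads = [threading.Thread(target=cache.set, args=(i, i * i)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(cache.get(3))  # 9
```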

予麋鹿

2020-12-29 10:11

I'm not sure any of these answers are doing what you want.

I have a similar problem, and I'm using cachetools, a drop-in replacement for lru_cache that lets you pass in a lock to make it a bit safer.

南方客

2020-12-29 10:14
For a thread safe object you want threading.local:
```
from threading import local

safe = local()

safe.cache = {}
```
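Each thread can then put and retrieve objects in its own safe.cache without locking. Note that this gives a per-thread cache rather than a shared one: attributes set on a local object are visible only in the thread that set them, so every thread has to create its own safe.cache before using it.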
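A short sketch of how `threading.local` behaves across threads: each thread has to create its own dict, and nothing is shared between them. The helper `get_cache`, the `seen` dict, and the thread names are hypothetical, added only for this demonstration.

```python
import threading

safe = threading.local()


def get_cache():
    # Attributes on a local object are per thread, so create the dict
    # lazily the first time each thread asks for it.
    cache = getattr(safe, "cache", None)
    if cache is None:
        cache = safe.cache = {}
    return cache


seen = {}  # shared dict, used only to collect what each thread observed


def worker(name):
    cache = get_cache()
    cache["owner"] = name
    seen[name] = dict(cache)


get_cache()["owner"] = "main"

threads = [threading.Thread(target=worker, args=("t%d" % n,)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(get_cache())  # the main thread's cache is untouched by the workers
```

This suits data that is cheap to rebuild per thread; it does not give request threads a common cache.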
小蘑菇

2020-12-29 10:17
Point 1. The GIL does not help you here. An example of a (non-thread-safe) cache for something called "stubs" would be
```
stubs = {}

def maybe_new_stub(host):
    """ returns stub from cache and populates the stubs cache if new is created """
    if host not in stubs:
        stub = create_new_stub_for_host(host)
        stubs[host] = stub
    return stubs[host]
```
What can happen is that Thread 1 calls maybe_new_stub('localhost'), and it discovers we do not have that key in the cache yet. Now we switch to Thread 2, which calls the same maybe_new_stub('localhost'), and it also learns the key is not present. Consequently, both threads call create_new_stub_for_host and put it into the cache.

The map itself is protected by the GIL, so we cannot break it by concurrent access. The logic of the cache, however, is not protected, and so we may end up creating two or more stubs, and dropping all except one on the floor.
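
The interleaving described above can be reproduced on demand. In this sketch, `create_new_stub_for_host` is a hypothetical stand-in that waits on a `threading.Barrier`, so neither thread can finish creating its stub until the other has also passed the membership check. The `created` list exists only to count constructions.

```python
import threading

stubs = {}
created = []                                   # every stub the factory builds
both_inside = threading.Barrier(2, timeout=5)  # forces the bad interleaving


def create_new_stub_for_host(host):
    # Hypothetical expensive constructor. Blocks until both threads have
    # seen "host not in stubs" and entered this function.
    both_inside.wait()
    stub = object()
    created.append(stub)
    return stub


def maybe_new_stub(host):
    if host not in stubs:                      # check ...
        stub = create_new_stub_for_host(host)
        stubs[host] = stub                     # ... then act, without a lock
    return stubs[host]


threads = [threading.Thread(target=maybe_new_stub, args=("localhost",))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(created))  # 2: both threads built a stub
print(len(stubs))    # 1: only one of them survived in the cache
```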

Point 2. Depending on the nature of the program, you may not want a global cache. Such a shared cache forces synchronization between all your threads. For performance reasons, it is good to make the threads as independent as possible. I believe I do need one; you may not.

Point 3. You may use a simple lock. I took inspiration from https://codereview.stackexchange.com/questions/160277/implementing-a-thread-safe-lrucache and came up with the following, which I believe is safe to use for my purposes:
```
import threading

import grpc
import cli_pb2_grpc  # presumably the author's generated gRPC module (not shown)

stubs = {}
lock = threading.Lock()


def maybe_new_stub(host):
    """ returns stub from cache and populates the stubs cache if new is created """
    with lock:
        if host not in stubs:
            channel = grpc.insecure_channel('%s:6666' % host)
            stub = cli_pb2_grpc.BrkStub(channel)
            stubs[host] = stub
        return stubs[host]
```
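The same pattern can be checked without gRPC. This is a sketch under assumed names: `expensive_factory` is a hypothetical stand-in for building the channel and stub, and `factory_calls` only counts how often it runs. Because the check and the store happen under one lock, the factory runs once however many threads ask for the same key.

```python
import threading

cache = {}
cache_lock = threading.Lock()
factory_calls = []   # records each time the factory actually runs
results = []


def expensive_factory(key):
    # Hypothetical stand-in for creating a channel and stub.
    factory_calls.append(key)
    return "value-for-%s" % key


def get_or_create(key):
    # Check and insert under the same lock, so only one thread creates.
    with cache_lock:
        if key not in cache:
            cache[key] = expensive_factory(key)
        return cache[key]


def worker():
    results.append(get_or_create("localhost"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(factory_calls))  # 1
```

The trade-off is that the lock is held while the factory runs, so a slow creation for one key blocks lookups for every other key.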
Point 4. It would be best to use an existing library. I haven't found any I am prepared to vouch for yet.