Python thread-safe object cache

感动是毒 2020-12-29 09:29

I have implemented a Python web server. Each HTTP request spawns a new thread. I have a requirement to cache objects in memory, and since it's a web server, I want the cache t

6 answers
  • 2020-12-29 09:56

    Thread per request is often a bad idea. If your server experiences huge spikes in load, it will bring the box to its knees. Consider using a thread pool that can grow to a limited size during peak usage and shrink to a smaller size when load is light.
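
    A minimal sketch of the bounded-pool idea, using the standard library's concurrent.futures.ThreadPoolExecutor. The handler, the request ids and the cap of 8 workers are made up for illustration. It only shows the upper bound: the executor starts worker threads lazily up to max_workers, but it does not shrink the pool while it is open.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # assumed cap; tune it for the actual machine and workload


def handle_request(request_id):
    # Hypothetical request handler: reports which worker thread served it.
    return request_id, threading.current_thread().name


# 100 "requests" are served by at most MAX_WORKERS threads,
# instead of spawning one thread per request.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(handle_request, range(100)))

worker_names = {name for _, name in results}
print(len(results), "requests served by", len(worker_names), "threads")
```

    Requests that arrive while all workers are busy wait in the executor's queue, so a load spike costs latency rather than an unbounded number of threads.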

  • 2020-12-29 09:56

    You probably want to use memcached instead. It's very fast, very stable, very popular, has good Python libraries, and will allow you to grow to a distributed cache should you need to:

    http://www.danga.com/memcached/

  • 2020-12-29 10:01

    Well, a lot of operations in Python are thread-safe by default, so a standard dictionary should be OK (at least in certain respects). This is mostly due to the GIL, which helps avoid some of the more serious threading issues.

    There's a list here: http://coreygoldberg.blogspot.com/2008/09/python-thread-synchronization-and.html that might be useful.

    The atomic nature of those operations just means that you won't end up with an entirely inconsistent state if two threads access a dictionary at the same time, so you won't get a corrupted value. However, as with most multi-threaded programming, you can't rely on the specific order of those atomic operations.

    So to cut a long story short...

    If you have fairly simple requirements and aren't too bothered about the ordering of what gets written into the cache, then you can use a dictionary and know that you'll always get a consistent, uncorrupted value (it just might be out of date).
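
    As a rough illustration of the plain-dictionary approach, assuming CPython: several threads store into and read from one shared dict with no explicit lock. The keys and values are invented. Each individual store or lookup either happens completely or not at all, but nothing here controls which thread writes first.

```python
import threading

cache = {}  # shared by all threads, no explicit lock


def worker(worker_id):
    for i in range(1000):
        cache[(worker_id, i)] = i    # a single dict store
        _ = cache.get((worker_id, i))  # a single dict lookup


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(cache), "entries stored")
```

    This only covers single operations. Anything that reads and then writes based on what it read (check-then-insert, counters, eviction) still needs a lock.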

    If you want to ensure that things are a bit more consistent with regard to reading and writing then you might want to look at Django's local memory cache:

    http://code.djangoproject.com/browser/django/trunk/django/core/cache/backends/locmem.py

    It uses a read/write lock for locking.
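
    To give an idea of what a read/write lock around a cache looks like, here is a small sketch. It is not Django's code: ReadWriteLock and LocalCache are names invented for this example, and the lock is a simple version without writer priority, so a steady stream of readers can starve a writer.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Many concurrent readers or one exclusive writer (no writer priority)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of threads currently reading
        self._writing = False   # True while a writer holds the lock

    @contextmanager
    def reading(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()  # let a waiting writer proceed

    @contextmanager
    def writing(self):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()  # wake waiting readers and writers


class LocalCache:
    """Dictionary-backed cache guarded by the read/write lock above."""

    def __init__(self):
        self._data = {}
        self._lock = ReadWriteLock()

    def get(self, key, default=None):
        with self._lock.reading():
            return self._data.get(key, default)

    def set(self, key, value):
        with self._lock.writing():
            self._data[key] = value


cache = LocalCache()
cache.set("user:1", {"name": "alice"})
print(cache.get("user:1"))
```

    The point of the split is that lookups, which usually dominate in a cache, do not block each other; only writes take the lock exclusively.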

  • 2020-12-29 10:11

    I'm not sure any of these answers are doing what you want.

    I have a similar problem, and I'm using cachetools, a drop-in replacement for lru_cache that lets you pass in a lock to make it a bit safer.
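
    cachetools is a third-party package, so the sketch below swaps in the standard library's functools.lru_cache instead, just to show the memoizing-decorator shape being discussed. load_object and the keys are made up. According to the functools documentation, the cache's internal structure stays coherent when used from several threads, but the wrapped function may still run more than once for the same key if two threads miss at the same moment.

```python
import threading
from functools import lru_cache

calls = []  # records each time the wrapped function really runs


@lru_cache(maxsize=128)
def load_object(key):
    # Hypothetical expensive lookup.
    calls.append(key)
    return {"key": key}


def worker():
    for key in ("a", "b", "a"):
        load_object(key)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(load_object.cache_info())
```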

  • 2020-12-29 10:14

    For a thread-safe object you want threading.local:

    from threading import local
    
    safe = local()
    
    safe.cache = {}
    

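    You can then put and retrieve objects in safe.cache without locking. Note that this works because each thread gets its own independent safe.cache: an attribute set in one thread is not visible in the others, so every thread has to create its own cache and nothing is shared between requests.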
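
    One thing worth checking before relying on threading.local for a cache: its attributes are stored per thread. The small sketch below (the "owner" values and the seen dict are only for illustration) shows that a worker thread does not see the cache assigned in the main thread and has to create its own.

```python
import threading

safe = threading.local()
safe.cache = {"owner": "main"}  # visible only in the main thread

seen = {}  # ordinary shared dict used to report what the worker observed


def worker():
    # The attribute assigned in the main thread does not exist here.
    seen["had_cache"] = hasattr(safe, "cache")
    safe.cache = {"owner": "worker"}  # this thread's own, separate cache
    seen["worker_cache"] = dict(safe.cache)


t = threading.Thread(target=worker)
t.start()
t.join()

print(seen["had_cache"], safe.cache, seen["worker_cache"])
```

    So this gives one cache per thread rather than one cache shared by all requests, which with a thread-per-request server means the cache starts empty for each request.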

  • 2020-12-29 10:17

    Point 1. The GIL does not help you here. An example of a (non-thread-safe) cache for something called "stubs" would be:

    stubs = {}
    
    def maybe_new_stub(host):
        """ returns stub from cache and populates the stubs cache if new is created """
        if host not in stubs:
            stub = create_new_stub_for_host(host)
            stubs[host] = stub
        return stubs[host]
    

    What can happen is that Thread 1 calls maybe_new_stub('localhost'), and it discovers we do not have that key in the cache yet. Now we switch to Thread 2, which calls the same maybe_new_stub('localhost'), and it also learns the key is not present. Consequently, both threads call create_new_stub_for_host and put it into the cache.

    The map itself is protected by the GIL, so we cannot break it by concurrent access. The logic of the cache, however, is not protected, and so we may end up creating two or more stubs, and dropping all except one on the floor.
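
    The interleaving described above can be reproduced on purpose. In this sketch create_new_stub_for_host is a stand-in I made up, and the threading.Barrier exists only to force both threads past the "not in stubs" check before either one stores its result; real code would hit this only occasionally.

```python
import threading

stubs = {}
created = []                         # every stub that actually got constructed
both_inside = threading.Barrier(2)   # forces the unlucky interleaving


def create_new_stub_for_host(host):
    # Hypothetical stand-in for an expensive constructor.
    both_inside.wait(timeout=5)      # wait until both threads have passed the check
    stub = object()
    created.append(stub)
    return stub


def maybe_new_stub(host):
    if host not in stubs:            # check ...
        stubs[host] = create_new_stub_for_host(host)  # ... then act, unprotected
    return stubs[host]


threads = [threading.Thread(target=maybe_new_stub, args=("localhost",))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(created), "stubs created,", len(stubs), "kept in the cache")
```

    The dictionary itself ends up in a valid state with a single entry, but two stubs were built and one of them was thrown away.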

    Point 2. Depending on the nature of the program, you may not want a global cache. Such a shared cache forces synchronization between all your threads. For performance reasons, it is good to make the threads as independent as possible. I believe I do need a shared cache; you may not.

    Point 3. You may use a simple lock. I took inspiration from https://codereview.stackexchange.com/questions/160277/implementing-a-thread-safe-lrucache and came up with the following, which I believe is safe to use for my purposes:

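    import threading

    import grpc  # third-party gRPC package
    import cli_pb2_grpc  # generated gRPC module from my project

    stubs = {}  # host -> stub
    lock = threading.Lock()  # guards every check-then-insert on stubs


    def maybe_new_stub(host):
        """ returns stub from cache and populates the stubs cache if new is created """
        with lock:
            # The check and the insert happen under the same lock,
            # so at most one stub is created per host.
            if host not in stubs:
                channel = grpc.insecure_channel('%s:6666' % host)
                stub = cli_pb2_grpc.BrkStub(channel)
                stubs[host] = stub
            return stubs[host]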
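
    The same locking pattern, stripped of the gRPC parts so it can be run as-is. LockedCache, get_or_create and make_stub are names invented for this sketch; make_stub stands in for whatever expensive object is being cached.

```python
import threading


class LockedCache:
    """Check-then-insert under one lock, so each key is created at most once."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def get_or_create(self, key, factory):
        with self._lock:
            if key not in self._items:
                self._items[key] = factory(key)
            return self._items[key]


created = []  # records every call to the factory


def make_stub(host):
    # Hypothetical stand-in for building a real stub.
    created.append(host)
    return object()


cache = LockedCache()
results = []


def worker():
    results.append(cache.get_or_create("localhost", make_stub))


threads = [threading.Thread(target=worker) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(created), "stub created for", len(results), "callers")
```

    The trade-off is that the factory runs while the lock is held, so a slow creation for one key blocks lookups for every other key until it finishes.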
    

    Point 4. It would be best to use an existing library. I haven't found any I am prepared to vouch for yet.
