How to avoid computation every time a python module is reloaded

温柔的废话 2021-02-06 10:55

I have a Python module that makes use of a huge global dictionary. Currently I put the computation code in the top section of the module; every first time import or reload of the mod

13 answers
  • 2021-02-06 11:21

    You could try using the marshal module instead of pickle (or cPickle); it could be faster. Python uses this module internally to store values in a binary format, for example the compiled code in .pyc files. Note especially the following paragraph from its documentation, to see if marshal fits your needs:

    Not all Python object types are supported; in general, only objects whose value is independent from a particular invocation of Python can be written and read by this module. The following types are supported: None, integers, long integers, floating point numbers, strings, Unicode objects, tuples, lists, sets, dictionaries, and code objects, where it should be understood that tuples, lists and dictionaries are only supported as long as the values contained therein are themselves supported; and recursive lists and dictionaries should not be written (they will cause infinite loops).

    Just to be on the safe side, before unmarshalling the dict, make sure that the Python version that unmarshals the dict is the same as the one that did the marshal, since there are no guarantees for backwards compatibility.
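
    As a minimal sketch of that idea, the cache below stores the interpreter version next to the dictionary and rebuilds when the versions differ. `CACHE_PATH`, `build_table` and `load_table` are hypothetical names, and the small dictionary stands in for the real computation.

```python
import marshal
import os
import sys
import tempfile

# Hypothetical cache location; the file name is an assumption for this sketch.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "big_table.marshal")


def build_table():
    # Stand-in for the expensive computation described in the question.
    return {n: n * n for n in range(1000)}


def load_table(path=CACHE_PATH):
    # Tag the cache with the interpreter version, because the marshal format
    # is not guaranteed to stay compatible between Python versions.
    tag = sys.version_info[:3]
    try:
        with open(path, "rb") as fh:
            stored_tag, table = marshal.load(fh)
        if stored_tag == tag:
            return table
    except (OSError, EOFError, ValueError, TypeError):
        pass  # missing or unreadable cache: fall through and rebuild
    table = build_table()
    with open(path, "wb") as fh:
        marshal.dump((tag, table), fh)
    return table
```

    The first call pays for the computation and writes the file; later imports only pay for reading it back. This only works if every key and value is one of the types marshal supports.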

  • 2021-02-06 11:22

    Expanding on the delayed-calculation idea, why not turn the dict into a class that supplies (and caches) elements as necessary?

    You might also use psyco to speed up overall execution...
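
    One way to sketch the caching class is a `dict` subclass that fills in entries through `__missing__`. This assumes each entry can be computed independently from its key; `LazyTable` and `expensive` are hypothetical names.

```python
class LazyTable(dict):
    """Dictionary that computes a missing entry on first lookup and keeps it."""

    def __init__(self, compute):
        super().__init__()
        self._compute = compute  # hypothetical per-key function

    def __missing__(self, key):
        # dict calls __missing__ only when table[key] finds no stored entry.
        value = self._compute(key)
        self[key] = value
        return value


calls = []


def expensive(key):
    calls.append(key)  # record invocations to make the caching visible
    return len(key)


table = LazyTable(expensive)
table["python"]  # computed and stored
table["python"]  # served from the stored entry
```

    Note that `__missing__` is used by `table[key]` only; `table.get(key)` and `key in table` do not trigger the computation.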

  • 2021-02-06 11:23

    I assume you've pasted the dict literal into the source, and that's what's taking a minute? I don't know how to get around that, but you could probably avoid instantiating this dict upon import... You could lazily-instantiate it the first time it's actually used.

  • 2021-02-06 11:26

    There's another pretty obvious solution for this problem. When a module is reloaded, its code is re-executed in the existing module namespace, so names defined by the previous execution are still available.

    So... doing something like this will make sure this code is executed only once.

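    # Assumes NLTK and its Brown corpus are installed.
    from nltk import FreqDist
    from nltk.corpus import brown

    try:
        FD  # still bound from the previous execution when the module is reloaded
    except NameError:
        FD = FreqDist(word for word in brown.words())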
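
    To see the guard at work without NLTK, the sketch below writes a throwaway module to a temporary directory, imports it, and reloads it. The module name `guarded_mod` and the small dictionary are assumptions for illustration only.

```python
import importlib
import os
import sys
import tempfile

# Source of a throwaway module that uses the same guard.
SOURCE = """
try:
    TABLE  # still bound when the module is reloaded
except NameError:
    TABLE = {n: n * n for n in range(10)}  # stand-in for the slow build
"""

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "guarded_mod.py"), "w") as fh:
    fh.write(SOURCE)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

guarded_mod = importlib.import_module("guarded_mod")
first = guarded_mod.TABLE
importlib.reload(guarded_mod)

# The reload found TABLE already bound, so the dict was not rebuilt.
same_object = guarded_mod.TABLE is first
```

    The guard only helps with `reload` in the same process; a fresh interpreter starts with an empty namespace and runs the computation again.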
    
  • 2021-02-06 11:31

    A couple of things that will help speed up imports:

    1. You might try running Python with the -OO flag. This strips assert statements and docstrings, which can slightly reduce the import time of modules.
    2. Is there any reason why you couldn't break the dictionary up into smaller dictionaries in separate modules that can be loaded more quickly?
    3. As a last resort, you could do the calculations asynchronously so that they won't delay your program until it needs the results. Or maybe even put the dictionary in a separate process and pass data back and forth using IPC if you want to take advantage of multi-core architectures.
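
    The asynchronous option in point 3 could be sketched with a background thread, as below. `build_table` and `get_table` are hypothetical names, and the small dictionary stands in for the real computation.

```python
from concurrent.futures import ThreadPoolExecutor


def build_table():
    # Stand-in for the expensive computation.
    return {n: n * n for n in range(100_000)}


_executor = ThreadPoolExecutor(max_workers=1)
_future = _executor.submit(build_table)  # starts at import time without blocking
_executor.shutdown(wait=False)  # no further tasks; the submitted one still runs


def get_table():
    """Return the table, blocking only if the background build is unfinished."""
    return _future.result()
```

    On a standard CPython build with the GIL, a thread keeps the import from blocking but does not run pure-Python computation in parallel with the rest of the program; using several cores would need a separate process, as the answer suggests.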

    With that said, I agree that you shouldn't be experiencing any delay in importing modules after the first time you import it. Here are a couple of other general thoughts:

    1. Are you importing the module within a function? If so, this can lead to performance problems since it has to check and see if the module is loaded every time it hits the import statement.
    2. Is your program multi-threaded? I have seen occasions where executing code upon module import in a multi-threaded app can cause some wonkiness and application instability (most notably with the cgitb module).
    3. If this is a global variable, be aware that global variable lookup times can be significantly longer than local variable lookup times. In this case, you can achieve a significant performance improvement by binding the dictionary to a local variable if you're using it multiple times in the same context.
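
    The local-binding advice in the last point looks like this in practice. `TABLE` and both functions are hypothetical; how much the binding gains depends on the interpreter version and the loop, so it is worth measuring with `timeit` before relying on it.

```python
TABLE = {n: n * n for n in range(1000)}  # module-level (global) dictionary


def total_global(keys):
    total = 0
    for k in keys:
        total += TABLE[k]  # global name lookup on every iteration
    return total


def total_local(keys):
    table = TABLE  # one global lookup, then local variable access
    total = 0
    for k in keys:
        total += table[k]
    return total
```

    Both functions return the same result; only the way the dictionary is looked up inside the loop differs.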

    With that said, it's a tad bit difficult to give you any specific advice without a little bit more context. More specifically, where are you importing it? And what are the computations?

  • 2021-02-06 11:36

    Calculate your global var on the first use.

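    class Proxy:
        _value = None

        @property
        def global_name(self):
            # Calculate your global var here on first access, then reuse it.
            if self._value is None:
                self._value = ...  # placeholder for the real computation
            return self._value

    _proxy_object = Proxy()
    # Read _proxy_object.global_name at the point of use. Assigning it to a
    # module-level GLOBAL_NAME here would run the computation at import time.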
    

    Or better yet, access the necessary data via a special data object.

    class Data:
        GLOBAL_NAME = property(...)
    
    data = Data()
    

    Example:

    from some_module import data
    
    print(data.GLOBAL_NAME)
    

    See Django settings.
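
    A runnable sketch of the data-object variant, assuming Python 3.8 or later for `functools.cached_property`. The dictionary comprehension is a stand-in for the real computation.

```python
import functools


class Data:
    """Holder whose attributes are computed on first access."""

    @functools.cached_property
    def GLOBAL_NAME(self):
        # Stand-in for the expensive computation; runs once per Data instance.
        return {n: n * n for n in range(1000)}


data = Data()
```

    Nothing is computed when the module is imported. The first read of `data.GLOBAL_NAME` runs the method and stores the result on the instance, and later reads return the stored object.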
