I have a python module that makes use of a huge dictionary global variable. Currently I put the computation code in the top section, so every first import or reload of the module takes a long time, which is unacceptable. How can I save the computation result somewhere, so that subsequent imports/reloads don't have to recompute it?
If the 'shelve' solution turns out to be too slow or fiddly, there are other possibilities:
Factor the computationally intensive part into a separate module. Then at least on reload, you won't have to wait.
Try dumping the data structure using protocol 2. The call to try would be cPickle.dump(FD, f, protocol=2), where f is a file opened in binary mode. From the docstring for cPickle.Pickler:
Protocol 0 is the only protocol that can be written to a file opened in text mode and read back successfully. When using a protocol higher than 0, make sure the file is opened in binary mode, both when pickling and unpickling.
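For instance, a minimal sketch of the dump/load round trip (the filename fd.pickle is just a placeholder, and FD stands in for your big dict):

import cPickle

# Dump the dict once, in binary mode as protocol 2 requires:
with open('fd.pickle', 'wb') as f:
    cPickle.dump(FD, f, protocol=2)

# On later runs, load it back instead of recomputing:
with open('fd.pickle', 'rb') as f:
    FD = cPickle.load(f)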
I'm going through this same issue... shelve, databases, etc. are all too slow for this type of problem. You'll need to take the hit once and insert the data into an in-memory key/value store like Redis. It will just live there in memory (warning: it could use up a good amount of memory, so you may want a dedicated box). You'll never have to reload it, and key lookups happen in memory:
from redis import Redis

r = Redis()          # connects to localhost:6379 by default
r.set(key, word)     # store once
word = r.get(key)    # fast in-memory lookup thereafter
Or you could just use a database to store the values in. Check out SQLObject, which makes it very easy to store stuff to a database.
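A minimal sketch of what that could look like with SQLObject (the Entry class and the sqlite path are illustrative assumptions, not anything from the question):

from sqlobject import SQLObject, StringCol, sqlhub, connectionForURI

# Point SQLObject at an on-disk sqlite database (path is hypothetical):
sqlhub.processConnection = connectionForURI('sqlite:/path/to/my.db')

class Entry(SQLObject):
    key = StringCol(alternateID=True)   # alternateID=True gives us Entry.byKey()
    value = StringCol()

Entry.createTable(ifNotExists=True)
Entry(key='key99999', value='99999')    # stored once
print Entry.byKey('key99999').value     # looked up from disk later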
Just to clarify: the code in the body of a module is not executed every time the module is imported - it is run only once, after which future imports find the already created module, rather than recreating it. Take a look at sys.modules to see the list of cached modules.
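For example, from the interactive prompt:

>>> import sys
>>> import math
>>> 'math' in sys.modules   # subsequent imports of math just hit this cache
True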
However, if your problem is the time it takes for the first import after the program is run, you'll probably need to use some method other than a python dict. Probably best would be to use an on-disk form, for instance a sqlite database or one of the dbm modules.
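For the sqlite route, a rough sketch using the standard sqlite3 module might look like this (the table and file names are just placeholders):

import sqlite3

conn = sqlite3.connect('my_dict.db')
conn.execute('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)')
conn.executemany('INSERT OR REPLACE INTO kv VALUES (?, ?)',
                 (('key%d' % x, str(x)) for x in xrange(1000000)))
conn.commit()

# Later runs query single keys without loading the whole thing:
row = conn.execute('SELECT value FROM kv WHERE key = ?', ('key99999',)).fetchone()
print row[0]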
For a minimal change in your interface, the shelve module may be your best option - it puts a fairly transparent layer over the dbm modules that makes them act like an arbitrary python dict, allowing any picklable value to be stored. Here's an example:
# Create dict with a million items:
import shelve
d = shelve.open('path/to/my_persistant_dict')
d.update(('key%d' % x, x) for x in xrange(1000000))
d.close()
Then in the next process, use it. There should be no large delay, as lookups hit the on-disk form only for the key requested, so everything doesn't have to get loaded into memory:
>>> d = shelve.open('path/to/my_persistant_dict')
>>> print d['key99999']
99999
It's a bit slower than a real dict, and it will still take a long time to load if you do something that requires all the keys (e.g. try to print it), but it may solve your problem.
shelve gets really slow with large data sets. I've been using redis quite successfully, and wrote a FreqDist wrapper around it. It's very fast, and can be accessed concurrently.
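As a rough illustration of the idea (not the actual wrapper mentioned above - the class name and interface here are assumptions), a Redis-backed frequency distribution can be as simple as a hash plus atomic increments:

from redis import Redis

class RedisFreqDist(object):
    # Illustrative sketch only, not the real wrapper.
    def __init__(self, name='freqdist', r=None):
        self.name = name        # name of the redis hash holding the counts
        self.r = r or Redis()

    def inc(self, sample):
        # HINCRBY is atomic, which is what makes concurrent access safe
        self.r.hincrby(self.name, sample, 1)

    def __getitem__(self, sample):
        return int(self.r.hget(self.name, sample) or 0)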