More specific dupe of 875228—Simple data storing in Python.
I have a rather large dict (6 GB) and I need to do some processing on it. I'm trying out several docume
This solution at SourceForge uses only standard Python modules:
y_serial.py module :: warehouse Python objects with SQLite
"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."
http://yserial.sourceforge.net
The compression bonus may shrink your 6 GB dictionary considerably (perhaps to around 1 GB, depending on the data). If you do not want to store a series of dictionaries, the module also contains a file.gz solution, which might be more suitable given your dictionary size.
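For comparison, here is a minimal sketch of the "compressed file" idea using only the standard gzip and pickle modules. This is not the y_serial API; the helper names save_compressed and load_compressed and the file name are made up for illustration, and the small dict stands in for the real 6 GB one.

```python
import gzip
import os
import pickle
import tempfile

def save_compressed(obj, path):
    # Pickle the object straight into a gzip-compressed file.
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_compressed(path):
    # Only load files you wrote yourself; pickle is unsafe on untrusted data.
    with gzip.open(path, "rb") as f:
        return pickle.load(f)

# Small stand-in for the real dictionary.
sample = {12: "mydict", 14: (1, 2, 3)}
demo_path = os.path.join(tempfile.mkdtemp(), "bigdict.pkl.gz")
save_compressed(sample, demo_path)
restored = load_compressed(demo_path)
```

How much space this saves depends entirely on the data, and the whole dict still has to fit in memory when it is loaded back.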
I'd use shelve, json, yaml, or whatever, as suggested by other answers.
shelve is especially cool because you can keep the dict on disk and still use it. Values will be loaded on demand.
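A minimal sketch of what that looks like, assuming Python 3.4 or later (where a shelf can be used in a with statement). The file name is arbitrary, and note that shelve keys must be strings, so integer keys would need converting:

```python
import os
import shelve
import tempfile

shelf_path = os.path.join(tempfile.mkdtemp(), "bigdict.shelf")

# Writing: each value is pickled and stored on disk under its key.
with shelve.open(shelf_path) as db:
    db["12"] = "mydict"      # shelve keys must be strings
    db["14"] = (1, 2, 3)

# Reading: only the values you actually look up are unpickled.
with shelve.open(shelf_path) as db:
    value = db["14"]
    has_key = "12" in db
```

The shelf behaves mostly like a dict, but it never needs the whole 6 GB in memory at once.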
But if you really want to parse the text of the dict, and it contains only strings, ints and tuples like you've shown, you can use ast.literal_eval to parse it. It is a lot safer than eval, since it can't evaluate full expressions; it only works with strings, numbers, tuples, lists, dicts, booleans, and None:
>>> import ast
>>> print(ast.literal_eval("{12: 'mydict', 14: (1, 2, 3)}"))
{12: 'mydict', 14: (1, 2, 3)}
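To illustrate the safety point, here is a small sketch showing that ast.literal_eval refuses anything that is not a plain literal. The wrapper parse_dict_text is a hypothetical helper, not part of the ast module:

```python
import ast

def parse_dict_text(text):
    # Return the parsed dict, or None when the text is not a plain dict literal.
    try:
        result = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None
    return result if isinstance(result, dict) else None

good = parse_dict_text("{12: 'mydict', 14: (1, 2, 3)}")
# A function call is an expression, not a literal, so it is rejected
# instead of being executed.
bad = parse_dict_text("__import__('os').getcwd()")
```

Here good is the parsed dict and bad is None, whereas a plain eval would have run the call.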
Write it out in a serialized format, such as pickle (a Python standard library module for serialization) or perhaps JSON (a text representation that can be parsed to rebuild the in-memory object).
Why not use Python's pickle? Python has a great serialization module called pickle, and it is very easy to use.
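import pickle  # on Python 2, import cPickle instead for speed
with open('save.p', 'wb') as f:   # write the object to disk
    pickle.dump(obj, f)
with open('save.p', 'rb') as f:   # read it back
    obj = pickle.load(f)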
There are two disadvantages with pickle: it is not safe to load data from an untrusted source, and the format is not human readable.
If you are using Python 2.6 or later, there is a built-in module called json. It is as easy to use as pickle:
import json
encoded = json.dumps(obj)
obj = json.loads(encoded)
The JSON format is human readable and very similar to the string representation of a Python dictionary. It doesn't have the security issues that pickle has, but it might be slower than cPickle.
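One caveat worth knowing before picking JSON for a dict like the one in the question: JSON object keys are always strings and JSON has no tuple type, so a round trip changes both. A short sketch, with a hand-written conversion back that assumes every key is an int and every list was originally a tuple:

```python
import json

original = {12: "mydict", 14: (1, 2, 3)}
round_tripped = json.loads(json.dumps(original))
# round_tripped is {"12": "mydict", "14": [1, 2, 3]}:
# int keys became strings and the tuple became a list.

# Converting back by hand, under the assumptions stated above.
restored = {
    int(key): tuple(value) if isinstance(value, list) else value
    for key, value in round_tripped.items()
}
```

pickle and shelve preserve these types without any extra work, which is one reason to prefer them when the file does not need to be readable by people or other languages.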
I would suggest that you use YAML for your file format, so you can tinker with it on disk.
How does it look:
- It is indent based
- It can represent dictionaries and lists
- It is easy for humans to understand
An example: This block of code is an example of YAML (a dict holding a list and a string)
Full syntax: http://www.yaml.org/refcard.html
To get it in Python, just easy_install pyyaml. See http://pyyaml.org/
It comes with easy file save / load functions, such as yaml.safe_dump and yaml.safe_load.
Here are a few alternatives depending on your requirements:
- numpy stores your plain data in a compact form and performs group/mass operations well
- shelve is like a large dict backed up by a file
- some 3rd party storage module, e.g. stash, stores arbitrary plain data
- a proper database, e.g. mongodb for hairy data, or mysql or sqlite for plain data and faster retrieval
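As a rough sketch of the database option, the standard sqlite3 module can serve as a simple key/value store. The table name kv and the helpers put and get are invented for this example, and values are pickled so that tuples and other Python objects survive:

```python
import pickle
import sqlite3

# ":memory:" keeps the demo self-contained; use a file path for real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")

def put(key, obj):
    # Store (or overwrite) one pickled value under a string key.
    conn.execute(
        "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
        (key, sqlite3.Binary(pickle.dumps(obj))),
    )

def get(key, default=None):
    # Fetch and unpickle a single value without touching the rest.
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return default if row is None else pickle.loads(row[0])

put("12", "mydict")
put("14", (1, 2, 3))
conn.commit()
fetched = get("14")
missing = get("no-such-key")
```

Lookups go through the primary-key index, so a single value can be read without loading the other entries.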