I have data stored in a shelf file created with python 2.7
When I try to access the file from python 3.4, I get an error:
>>> import shelve
>
Edited: You may need to rename your database. Read on...
Seems like pickle
is not the culprit here. shelve
relies also in anydbm
(Python 2.x) or dbm
(Python 3) to create/open a database and store the pickled information.
I created (manually) a database file using the following:
# Python 2.7
import anydbm
anydbm.open('database2', flag='c')
and
# Python 3.4
import dbm
dbm.open('database3', flag='c')
In both cases, it creates the same kind of database (may be distribution dependent, this is on Debian 7):
$ file *
database2: Berkeley DB (Hash, version 9, native byte-order)
database3.db: Berkeley DB (Hash, version 9, native byte-order)
anydbm
can open database3.db
without problems, as expected:
>>> anydbm.open('database3')
<dbm.dbm object at 0x7fb1089900f0>
Notice the lack of .db
when specifying the database name, though. But dbm
chokes on database2
, which is weird:
>>> dbm.open('database2')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/dbm/__init__.py", line 88, in open
raise error[0]("db type could not be determined")
dbm.error: db type could not be determined
unless I change the name of the name of the database to database2.db
:
$ mv database2 database2.db
$ python3
>>> import dbm
>>> dbm.open('database2')
<_dbm.dbm object at 0x7fa7eaefcf50>
So, I suspect a regression on the dbm
module, but I haven't checked the documentation. It may be intended :-?
NB: Notice that in my case, the extension is .db
, but that depends on the database being used by dbm
by default! Create an empty shelf using Python 3 to figure out which one are you using and what is it expecting.
I don't think it's possible to use a Python 2 shelf with Python 3's shelve
module. The underlying files are completely different, at least in my tests.
In Python 2*, a shelf is represented as a single file with the filename you originally gave it.
In Python 3*, a shelf consists of three files: filename.bak
, filename.dat
, and filename.dir
. Without any of these files present, the shelf cannot be opened by the Python 3 library (though it appears that just the .dat
file is sufficient for opening, if not actual reading).
@Ricardo Cárdenes has given an overview of why this may be--it's likely an issue with the underlying database modules used in storing the shelved data. It's possible that the databases are backwards compatible, but I don't know and a quick search hasn't turned up any obvious answers.
I think it's likely that some of the possible databases implemented by dbm
are backwards-compatible, whereas others are not: this could be the cause of the discrepancy between answers here, where some people, but not all, are able to open older databases directly by specifying a protocol.
*On every machine I've tested, using Python 2.7.6 vs Pythons 3.2.5, 3.3.4, and 3.4.1
The shelve
module uses Python's pickle, which may require a protocol version when being accessed between different versions of Python.
Try supplying protocol version 2:
population = shelve.open('shelved.shelf', protocol=2)
According to the documentation:
Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
This is most likely the protocol used in the original serialization (or pickling).
As I understand now, here is the path that lead to my problem:
I at first looked into installing a third party bsddb module for Python 3, but it quickly started to turn into a hassle. It then seemed that it would be a recurring hassle any time I need to use the same shelf file on a new machine. So I decided to convert the file from bsddb to dumbdbm, which both my python 2 and python 3 installations can read.
I ran the following in Python 2, which is the version that contains both bsddb and dumbdbm:
import shelve
import dumbdbm
def dumbdbm_shelve(filename,flag="c"):
return shelve.Shelf(dumbdbm.open(filename,flag))
out_shelf=dumbdbm_shelve("shelved.dumbdbm.shelf")
in_shelf=shelve.open("shelved.shelf")
key_list=in_shelf.keys()
for key in key_list:
out_shelf[key]=in_shelf[key]
out_shelf.close()
in_shelf.close()
So far it looks like the dumbdbm.shelf files came out ok, pending a double-check of the contents.