Question
I've been saving a bunch of dictionaries to file using Python's shelve module (with Python 3.4 on OSX 10.9.5). Each key is a string of an int (e.g., "84554"), and each value is a dictionary of dictionaries of a few smallish strings.
No keys are used twice, and I know the full superset of all possible keys. I am adding these key-value pairs to the shelf via threads, and which keys/values get added changes each time I run it (which is expected).
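The shelve documentation notes that the module does not support concurrent read/write access to shelved objects, so one way to combine threads with a shelf is to let the worker threads only produce values and have a single thread do all the writing. The sketch below is not the code used in this question: `produce`, the demo keys, and the temporary shelf path are hypothetical stand-ins.

```python
import os
import queue
import shelve
import tempfile
import threading


def produce(keys, out_queue):
    """Worker thread: build (key, value) pairs without touching the shelf."""
    for key in keys:
        out_queue.put((key, {"outer": {"inner": "value for " + key}}))


# Hypothetical demo data: 200 string-of-int keys split across 4 threads.
all_keys = [str(i) for i in range(200)]
results = queue.Queue()
workers = [
    threading.Thread(target=produce, args=(all_keys[i::4], results))
    for i in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

with tempfile.TemporaryDirectory() as tmp_dir:
    with shelve.open(os.path.join(tmp_dir, "demo_shelf")) as db:
        # Only this (main) thread ever writes to the shelf.
        while not results.empty():
            key, value = results.get()
            db[key] = value
        stored_keys = set(db.keys())

print(len(stored_keys))
```

Keeping every write on one thread avoids relying on how a particular dbm backend behaves under concurrent access.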
The problem I've been having is that the number of keys I get by iterating over shelve's shelf.keys() and the number of unique keys for which key in shelf.keys() is True are different.
Here's my code. I first initialize things and load ids, which is the list of all possible keys.
import shelve
from custom_code import *
MAIN_PATH = "/Users/myname/project_path/src/"
ids = list(set(load_list(MAIN_PATH + "id_list.pkl")))
c = c2 = 0
good_keys = []
bad_keys = []
I then open the shelf, counting the keys that I iterate through with db.keys() and adding these "good" keys to a list.
db = shelve.open(MAIN_PATH + "first_3")
for k in db.keys():
    c2 += 1
    good_keys += [k]
Then I check each possible key to see whether it exists in the shelf, counting and collecting the matches in the same way as above.
for j in set(ids):
    if j in db.keys():
        c += 1
        bad_keys += [j]
The two counters, c and c2, should be the same, but doing:
print("With `db.keys()`: {0}, with verifying from the list: {1}".format(c2, c))
yields:
With `db.keys()`: 628, with verifying from the list: 669
I then look at keys that are in bad_keys but not in good_keys (the list collected by iterating db.keys()) and pick an example.
odd_men_out = list( set(bad_keys).difference( set(good_keys) ) )
bad_key = odd_men_out[0]
print(bad_key) # '84554'
I then check the following:
print(bad_key in db.keys()) # True
print(bad_key in db) # True
print(db[bad_key]) # A dictionary of dictionaries that wraps ~12ish lines
print(bad_key in list(db.keys())) # False
Note that last check. Does anybody know what gives? I thought shelve was supposed to be easy, but it's been giving me complete hell.
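For reference, the two checks take different routes through the shelf: key in db.keys() is answered by a direct lookup (the keys view delegates membership to the shelf itself), while list(db.keys()) walks the underlying dbm file. A self-contained sketch of the same comparison on a throwaway shelf is below; `compare_key_views`, the demo keys, and the temporary path are hypothetical and not part of the code above. On a healthy shelf the two sets should match.

```python
import os
import shelve
import tempfile


def compare_key_views(db, candidate_keys):
    """Return (keys seen by iteration, candidate keys found by lookup)."""
    iterated = set(db.keys())                            # walks the dbm file
    members = {k for k in candidate_keys if k in db}     # one lookup per key
    return iterated, members


with tempfile.TemporaryDirectory() as tmp_dir:
    candidates = [str(i) for i in range(1000)]
    with shelve.open(os.path.join(tmp_dir, "demo_shelf")) as db:
        # Store every second candidate, mimicking "some of the possible keys".
        for key in candidates[::2]:
            db[key] = {"outer": {"inner": "x"}}
        iterated, members = compare_key_views(db, candidates)

# Keys that can be looked up but never show up during iteration.
missing_from_iteration = members - iterated
print(len(iterated), len(members), len(missing_from_iteration))
```

A non-empty missing_from_iteration on a real shelf would correspond to the odd_men_out list in this question.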
Perhaps unrelatedly (but perhaps not), when I let an even greater number of entries accumulate in the shelf and try to do something like for k in db.keys() or list(db.keys()), I get the following error:
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/_collections_abc.py", line 482, in __iter__
    yield from self._mapping
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/shelve.py", line 95, in __iter__
    for k in self.dict.keys():
SystemError: Negative size passed to PyBytes_FromStringAndSize
But I can still access the data by trying all possible keys. Evidently that's because I'm not using gdbm?
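To see which dbm backend actually sits behind a shelf, the standard library's dbm.whichdb can be pointed at the same path that was passed to shelve.open. A minimal sketch, using a throwaway shelf in a temporary directory rather than the real first_3 file (the `shelf_backend` helper and the demo path are hypothetical):

```python
import dbm
import os
import shelve
import tempfile


def shelf_backend(path):
    """Return the name of the dbm module that created the shelf at `path`.

    dbm.whichdb returns None if the file cannot be read and an empty
    string if the format cannot be identified.
    """
    return dbm.whichdb(path)


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_shelf")
    with shelve.open(demo_path) as db:
        db["84554"] = {"outer": {"inner": "x"}}
    backend = shelf_backend(demo_path)

print(backend)
```

For the shelf in this question the call would be dbm.whichdb(MAIN_PATH + "first_3"). A result of "dbm.gnu" means gdbm is in use, while "dbm.ndbm" or "dbm.dumb" means it is not.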
Source: https://stackoverflow.com/questions/49595029/python-shelve-having-items-that-arent-listed