Question
Now I have a dict object, where the key is a unique hashed id and the value is a sparse list of length > 100. I'd like to store this in plain text (e.g., csv/tsv, anything that is not pickle.dump). Is there a good way to store this kind of sparse list? For example:
d = {"a": [0,0,0, ..., 1,0], "b": [0.5,0,0, ...,0.5,0], "c":...}
The length of each list is exactly the same. I was thinking about whether it's worth storing this kind of sparse list as index-value pairs, but I'm not sure whether there is any package that does this.
Answer 1:
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix, save_npz, load_npz

a = {'a': [0, 0, 1, 0], 'b': [1, 0, 0, 0], 'c': [1, 1, 0, 0]}

# Build a CSR sparse matrix from the values (you can use lil_matrix as well)
sparse1 = csr_matrix(np.array(list(a.values())))
print(sparse1)
print(sparse1.toarray())

# Save the sparse values and the keys separately
save_npz('values.npz', sparse1)
np.save('keys.npy', np.array(list(a.keys())))

# Load both back and reassemble the dict
sparse3 = load_npz('values.npz')
print(sparse3)
print(sparse3.toarray())
keys = np.load('keys.npy')
print(keys)
print(dict(zip(keys, sparse3)))
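Note that .npz is a binary format. Since the question asks for plain text, one alternative sketch (not part of the original answer; the filename values.mtx is illustrative) is the Matrix Market format, a human-readable text format that scipy.io.mmwrite/mmread handle natively:

```python
import numpy as np
from scipy.io import mmread, mmwrite
from scipy.sparse import csr_matrix

a = {'a': [0, 0, 1, 0], 'b': [1, 0, 0, 0], 'c': [1, 1, 0, 0]}
keys = list(a)  # fix the key order so rows can be matched back to keys

# Write the sparse values as a plain-text Matrix Market file
sparse1 = csr_matrix(np.array([a[k] for k in keys]))
mmwrite('values.mtx', sparse1)

# mmread returns a COO matrix; convert to CSR for row indexing
sparse2 = mmread('values.mtx').tocsr()
restored = {k: sparse2[i].toarray().ravel().tolist()
            for i, k in enumerate(keys)}
```

The keys still need to be stored alongside the file (e.g., one per line in a text file) in the same order as the rows.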
Answer 2:
Rather than saving the 0s, you should transform the sparse list into a dictionary of the non-zero values. For example,
{'a':[0,0,0,1,0,0,0,2,0,0,0,3]}
could become
{'a': {3: 1, 7: 2, 11: 3}}
You could transform the lists easily enough with a dictionary comprehension:
compressed_data = {
    hashed_id: {
        index: value for index, value in enumerate(values) if value != 0
    } for hashed_id, values in original_data.items()
}
Then you could just save that dictionary to a file. After you load the compressed data from the file, rebuild the full lists:
decompressed_data = {}
for hashed_id, values in loaded_data.items():
    decompressed_values = [0] * DATA_LENGTH
    for index, value in values.items():
        decompressed_values[index] = value
    decompressed_data[hashed_id] = decompressed_values
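For the save-to-a-file step, a hedged round-trip sketch using JSON as the plain-text format (one option among several; the filename sparse.json is illustrative). One JSON-specific caveat: it serializes the integer indices as strings, so they must be converted back with int() on load:

```python
import json

DATA_LENGTH = 12  # length of every list, assumed known in advance
original_data = {'a': [0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3]}

# Compress: keep only non-zero entries as index -> value
compressed_data = {
    hashed_id: {
        index: value for index, value in enumerate(values) if value != 0
    } for hashed_id, values in original_data.items()
}

# Save as plain text
with open('sparse.json', 'w') as f:
    json.dump(compressed_data, f)

# Load and decompress back into full-length lists
with open('sparse.json') as f:
    loaded_data = json.load(f)

decompressed_data = {}
for hashed_id, values in loaded_data.items():
    decompressed_values = [0] * DATA_LENGTH
    for index, value in values.items():
        decompressed_values[int(index)] = value  # JSON keys come back as strings
    decompressed_data[hashed_id] = decompressed_values
```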
Source: https://stackoverflow.com/questions/46756719/how-to-store-a-sparse-list-in-python