Question
Now I have a dict object, where the key is a unique hashed id and the value is a sparse list of length > 100. I'd like to store this in plain text (e.g., csv/tsv, anything that is not pickle.dump). Is there a good way to store this kind of sparse list? For example:
d = {"a": [0,0,0, ..., 1,0], "b": [0.5,0,0, ...,0.5,0], "c":...}
The length of each list is exactly the same. I was thinking about whether it's worth storing this kind of sparse list as index-value pairs, but I'm not sure whether there is any package that does this.
Answer 1:
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix, save_npz, load_npz

a = {'a': [0, 0, 1, 0], 'b': [1, 0, 0, 0], 'c': [1, 1, 0, 0]}

# Build a CSR sparse matrix from the values (you can use lil_matrix as well)
sparse1 = csr_matrix(np.array(list(a.values())))
print(sparse1)
print(sparse1.toarray())

# Save the sparse values and the keys separately
save_npz('values.npz', sparse1)
np.save('keys.npy', np.array(list(a.keys())))

# Load both back and reassemble the dict
sparse3 = load_npz('values.npz')
print(sparse3)
print(sparse3.toarray())
keys = np.load('keys.npy')
print(keys)
print(dict(zip(keys, sparse3)))
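Note that .npz is a binary format. Since the question asks for plain text, one alternative sketch (not part of the original answer; the filename values.mtx is illustrative) is the Matrix Market format, a human-readable text format that scipy.io.mmwrite/mmread handle natively:

```python
import numpy as np
from scipy.io import mmread, mmwrite
from scipy.sparse import csr_matrix

a = {'a': [0, 0, 1, 0], 'b': [1, 0, 0, 0], 'c': [1, 1, 0, 0]}
keys = list(a)  # fix the key order so rows can be matched back to keys

# Write the sparse values as a plain-text Matrix Market file
sparse1 = csr_matrix(np.array([a[k] for k in keys]))
mmwrite('values.mtx', sparse1)

# mmread returns a COO matrix; convert to CSR for row indexing
sparse2 = mmread('values.mtx').tocsr()
restored = {k: sparse2[i].toarray().ravel().tolist()
            for i, k in enumerate(keys)}
```

The keys still need to be stored alongside the file (e.g., one per line in a text file) in the same order as the rows.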
Answer 2:
Rather than saving the 0s, you should transform the sparse list into a dictionary of the non-zero values. For example,
{'a':[0,0,0,1,0,0,0,2,0,0,0,3]}
could become
{'a': {3: 1, 7: 2, 11: 3}}
You could transform the lists easily enough with a dictionary comprehension:
compressed_data = {
    hashed_id: {
        index: value for index, value in enumerate(values) if value != 0
    } for hashed_id, values in original_data.items()
}
Then you could just save that dictionary to a file. After you load the compressed data from the file, rebuild the full lists:
decompressed_data = {}
for hashed_id, values in loaded_data.items():
    decompressed_values = [0] * DATA_LENGTH
    for index, value in values.items():
        decompressed_values[index] = value
    decompressed_data[hashed_id] = decompressed_values
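For the save-to-a-file step, a hedged round-trip sketch using JSON as the plain-text format (one option among several; the filename sparse.json is illustrative). One JSON-specific caveat: it serializes the integer indices as strings, so they must be converted back with int() on load:

```python
import json

DATA_LENGTH = 12  # length of every list, assumed known in advance
original_data = {'a': [0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3]}

# Compress: keep only non-zero entries as index -> value
compressed_data = {
    hashed_id: {
        index: value for index, value in enumerate(values) if value != 0
    } for hashed_id, values in original_data.items()
}

# Save as plain text
with open('sparse.json', 'w') as f:
    json.dump(compressed_data, f)

# Load and decompress back into full-length lists
with open('sparse.json') as f:
    loaded_data = json.load(f)

decompressed_data = {}
for hashed_id, values in loaded_data.items():
    decompressed_values = [0] * DATA_LENGTH
    for index, value in values.items():
        decompressed_values[int(index)] = value  # JSON keys come back as strings
    decompressed_data[hashed_id] = decompressed_values
```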
Source: https://stackoverflow.com/questions/46756719/how-to-store-a-sparse-list-in-python