Question
This is a follow-up to this Stack Overflow question:
Column missing when trying to open hdf created by pandas in h5py
I am trying to save a large amount of data to disk (too large to fit into memory) and retrieve specific rows of the data using indices.
One of the solutions given in the linked post is to create a separate key for every row.
At the moment I can only think of iterating through each row, and setting the keys directly.
For example, suppose this is my data:
IndexID    Ids
1899317    [0, 47715, 1757, 9, 38994, 230, 12, 241, 12228...
22861131   [0, 48156, 154, 6304, 43611, 11, 9496, 8982, 1...
2163410    [0, 26039, 41156, 227, 860, 3320, 6673, 260, 1...
15760716   [0, 40883, 4086, 11, 5, 18559, 1923, 1494, 4, ...
12244098   [0, 45651, 4128, 227, 5, 10397, 995, 731, 9, 3...
I can go through my dataframe and set each row like this:
f.create_dataset(str(row['IndexID']), data=row['Ids'])
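To make the row-by-row approach concrete, here is a minimal self-contained sketch. The shortened sample values, the `write_rows` helper name, and the in-memory file (`driver="core"`, `backing_store=False`) are assumptions for illustration only; the real data would be written to an ordinary file on disk.

```python
import h5py
import numpy as np
import pandas as pd

# Small stand-in for the real data; these Ids lists are shortened, made-up samples.
df = pd.DataFrame({
    "IndexID": [1899317, 22861131, 2163410],
    "Ids": [[0, 47715, 1757, 9], [0, 48156, 154], [0, 26039, 41156, 227, 860]],
})


def write_rows(h5file, frame):
    """Create one dataset per row, keyed by the string form of IndexID."""
    for index_id, ids in zip(frame["IndexID"], frame["Ids"]):
        h5file.create_dataset(str(index_id), data=np.asarray(ids, dtype=np.int64))


# Assumption: keep the file purely in memory so the demo leaves nothing on disk.
with h5py.File("rows_demo.h5", "w", driver="core", backing_store=False) as f:
    write_rows(f, df)
    row = f["22861131"][:]   # read a single row back by its key
    n_keys = len(f)          # number of datasets in the root group
```

Each row ends up as its own dataset, so a single row can be read back by key without loading the rest, but the write still costs one `create_dataset` call per row.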
I am wondering if there is a batch way to do this.
Source: https://stackoverflow.com/questions/61706898/most-efficient-way-of-saving-a-pandas-dataframe-or-2d-numpy-array-into-h5py-wit