pytables

Opening a corrupted PyTables HDF5 file

Submitted by 寵の児 on 2019-12-11 02:37:27
Question: I am hoping for some help in opening a corrupted HDF5 file. I am accessing PyTables via pandas, but a pd.read_hdf() call produces the following error. I don't know anything about the inner workings of PyTables. I believe the error arose because the process saving to the file (appending every 10 seconds or so) got duplicated, so two identical processes were then appending. I am not sure why this would corrupt the file rather than duplicate data, but the two errors occurred together…
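The snippet is cut off before the actual traceback, but a common salvage attempt, sketched below with assumed file names, is to open the damaged store read-only and copy whichever keys are still readable into a fresh file; if the HDF5 metadata itself is damaged, even the open call may fail.

```python
import pandas as pd

# Salvage sketch with placeholder file names: copy every still-readable key
# from the damaged store into a fresh one, skipping keys that raise.
src = pd.HDFStore('corrupted.h5', mode='r')   # may itself fail if the metadata is damaged
dst = pd.HDFStore('recovered.h5', mode='w')
for key in src.keys():
    try:
        dst.put(key, src[key], format='table')
    except Exception as exc:                  # keys touching damaged blocks will fail
        print('skipping', key, '->', exc)
src.close()
dst.close()
```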

Filter HDF dataset from H5 file using attribute

Submitted by 旧城冷巷雨未停 on 2019-12-10 21:17:53
Question: I have an h5 file containing multiple groups and datasets. Each dataset has associated attributes. I want to find/filter the datasets in this h5 file based on the attribute associated with each one. Example: dataset1 = cloudy (attribute), dataset2 = rainy (attribute), dataset3 = cloudy (attribute). I want to find the datasets whose weather attribute/metadata is cloudy. What is the simplest, most Pythonic way to get this done? Answer 1: There are 2 ways to access HDF5 data with Python:…
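The answer is truncated before its code, but a minimal h5py sketch of the attribute filter it is leading up to might look like the following; the attribute name 'weather' and the file name are assumptions taken from the question.

```python
import h5py

# Walk every object in the file and collect dataset names whose (assumed)
# 'weather' attribute equals 'cloudy'.
matches = []

def collect(name, obj):
    if isinstance(obj, h5py.Dataset):
        value = obj.attrs.get('weather')
        if isinstance(value, bytes):      # attributes may come back as bytes or str
            value = value.decode()
        if value == 'cloudy':
            matches.append(name)

with h5py.File('data.h5', 'r') as f:      # placeholder file name
    f.visititems(collect)

print(matches)
```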

combining huge h5 files with multiple datasets into one with odo

Submitted by 瘦欲@ on 2019-12-10 20:17:42
Question: I have a number of large (13GB+ in size) h5 files; each h5 file has two datasets that were created with pandas: df.to_hdf('name_of_file_to_save', 'key_1', table=True) and df.to_hdf('name_of_file_to_save', 'key_2', table=True) # saved to the same h5 file as above. I've seen a post here, Concatenate two big pandas.HDFStore HDF5 files, on using odo to concatenate h5 files. What I want to do is, for each h5 file that was created, each having key_1 and key_2, combine them so that all of the key_1 data…
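This is not the odo-based route the question asks about, but a plain pandas sketch of the same goal: appending 'key_1' from several source files (names assumed) into one combined table store, reading in chunks so the 13GB+ tables never have to fit in memory at once.

```python
import pandas as pd

# Chunked concatenation sketch; 'part1.h5' / 'part2.h5' / 'combined.h5' are placeholders.
sources = ['part1.h5', 'part2.h5']
with pd.HDFStore('combined.h5', mode='w') as out:
    for path in sources:
        # read_hdf returns an iterator when chunksize is given (table format only)
        for chunk in pd.read_hdf(path, 'key_1', chunksize=1000000):
            out.append('key_1', chunk, format='table')
```

The same loop could be repeated for 'key_2' to build both combined tables in the one output file.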

TypeError: read_hdf() takes exactly 2 arguments (1 given)

Submitted by 一个人想着一个人 on 2019-12-10 20:08:03
Question: How do I open an HDF5 file with pandas.read_hdf when the keys are not known? from pandas.io.pytables import read_hdf read_hdf(path_or_buf, key) pandas.__version__ == '0.14.1' Here the key parameter is not known. Thanks. Answer 1: Having never worked with hdf files before, I was able to use the online docs to cook up an example: In [59]: # create a temp df and store it df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5)))) df_tl.to_hdf('store_tl.h5','table',append=True) In [60]: # we can simply…
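The answer is truncated, but the usual way around an unknown key, sketched here against the answer's own store_tl.h5 example, is to list the store's keys first and then pass one of them to read_hdf.

```python
import pandas as pd

# Discover the keys, then read by key ('store_tl.h5' comes from the example above).
with pd.HDFStore('store_tl.h5', mode='r') as store:
    keys = store.keys()        # e.g. ['/table']
print(keys)

df = pd.read_hdf('store_tl.h5', keys[0])
```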

Pandas _metadata of DataFrame persistence error

Submitted by 那年仲夏 on 2019-12-09 18:20:10
Question: I have finally figured out how to use _metadata on a DataFrame; everything works except that I am unable to persist it, e.g. to hdf5 or json. I know it works because when I copy the frame the _metadata attributes copy over, while "non _metadata" attributes don't. Example: df = pandas.DataFrame # make up a frame to your liking pandas.DataFrame._metadata = ["testmeta"] df.testmeta = "testmetaval" df.badmeta = "badmetaval" newframe = df.copy() newframe.testmeta --> outputs "testmetaval" newframe.badmeta --->…
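Since to_hdf does not round-trip custom _metadata, one commonly suggested workaround, sketched below with made-up file, key, and attribute names, is to stash the value on the HDF5 node's attributes via get_storer and restore it by hand after reading. This is an assumption-laden illustration, not the questioner's own solution.

```python
import pandas as pd

# Persist the custom metadata as an HDF5 node attribute next to the frame itself.
pd.DataFrame._metadata = ['testmeta']
df = pd.DataFrame({'A': range(5)})
df.testmeta = 'testmetaval'

with pd.HDFStore('meta.h5', mode='w') as store:
    store.put('frame', df, format='table')
    store.get_storer('frame').attrs.testmeta = df.testmeta   # stored alongside the node

with pd.HDFStore('meta.h5', mode='r') as store:
    restored = store['frame']
    restored.testmeta = store.get_storer('frame').attrs.testmeta

print(restored.testmeta)   # 'testmetaval'
```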

finding a duplicate in a hdf5 pytable with 500e6 rows

Submitted by 元气小坏坏 on 2019-12-09 17:33:18
Question: Problem: I have a large (> 500e6 rows) dataset that I've put into a PyTables database. Let's say the first column is an ID and the second column is a counter for each ID; each ID-counter combination has to be unique. I have one non-unique row among the 500e6 rows that I'm trying to find. As a starter I've done something like this: index1 = db.cols.id.create_index() index2 = db.cols.counts.create_index() for row in db: query = '(id == %d) & (counts == %d)' % (row['id'], row['counts']) result = th.readWhere(query) if…
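The loop above issues one indexed readWhere query per row, which is very slow at this scale. A different single-pass sketch with PyTables is shown below; the file and node names are assumptions, and keeping every (id, counts) pair in a Python set costs substantial memory at 500e6 rows, so treat it purely as an illustration of the idea.

```python
import tables

# Single pass over the table, remembering every (id, counts) pair seen so far.
seen = set()
with tables.open_file('big.h5', mode='r') as h5:
    table = h5.root.mytable                 # placeholder node name
    for row in table.iterrows():
        pair = (row['id'], row['counts'])
        if pair in seen:
            print('duplicate found:', pair)
            break
        seen.add(pair)
```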

PyTables read random subset

Submitted by 假装没事ソ on 2019-12-09 11:20:37
Question: Is it possible to read a random subset of rows from HDF5 (via PyTables or, preferably, pandas)? I have a very large dataset with millions of rows, but only need a sample of a few thousand for analysis. And what about reading from a compressed HDF file? Answer 1: Use HDFStore; docs are here, compression docs are here. Random access via a constructed index is supported in 0.13: In [26]: df = DataFrame(np.random.randn(100,2),columns=['A','B']) In [27]: df.to_hdf('test.h5','df',mode='w',format='table') In…
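Continuing the answer's own test.h5 / 'df' example, a sketch of the random-subset read: look up the row count, draw random row numbers, and pass them to select as row coordinates (supported for table-format stores since pandas 0.13).

```python
import numpy as np
import pandas as pd

# Sample 10 random rows by coordinate from the table-format store above.
with pd.HDFStore('test.h5', mode='r') as store:
    nrows = store.get_storer('df').nrows
    rows = np.random.randint(0, nrows, size=10)
    sample = store.select('df', where=pd.Index(rows))
print(sample)
```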

HDFStore.append(string, DataFrame) fails when string column contents are longer than those already there

Submitted by 。_饼干妹妹 on 2019-12-09 09:13:25
Question: I have a pandas DataFrame stored via an HDFStore that essentially holds summary rows about test runs I am doing. Several of the fields in each row contain descriptive strings of variable length. When I do a test run, I create a new DataFrame with a single row in it: def export_as_df(self): return pd.DataFrame(data=[self._to_dict()], index=[datetime.datetime.now()]) I then call HDFStore.append(string, DataFrame) to add the new row to the existing DataFrame. This works fine, apart from where…
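The append fails because the width of a table's string columns is fixed by the first row written; a common fix, sketched below with a placeholder store path and column name, is to reserve room up front with min_itemsize on the first append.

```python
import datetime
import pandas as pd

# Reserve 200 bytes for the 'description' column (placeholder name) so later,
# longer strings still fit in the table's fixed-width string column.
first = pd.DataFrame({'description': ['short run']},
                     index=[datetime.datetime.now()])
later = pd.DataFrame({'description': ['a much longer description than the first row had']},
                     index=[datetime.datetime.now()])

with pd.HDFStore('runs.h5', mode='w') as store:
    store.append('runs', first, min_itemsize={'description': 200})
    store.append('runs', later)   # fits, because space was reserved up front
```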

What is a better approach to storing and querying a big dataset of meteorological data?

Submitted by 烂漫一生 on 2019-12-09 06:47:43
Question: I am looking for a convenient way to store and query a huge amount of meteorological data (a few TB). More information about the type of data is given in the middle of the question. Previously I was looking in the direction of MongoDB (I have used it for many of my own previous projects and feel comfortable with it), but recently I found out about the HDF5 data format. Reading about it, I found some similarities with Mongo: HDF5 simplifies the file structure to include only two major types of…
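To make the "two major types of objects" point concrete, here is a tiny illustrative h5py sketch (names, shapes, and chunking entirely made up): groups act like folders, datasets hold the arrays, and datasets can be chunked and compressed, which is part of what makes HDF5 attractive for large gridded weather data.

```python
import numpy as np
import h5py

# Toy illustration only: one group, one chunked and compressed dataset.
with h5py.File('weather.h5', 'w') as f:
    grp = f.create_group('temperature')                        # group ~ folder
    grid = np.random.rand(10, 180, 360).astype('float32')      # day, lat, lon (made up)
    grp.create_dataset('2019', data=grid,
                       chunks=(1, 180, 360), compression='gzip')
```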

Is there a way to get a numpy-style view to a slice of an array stored in a hdf5 file?

Submitted by ﹥>﹥吖頭↗ on 2019-12-08 15:47:00
Question: I have to work on large 3D cubes of data. I want to store them in HDF5 files (using h5py or maybe PyTables). I often want to perform analysis on just a section of these cubes, and that section is too large to hold in memory. I would like to have a numpy-style view of my slice of interest without copying the data into memory (similar to what you could do with a numpy memmap). Is this possible? As far as I know, when you perform a slice using h5py, you get a numpy array in memory. It has been asked why I…
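A note on what h5py actually provides, with made-up file and dataset names: the Dataset object itself is a lazy on-disk handle, and indexing it reads only the requested region into memory, so a very large cube can be processed slab by slab even though each slice you take comes back as an ordinary in-memory numpy array rather than a true view.

```python
import numpy as np
import h5py

# Process a large cube in slabs; only the indexed region is read from disk.
with h5py.File('cube.h5', 'r') as f:        # placeholder file name
    dset = f['cube']                         # lazy handle, no data read yet
    for start in range(0, dset.shape[0], 100):
        slab = dset[start:start + 100]       # loads just this slab into memory
        print(start, float(slab.mean()))
```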