pytables

Opening a corrupted PyTables HDF5 file

Submitted by 寵の児 on 2019-12-11 02:37:27
Question: I am hoping for some help in opening a corrupted HDF5 file. I am accessing PyTables via pandas, but a pd.read_hdf() call produces the following error. I don't know anything about the inner workings of PyTables. I believe the error arose because the process saving to the file (appending every 10 seconds or so) got duplicated, so two identical processes were then appending. I am not sure why this would corrupt the file rather than duplicate data, but the two errors occurred together…
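The snippet is cut off before the actual traceback, but a common salvage attempt, sketched below with assumed file names, is to open the damaged store read-only and copy whichever keys are still readable into a fresh file; if the HDF5 metadata itself is damaged, even the open call may fail.

```python
import pandas as pd

# Salvage sketch with placeholder file names: copy every still-readable key
# from the damaged store into a fresh one, skipping keys that raise.
src = pd.HDFStore('corrupted.h5', mode='r')   # may itself fail if the metadata is damaged
dst = pd.HDFStore('recovered.h5', mode='w')
for key in src.keys():
    try:
        dst.put(key, src[key], format='table')
    except Exception as exc:                  # keys touching damaged blocks will fail
        print('skipping', key, '->', exc)
src.close()
dst.close()
```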

Filter HDF dataset from H5 file using attribute

Submitted by 旧城冷巷雨未停 on 2019-12-10 21:17:53
Question: I have an h5 file containing multiple groups and datasets. Each dataset has associated attributes. I want to find/filter the datasets in this h5 file based on the attribute associated with each one. Example: dataset1 = cloudy (attribute), dataset2 = rainy (attribute), dataset3 = cloudy (attribute). I want to find the datasets whose weather attribute/metadata is cloudy. What is the simplest, most Pythonic way to get this done? Answer 1: There are 2 ways to access HDF5 data with Python:…
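The answer is truncated before its code, but a minimal h5py sketch of the attribute filter it is leading up to might look like the following; the attribute name 'weather' and the file name are assumptions taken from the question.

```python
import h5py

# Walk every object in the file and collect dataset names whose (assumed)
# 'weather' attribute equals 'cloudy'.
matches = []

def collect(name, obj):
    if isinstance(obj, h5py.Dataset):
        value = obj.attrs.get('weather')
        if isinstance(value, bytes):      # attributes may come back as bytes or str
            value = value.decode()
        if value == 'cloudy':
            matches.append(name)

with h5py.File('data.h5', 'r') as f:      # placeholder file name
    f.visititems(collect)

print(matches)
```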

combining huge h5 files with multiple datasets into one with odo

Submitted by 瘦欲@ on 2019-12-10 20:17:42
Question: I have a number of large (13GB+ in size) h5 files; each h5 file has two datasets that were created with pandas: df.to_hdf('name_of_file_to_save', 'key_1', table=True) and df.to_hdf('name_of_file_to_save', 'key_2', table=True) # saved to the same h5 file as above. I've seen a post here, Concatenate two big pandas.HDFStore HDF5 files, on using odo to concatenate h5 files. What I want to do is, for each h5 file that was created, each having key_1 and key_2, combine them so that all of the key_1 data…
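This is not the odo-based route the question asks about, but a plain pandas sketch of the same goal: appending 'key_1' from several source files (names assumed) into one combined table store, reading in chunks so the 13GB+ tables never have to fit in memory at once.

```python
import pandas as pd

# Chunked concatenation sketch; 'part1.h5' / 'part2.h5' / 'combined.h5' are placeholders.
sources = ['part1.h5', 'part2.h5']
with pd.HDFStore('combined.h5', mode='w') as out:
    for path in sources:
        # read_hdf returns an iterator when chunksize is given (table format only)
        for chunk in pd.read_hdf(path, 'key_1', chunksize=1000000):
            out.append('key_1', chunk, format='table')
```

The same loop could be repeated for 'key_2' to build both combined tables in the one output file.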

TypeError: read_hdf() takes exactly 2 arguments (1 given)

Submitted by 一个人想着一个人 on 2019-12-10 20:08:03
Question: How do I open an HDF5 file with pandas.read_hdf when the keys are not known? from pandas.io.pytables import read_hdf read_hdf(path_or_buf, key) pandas.__version__ == '0.14.1' Here the key parameter is not known. Thanks. Answer 1: Having never worked with hdf files before, I was able to use the online docs to cook up an example: In [59]: # create a temp df and store it df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5)))) df_tl.to_hdf('store_tl.h5','table',append=True) In [60]: # we can simply…
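The answer is truncated, but the usual way around an unknown key, sketched here against the answer's own store_tl.h5 example, is to list the store's keys first and then pass one of them to read_hdf.

```python
import pandas as pd

# Discover the keys, then read by key ('store_tl.h5' comes from the example above).
with pd.HDFStore('store_tl.h5', mode='r') as store:
    keys = store.keys()        # e.g. ['/table']
print(keys)

df = pd.read_hdf('store_tl.h5', keys[0])
```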

Pandas _metadata of DataFrame persistence error

Submitted by 那年仲夏 on 2019-12-09 18:20:10
Question: I have finally figured out how to use _metadata on a DataFrame; everything works except that I am unable to persist it, e.g. to hdf5 or json. I know it works because when I copy the frame the _metadata attributes copy over, while "non _metadata" attributes don't. Example: df = pandas.DataFrame # make up a frame to your liking pandas.DataFrame._metadata = ["testmeta"] df.testmeta = "testmetaval" df.badmeta = "badmetaval" newframe = df.copy() newframe.testmeta --> outputs "testmetaval" newframe.badmeta --->…
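Since to_hdf does not round-trip custom _metadata, one commonly suggested workaround, sketched below with made-up file, key, and attribute names, is to stash the value on the HDF5 node's attributes via get_storer and restore it by hand after reading. This is an assumption-laden illustration, not the questioner's own solution.

```python
import pandas as pd

# Persist the custom metadata as an HDF5 node attribute next to the frame itself.
pd.DataFrame._metadata = ['testmeta']
df = pd.DataFrame({'A': range(5)})
df.testmeta = 'testmetaval'

with pd.HDFStore('meta.h5', mode='w') as store:
    store.put('frame', df, format='table')
    store.get_storer('frame').attrs.testmeta = df.testmeta   # stored alongside the node

with pd.HDFStore('meta.h5', mode='r') as store:
    restored = store['frame']
    restored.testmeta = store.get_storer('frame').attrs.testmeta

print(restored.testmeta)   # 'testmetaval'
```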

finding a duplicate in a hdf5 pytable with 500e6 rows

Submitted by 元气小坏坏 on 2019-12-09 17:33:18
Question: Problem: I have a large (> 500e6 rows) dataset that I've put into a PyTables database. Let's say the first column is an ID and the second column is a counter for each ID; each ID-counter combination has to be unique. I have one non-unique row among the 500e6 rows that I'm trying to find. As a starter I've done something like this: index1 = db.cols.id.create_index() index2 = db.cols.counts.create_index() for row in db: query = '(id == %d) & (counts == %d)' % (row['id'], row['counts']) result = th.readWhere(query) if…
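The loop above issues one indexed readWhere query per row, which is very slow at this scale. A different single-pass sketch with PyTables is shown below; the file and node names are assumptions, and keeping every (id, counts) pair in a Python set costs substantial memory at 500e6 rows, so treat it purely as an illustration of the idea.

```python
import tables

# Single pass over the table, remembering every (id, counts) pair seen so far.
seen = set()
with tables.open_file('big.h5', mode='r') as h5:
    table = h5.root.mytable                 # placeholder node name
    for row in table.iterrows():
        pair = (row['id'], row['counts'])
        if pair in seen:
            print('duplicate found:', pair)
            break
        seen.add(pair)
```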

PyTables read random subset

Submitted by 假装没事ソ on 2019-12-09 11:20:37
Question: Is it possible to read a random subset of rows from HDF5 (via PyTables or, preferably, pandas)? I have a very large dataset with millions of rows, but only need a sample of a few thousand for analysis. And what about reading from a compressed HDF file? Answer 1: Use HDFStore; docs are here, compression docs are here. Random access via a constructed index is supported in 0.13: In [26]: df = DataFrame(np.random.randn(100,2),columns=['A','B']) In [27]: df.to_hdf('test.h5','df',mode='w',format='table') In…
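Continuing the answer's own test.h5 / 'df' example, a sketch of the random-subset read: look up the row count, draw random row numbers, and pass them to select as row coordinates (supported for table-format stores since pandas 0.13).

```python
import numpy as np
import pandas as pd

# Sample 10 random rows by coordinate from the table-format store above.
with pd.HDFStore('test.h5', mode='r') as store:
    nrows = store.get_storer('df').nrows
    rows = np.random.randint(0, nrows, size=10)
    sample = store.select('df', where=pd.Index(rows))
print(sample)
```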

HDFStore.append(string, DataFrame) fails when string column contents are longer than those already there

Submitted by 。_饼干妹妹 on 2019-12-09 09:13:25
Question: I have a pandas DataFrame stored via an HDFStore that essentially holds summary rows about test runs I am doing. Several of the fields in each row contain descriptive strings of variable length. When I do a test run, I create a new DataFrame with a single row in it: def export_as_df(self): return pd.DataFrame(data=[self._to_dict()], index=[datetime.datetime.now()]) I then call HDFStore.append(string, DataFrame) to add the new row to the existing DataFrame. This works fine, apart from where…
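The append fails because the width of a table's string columns is fixed by the first row written; a common fix, sketched below with a placeholder store path and column name, is to reserve room up front with min_itemsize on the first append.

```python
import datetime
import pandas as pd

# Reserve 200 bytes for the 'description' column (placeholder name) so later,
# longer strings still fit in the table's fixed-width string column.
first = pd.DataFrame({'description': ['short run']},
                     index=[datetime.datetime.now()])
later = pd.DataFrame({'description': ['a much longer description than the first row had']},
                     index=[datetime.datetime.now()])

with pd.HDFStore('runs.h5', mode='w') as store:
    store.append('runs', first, min_itemsize={'description': 200})
    store.append('runs', later)   # fits, because space was reserved up front
```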

What is a better approach to storing and querying a big dataset of meteorological data?

Submitted by 烂漫一生 on 2019-12-09 06:47:43
Question: I am looking for a convenient way to store and query a huge amount of meteorological data (a few TB). More information about the type of data is given in the middle of the question. Previously I was looking in the direction of MongoDB (I have used it for many of my own previous projects and feel comfortable with it), but recently I found out about the HDF5 data format. Reading about it, I found some similarities with Mongo: HDF5 simplifies the file structure to include only two major types of…
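To make the "two major types of objects" point concrete, here is a tiny illustrative h5py sketch (names, shapes, and chunking entirely made up): groups act like folders, datasets hold the arrays, and datasets can be chunked and compressed, which is part of what makes HDF5 attractive for large gridded weather data.

```python
import numpy as np
import h5py

# Toy illustration only: one group, one chunked and compressed dataset.
with h5py.File('weather.h5', 'w') as f:
    grp = f.create_group('temperature')                        # group ~ folder
    grid = np.random.rand(10, 180, 360).astype('float32')      # day, lat, lon (made up)
    grp.create_dataset('2019', data=grid,
                       chunks=(1, 180, 360), compression='gzip')
```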

Is there a way to get a numpy-style view to a slice of an array stored in a hdf5 file?

Submitted by ﹥>﹥吖頭↗ on 2019-12-08 15:47:00
Question: I have to work on large 3D cubes of data. I want to store them in HDF5 files (using h5py or maybe PyTables). I often want to perform analysis on just a section of these cubes, and that section is too large to hold in memory. I would like to have a numpy-style view of my slice of interest without copying the data into memory (similar to what you could do with a numpy memmap). Is this possible? As far as I know, when you perform a slice using h5py, you get a numpy array in memory. It has been asked why I…
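A note on what h5py actually provides, with made-up file and dataset names: the Dataset object itself is a lazy on-disk handle, and indexing it reads only the requested region into memory, so a very large cube can be processed slab by slab even though each slice you take comes back as an ordinary in-memory numpy array rather than a true view.

```python
import numpy as np
import h5py

# Process a large cube in slabs; only the indexed region is read from disk.
with h5py.File('cube.h5', 'r') as f:        # placeholder file name
    dset = f['cube']                         # lazy handle, no data read yet
    for start in range(0, dset.shape[0], 100):
        slab = dset[start:start + 100]       # loads just this slab into memory
        print(start, float(slab.mean()))
```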