pytables

merging several hdf5 files into one pytable

左心房为你撑大大i submitted on 2019-12-05 22:16:51
I have several hdf5 files, each of them with the same structure. I'd like to create one pytable out of them by somehow merging the hdf5 files. What I mean is that if an array in file1 has size x and an array in file2 has size y, the resulting array in the pytable will be of size x+y, containing first all the entries from file1 and then all the entries from file2.

How you want to do this depends slightly on the data type that you have. Arrays and CArrays have a static size, so you need to preallocate the data space. Thus you would do something like the following:

import tables as tb
file1 = tb.open
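For the common one-dimensional case, an EArray sidesteps the preallocation entirely, since it can be appended to. Below is a minimal sketch along those lines; the file names and the /data node are hypothetical, and it assumes each source file holds a 1-D array of the same dtype:

import tables as tb

with tb.open_file("merged.h5", mode="w") as out:
    with tb.open_file("file1.h5", mode="r") as f1:
        src = f1.root.data
        # Extendable array with matching dtype; the 0 marks the growable dimension.
        merged = out.create_earray(out.root, "data",
                                   atom=tb.Atom.from_dtype(src.dtype),
                                   shape=(0,))
        merged.append(src[:])
    with tb.open_file("file2.h5", mode="r") as f2:
        merged.append(f2.root.data[:])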

How does one store a Pandas DataFrame as an HDF5 PyTables table (or CArray, EArray, etc.)?

懵懂的女人 submitted on 2019-12-05 16:13:19
I have the following pandas dataframe:

import pandas as pd
df = pd.read_csv("filename.csv")

Now, I can use HDFStore to write the df object to file (like adding key-value pairs to a Python dictionary):

store = HDFStore('store.h5')
store['df'] = df

http://pandas.pydata.org/pandas-docs/stable/io.html

When I look at the contents, this object is a frame. store outputs:

<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df    frame    (shape->[552,23252])

However, in order to use indexing, one should store this as a table object. My approach was to try HDFStore.put(), i.e. HDFStore.put(key="store.h",
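For what it's worth, a minimal sketch of the table-format route: passing format="table" (or the shorthand "t") to put() writes a queryable PyTables table rather than the default fixed-format frame. It assumes df is the DataFrame read above:

import pandas as pd

df = pd.read_csv("filename.csv")
store = pd.HDFStore("store.h5")
# data_columns=True makes the columns individually queryable with where=
store.put("df", df, format="table", data_columns=True)
store.close()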

How to efficiently rebuild pandas hdfstore table when append fails

眉间皱痕 submitted on 2019-12-05 15:09:11
I am working on using the hdfstore in pandas to store data frames from an ongoing iterative process. At each iteration, I append to a table in the hdfstore. Here is a toy example:

import pandas as pd
from pandas import HDFStore
import numpy as np
from random import choice
from string import ascii_letters

alphanum = np.array(list(ascii_letters) + [str(d) for d in range(0, 9)])

def hdfstore_append(storefile, key, df, format="t", columns=None, data_columns=None):
    if df is None:
        return
    if key[0] != '/':
        key = '/' + key
    with HDFStore(storefile) as store:
        if key not in store.keys():
            store.put(key, df, format=format, columns=columns, data
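When the append does fail (typically because a string column outgrows the itemsize fixed when the table was first written), one recovery path is to pull the old rows back, drop the table, and rewrite everything in a single put. A hedged sketch, reusing the storefile/key/df names from the toy example above:

import pandas as pd
from pandas import HDFStore

def rebuild_and_append(storefile, key, df):
    with HDFStore(storefile) as store:
        old = store.select(key)        # read the existing rows into memory
        store.remove(key)              # drop the incompatible table
        combined = pd.concat([old, df])
        # min_itemsize could be widened here if string widths caused the failure
        store.put(key, combined, format="table", data_columns=True)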

Update pandas DataFrame stored in a Pytable with another pandas DataFrame

▼魔方 西西 submitted on 2019-12-05 13:10:45
I am trying to create a function that updates a pandas DataFrame that I have stored in a PyTable with new data from another pandas DataFrame. I want to check if some data is missing in the PyTable for specific DatetimeIndexes (a value is NaN or a new Timestamp is available), replace this with new values from the given pandas DataFrame, and append this to the Pytable. Basically, just update a Pytable. I can get the combined DataFrame using the combine_first method in Pandas. Below the Pytable is created with dummy data:

import pandas as pd
import numpy as np
import datetime as dt

index = pd
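A minimal sketch of that update step, assuming the store holds the frame at a hypothetical key "df" and new_df shares its DatetimeIndex and columns; combine_first keeps stored values where they exist and fills NaNs and missing timestamps from new_df:

import pandas as pd

def update_pytable(storefile, key, new_df):
    with pd.HDFStore(storefile) as store:
        stored = store.select(key)
        updated = stored.combine_first(new_df)   # stored values win; gaps filled from new_df
        store.put(key, updated, format="table")  # rewrite the table with the merged frame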

Query HDF5 in Pandas

孤街浪徒 submitted on 2019-12-04 18:31:49
I have the following data (18,619,211 rows) stored as a pandas dataframe object in an hdf5 file:

              date    id2         w
id
100010  1980-03-31  10401  0.000839
100010  1980-03-31  10604  0.020140
100010  1980-03-31  12490  0.026149
100010  1980-03-31  13047  0.033560
100010  1980-03-31  13303  0.001657

where id is the index and the others are columns. date is np.datetime64. I need to perform a query like this (the code doesn't work of course):

db = pd.HDFStore('database.h5')
data = db.select('df', where='id==id_i & date>bgdt & date<endt')
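A hedged sketch of one way such a query can work: it assumes the frame was written with format="table" and data_columns=['date'] so that date is queryable, and references the index as 'index' in the where clause (a named index like id can usually also be referenced by name). pandas resolves local Python variables named inside a where string:

import pandas as pd

id_i = 100010
bgdt = pd.Timestamp('1980-01-01')
endt = pd.Timestamp('1980-06-30')

db = pd.HDFStore('database.h5')
data = db.select('df', where='(index == id_i) & (date > bgdt) & (date < endt)')
db.close()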

How do I read/write to a subgroup within an HDFStore?

末鹿安然 submitted on 2019-12-04 18:23:13
I am using the HDFStore to store some of my processed results prior to analysis. Into the store I want to put 3 types of results:

Raw results, that have not been processed at all, just read in and merged from their original CSV formats
Processed results that are derived from the raw results, with some processing and division into more logical groupings
Summarised results that have useful summary columns added and redundant columns removed, for easy reading

I thought an HDFStore with hierarchical keys would do it, one for Raw, one for Processed and one for Summarised, as sketched below. I wanted a
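For reference, hierarchical keys in an HDFStore are just slash-separated paths, so subgroups fall out for free. A minimal sketch with hypothetical key names and a stand-in frame:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})   # stand-in for real results
store = pd.HDFStore('results.h5')
store.put('/raw/run1', df)
store.put('/processed/run1', df)
store.put('/summarised/run1', df)
print(store.keys())   # ['/processed/run1', '/raw/run1', '/summarised/run1']
store.close()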

Saving dictionaries to file (numpy and Python 2/3 friendly)

感情迁移 submitted on 2019-12-04 18:07:28
I want to do hierarchical key-value storage in Python, which basically boils down to storing dictionaries to files. By that I mean any type of dictionary structure that may contain other dictionaries, numpy arrays, serializable Python objects, and so forth. Not only that, I want it to store numpy arrays space-optimized and play nice between Python 2 and 3. Below are methods I know are out there. My question is what is missing from this list, and is there an alternative that dodges all my deal
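The recursive idea behind most of these options is simple enough to sketch: walk the nested dict and mirror it as HDF5 groups and datasets, here via h5py (one of the candidates such comparisons usually cover). Non-array leaves such as arbitrary Python objects are deliberately left out of this sketch:

import h5py
import numpy as np

def save_dict(h5group, d):
    for key, value in d.items():
        if isinstance(value, dict):
            save_dict(h5group.create_group(key), value)  # recurse into subdicts
        else:
            h5group[key] = np.asarray(value)             # arrays/scalars become datasets

with h5py.File('store.h5', 'w') as f:
    save_dict(f, {'a': np.arange(3), 'b': {'c': 1.5}})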

HDFStore with string columns gives issues

家住魔仙堡 submitted on 2019-12-04 12:00:40
I have a pandas DataFrame myDF with a few string columns (whose dtype is object) and many numeric columns. I tried the following:

d = pandas.HDFStore("C:\\PF\\Temp.h5")
d['test'] = myDF

I got this result:

C:\PF\WinPython-64bit-3.3.3.3\python-3.3.3.amd64\lib\site-packages\pandas\io\pytables.py:2446: PerformanceWarning: your performance may suffer as PyTables will pickle object types that it cannot map directly to c-types [inferred_type->mixed,key->block2_values] [items->[0, 1, 3, 4, 5, 6, 9, 10,
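The usual workaround is to write in table format, where string columns are stored as fixed-width strings instead of pickled objects. A hedged sketch, assuming myDF from above; the special 'values' key in min_itemsize reserves a minimum width for all value columns, which helps when later strings are longer than the first batch:

import pandas as pd

d = pd.HDFStore("C:\\PF\\Temp.h5")
d.put("test", myDF, format="table", min_itemsize={"values": 50})
d.close()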

Using pytables, which is more efficient: scipy.sparse or numpy dense matrix?

混江龙づ霸主 submitted on 2019-12-04 11:55:47
When using pytables, there's no support (as far as I can tell) for the scipy.sparse matrix formats, so to store a matrix I have to do some conversion, e.g.

def store_sparse_matrix(self):
    grp1 = self.getFileHandle().createGroup(self.getGroup(), 'M')
    self.getFileHandle().createArray(grp1, 'data', M.tocsr().data)
    self.getFileHandle().createArray(grp1, 'indptr', M.tocsr().indptr)
    self.getFileHandle().createArray(grp1, 'indices', M.tocsr().indices)

def get_sparse_matrix(self):
    return sparse.csr
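As a point of comparison, here is the same idea in the modern snake_case PyTables API: a CSR matrix is just three dense arrays (data, indices, indptr) plus a shape, which is usually far smaller on disk than the dense equivalent when the matrix really is sparse. A minimal round-trip sketch with hypothetical node names:

import numpy as np
import tables as tb
from scipy import sparse

def save_csr(h5, group, m):
    csr = m.tocsr()
    for name in ('data', 'indices', 'indptr'):
        h5.create_array(group, name, getattr(csr, name))
    h5.create_array(group, 'shape', np.array(csr.shape))

def load_csr(group):
    return sparse.csr_matrix(
        (group.data[:], group.indices[:], group.indptr[:]),
        shape=tuple(group.shape[:]))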

Pandas _metadata of DataFrame persistence error

随声附和 submitted on 2019-12-04 08:10:00
I have finally figured out how to use _metadata from a DataFrame; everything works except that I am unable to persist it, such as to hdf5 or json. I know it works because when I copy the frame, the _metadata attributes copy over and "non _metadata" attributes don't. Example:

import pandas
df = pandas.DataFrame()  # make up a frame to your liking
pandas.DataFrame._metadata = ["testmeta"]
df.testmeta = "testmetaval"
df.badmeta = "badmetaval"
newframe = df.copy()
newframe.testmeta   # --> outputs "testmetaval"
newframe.badmeta    # --> raises AttributeError

# json test
df.to_json(Path)
revivedjsonframe = pandas.io.json.read_json(Path)
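Since to_hdf/to_json serialize only the frame's data, one hedged workaround is to stash the attribute on the HDFStore node itself and restore it by hand; the key and attribute names below are hypothetical:

import pandas as pd

def save_with_meta(store, key, df):
    store.put(key, df)
    store.get_storer(key).attrs.testmeta = df.testmeta  # persist alongside the data

def load_with_meta(store, key):
    df = store.get(key)
    df.testmeta = store.get_storer(key).attrs.testmeta
    return df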