pytables

merging several hdf5 files into one pytable

左心房为你撑大大i submitted on 2019-12-05 22:16:51
I have several hdf5 files, each of them with the same structure. I'd like to create one pytable out of them by somehow merging the hdf5 files. What I mean is that if an array in file1 has size x and an array in file2 has size y, the resulting array in the pytable will be of size x+y, containing first all the entries from file1 and then all the entries from file2.

How you want to do this depends slightly on the data type that you have. Arrays and CArrays have a static size, so you need to preallocate the data space. Thus you would do something like the following:

import tables as tb
file1 = tb.open
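For the common one-dimensional case, an EArray sidesteps the preallocation entirely, since it can be appended to. Below is a minimal sketch along those lines; the file names and the /data node are hypothetical, and it assumes each source file holds a 1-D array of the same dtype:

import tables as tb

with tb.open_file("merged.h5", mode="w") as out:
    with tb.open_file("file1.h5", mode="r") as f1:
        src = f1.root.data
        # Extendable array with matching dtype; the 0 marks the growable dimension.
        merged = out.create_earray(out.root, "data",
                                   atom=tb.Atom.from_dtype(src.dtype),
                                   shape=(0,))
        merged.append(src[:])
    with tb.open_file("file2.h5", mode="r") as f2:
        merged.append(f2.root.data[:])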

How does one store a Pandas DataFrame as an HDF5 PyTables table (or CArray, EArray, etc.)?

懵懂的女人 submitted on 2019-12-05 16:13:19
I have the following pandas dataframe:

import pandas as pd
df = pd.read_csv("filename.csv")

Now, I can use HDFStore to write the df object to file (like adding key-value pairs to a Python dictionary):

store = HDFStore('store.h5')
store['df'] = df

http://pandas.pydata.org/pandas-docs/stable/io.html

When I look at the contents, this object is a frame. store outputs:

<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df    frame    (shape->[552,23252])

However, in order to use indexing, one should store this as a table object. My approach was to try HDFStore.put(), i.e. HDFStore.put(key="store.h",
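For what it's worth, a minimal sketch of the table-format route: passing format="table" (or the shorthand "t") to put() writes a queryable PyTables table rather than the default fixed-format frame. It assumes df is the DataFrame read above:

import pandas as pd

df = pd.read_csv("filename.csv")
store = pd.HDFStore("store.h5")
# data_columns=True makes the columns individually queryable with where=
store.put("df", df, format="table", data_columns=True)
store.close()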

How to efficiently rebuild pandas hdfstore table when append fails

眉间皱痕 submitted on 2019-12-05 15:09:11
I am working on using the hdfstore in pandas to store data frames from an ongoing iterative process. At each iteration, I append to a table in the hdfstore. Here is a toy example:

import pandas as pd
from pandas import HDFStore
import numpy as np
from random import choice
from string import ascii_letters

alphanum = np.array(list(ascii_letters) + [str(d) for d in range(0, 9)])

def hdfstore_append(storefile, key, df, format="t", columns=None, data_columns=None):
    if df is None:
        return
    if key[0] != '/':
        key = '/' + key
    with HDFStore(storefile) as store:
        if key not in store.keys():
            store.put(key, df, format=format, columns=columns, data
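When the append does fail (typically because a string column outgrows the itemsize fixed when the table was first written), one recovery path is to pull the old rows back, drop the table, and rewrite everything in a single put. A hedged sketch, reusing the storefile/key/df names from the toy example above:

import pandas as pd
from pandas import HDFStore

def rebuild_and_append(storefile, key, df):
    with HDFStore(storefile) as store:
        old = store.select(key)        # read the existing rows into memory
        store.remove(key)              # drop the incompatible table
        combined = pd.concat([old, df])
        # min_itemsize could be widened here if string widths caused the failure
        store.put(key, combined, format="table", data_columns=True)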

Update pandas DataFrame stored in a Pytable with another pandas DataFrame

▼魔方 西西 submitted on 2019-12-05 13:10:45
I am trying to create a function that updates a pandas DataFrame that I have stored in a PyTable with new data from another pandas DataFrame. I want to check if some data is missing in the PyTable for specific DatetimeIndexes (a value is NaN or a new Timestamp is available), replace this with new values from the given pandas DataFrame, and append this to the Pytable. Basically, just update a Pytable. I can get the combined DataFrame using the combine_first method in Pandas. Below the Pytable is created with dummy data:

import pandas as pd
import numpy as np
import datetime as dt

index = pd
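A minimal sketch of that update step, assuming the store holds the frame at a hypothetical key "df" and new_df shares its DatetimeIndex and columns; combine_first keeps stored values where they exist and fills NaNs and missing timestamps from new_df:

import pandas as pd

def update_pytable(storefile, key, new_df):
    with pd.HDFStore(storefile) as store:
        stored = store.select(key)
        updated = stored.combine_first(new_df)   # stored values win; gaps filled from new_df
        store.put(key, updated, format="table")  # rewrite the table with the merged frame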

Query HDF5 in Pandas

孤街浪徒 submitted on 2019-12-04 18:31:49
I have the following data (18,619,211 rows) stored as a pandas dataframe object in an hdf5 file:

              date    id2         w
id
100010  1980-03-31  10401  0.000839
100010  1980-03-31  10604  0.020140
100010  1980-03-31  12490  0.026149
100010  1980-03-31  13047  0.033560
100010  1980-03-31  13303  0.001657

where id is the index and the others are columns. date is np.datetime64. I need to perform a query like this (the code doesn't work of course):

db = pd.HDFStore('database.h5')
data = db.select('df', where='id==id_i & date>bgdt & date<endt')
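A hedged sketch of one way such a query can work: it assumes the frame was written with format="table" and data_columns=['date'] so that date is queryable, and references the index as 'index' in the where clause (a named index like id can usually also be referenced by name). pandas resolves local Python variables named inside a where string:

import pandas as pd

id_i = 100010
bgdt = pd.Timestamp('1980-01-01')
endt = pd.Timestamp('1980-06-30')

db = pd.HDFStore('database.h5')
data = db.select('df', where='(index == id_i) & (date > bgdt) & (date < endt)')
db.close()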

How do I read/write to a subgroup within an HDFStore?

末鹿安然 submitted on 2019-12-04 18:23:13
I am using the HDFStore to store some of my processed results prior to analysis. Into the store I want to put 3 types of results:

Raw results, that have not been processed at all, just read in and merged from their original CSV formats
Processed results that are derived from the raw results, with some processing and division into more logical groupings
Summarised results that have useful summary columns added and redundant columns removed, for easy reading

I thought an HDFStore with hierarchical keys would do it, one for Raw, one for Processed and one for Summarised, as sketched below. I wanted a
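For reference, hierarchical keys in an HDFStore are just slash-separated paths, so subgroups fall out for free. A minimal sketch with hypothetical key names and a stand-in frame:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})   # stand-in for real results
store = pd.HDFStore('results.h5')
store.put('/raw/run1', df)
store.put('/processed/run1', df)
store.put('/summarised/run1', df)
print(store.keys())   # ['/processed/run1', '/raw/run1', '/summarised/run1']
store.close()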

Saving dictionaries to file (numpy and Python 2/3 friendly)

感情迁移 submitted on 2019-12-04 18:07:28
I want to do hierarchical key-value storage in Python, which basically boils down to storing dictionaries to files. By that I mean any type of dictionary structure that may contain other dictionaries, numpy arrays, serializable Python objects, and so forth. Not only that, I want it to store numpy arrays space-optimized and play nice between Python 2 and 3. Below are methods I know are out there. My question is what is missing from this list, and is there an alternative that dodges all my deal
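The recursive idea behind most of these options is simple enough to sketch: walk the nested dict and mirror it as HDF5 groups and datasets, here via h5py (one of the candidates such comparisons usually cover). Non-array leaves such as arbitrary Python objects are deliberately left out of this sketch:

import h5py
import numpy as np

def save_dict(h5group, d):
    for key, value in d.items():
        if isinstance(value, dict):
            save_dict(h5group.create_group(key), value)  # recurse into subdicts
        else:
            h5group[key] = np.asarray(value)             # arrays/scalars become datasets

with h5py.File('store.h5', 'w') as f:
    save_dict(f, {'a': np.arange(3), 'b': {'c': 1.5}})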

HDFStore with string columns gives issues

家住魔仙堡 submitted on 2019-12-04 12:00:40
I have a pandas DataFrame myDF with a few string columns (whose dtype is object) and many numeric columns. I tried the following:

d = pandas.HDFStore("C:\\PF\\Temp.h5")
d['test'] = myDF

I got this result:

C:\PF\WinPython-64bit-3.3.3.3\python-3.3.3.amd64\lib\site-packages\pandas\io\pytables.py:2446: PerformanceWarning: your performance may suffer as PyTables will pickle object types that it cannot map directly to c-types [inferred_type->mixed,key->block2_values] [items->[0, 1, 3, 4, 5, 6, 9, 10,
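The usual workaround is to write in table format, where string columns are stored as fixed-width strings instead of pickled objects. A hedged sketch, assuming myDF from above; the special 'values' key in min_itemsize reserves a minimum width for all value columns, which helps when later strings are longer than the first batch:

import pandas as pd

d = pd.HDFStore("C:\\PF\\Temp.h5")
d.put("test", myDF, format="table", min_itemsize={"values": 50})
d.close()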

Using pytables, which is more efficient: scipy.sparse or numpy dense matrix?

混江龙づ霸主 submitted on 2019-12-04 11:55:47
When using pytables, there's no support (as far as I can tell) for the scipy.sparse matrix formats, so to store a matrix I have to do some conversion, e.g.

def store_sparse_matrix(self):
    grp1 = self.getFileHandle().createGroup(self.getGroup(), 'M')
    self.getFileHandle().createArray(grp1, 'data', M.tocsr().data)
    self.getFileHandle().createArray(grp1, 'indptr', M.tocsr().indptr)
    self.getFileHandle().createArray(grp1, 'indices', M.tocsr().indices)

def get_sparse_matrix(self):
    return sparse.csr
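As a point of comparison, here is the same idea in the modern snake_case PyTables API: a CSR matrix is just three dense arrays (data, indices, indptr) plus a shape, which is usually far smaller on disk than the dense equivalent when the matrix really is sparse. A minimal round-trip sketch with hypothetical node names:

import numpy as np
import tables as tb
from scipy import sparse

def save_csr(h5, group, m):
    csr = m.tocsr()
    for name in ('data', 'indices', 'indptr'):
        h5.create_array(group, name, getattr(csr, name))
    h5.create_array(group, 'shape', np.array(csr.shape))

def load_csr(group):
    return sparse.csr_matrix(
        (group.data[:], group.indices[:], group.indptr[:]),
        shape=tuple(group.shape[:]))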

Pandas _metadata of DataFrame persistence error

随声附和 submitted on 2019-12-04 08:10:00
I have finally figured out how to use _metadata from a DataFrame; everything works except that I am unable to persist it, such as to hdf5 or json. I know it works because when I copy the frame, the _metadata attributes copy over and "non _metadata" attributes don't. Example:

import pandas
df = pandas.DataFrame()  # make up a frame to your liking
pandas.DataFrame._metadata = ["testmeta"]
df.testmeta = "testmetaval"
df.badmeta = "badmetaval"
newframe = df.copy()
newframe.testmeta   # --> outputs "testmetaval"
newframe.badmeta    # --> raises AttributeError

# json test
df.to_json(Path)
revivedjsonframe = pandas.io.json.read_json(Path)
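Since to_hdf/to_json serialize only the frame's data, one hedged workaround is to stash the attribute on the HDFStore node itself and restore it by hand; the key and attribute names below are hypothetical:

import pandas as pd

def save_with_meta(store, key, df):
    store.put(key, df)
    store.get_storer(key).attrs.testmeta = df.testmeta  # persist alongside the data

def load_with_meta(store, key):
    df = store.get(key)
    df.testmeta = store.get_storer(key).attrs.testmeta
    return df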