PyTables

NumPy efficient big matrix multiplication

限于喜欢 · Submitted on 2019-12-31 10:50:08
Question: To store big matrices on disk I use numpy.memmap. Here is some sample code to test big-matrix multiplication:

    import numpy as np
    import time

    rows = 10000  # can be much larger, e.g. 1,000,000
    cols = 1000

    # create some data in memory
    data = np.arange(rows * cols, dtype='float32')
    data.resize((rows, cols))

    # create files on disk
    fp0 = np.memmap('C:/data_0', dtype='float32', mode='w+', shape=(rows, cols))
    fp1 = np.memmap('C:/data_1', dtype='float32', mode='w+', shape=(rows, cols))
    fp0[:] = data[:]
    fp1[:] = data[:]
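The question is truncated here, but a common approach for multiplying matrices that do not fit in RAM is to compute the product block by block, writing each block into a third memmap. A minimal sketch continuing the snippet above; the output path and block size are illustrative, not from the original post:

    # fp0 (rows x cols) times fp1.T (cols x rows) gives a (rows x rows) result
    out = np.memmap('C:/data_out', dtype='float32', mode='w+', shape=(rows, rows))
    step = 1000  # rows of fp0 handled per iteration; tune to available RAM
    for i in range(0, rows, step):
        # each slice is pulled into memory, multiplied, and written back to disk
        out[i:i + step] = np.dot(fp0[i:i + step], fp1.T)
    out.flush()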

Cannot retrieve Datasets in PyTables using natural naming

半腔热情 · Submitted on 2019-12-24 11:31:57
Question: I'm new to PyTables and I want to retrieve a dataset from an HDF5 file using natural naming, but with this input:

    f = tables.open_file("filename.h5", "r")
    f.root.group-1.dataset-1.read()

I get the error "group / does not have a child named group", and if I try:

    f.root.group\-1.dataset\-1.read()

I get "unexpected character after line continuation character". I can't change the names of the groups, because it is big data from an experiment.

Answer 1: You can't use the minus sign with natural naming, because a name containing a hyphen is not a valid Python identifier.
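The excerpt cuts the answer short; the standard workaround is to bypass natural naming with File.get_node(), which takes the node path as a string and so has no trouble with hyphens. A minimal sketch using the node names from the question:

    import tables

    f = tables.open_file("filename.h5", "r")
    data = f.get_node("/group-1/dataset-1").read()
    f.close()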

PyTables 2.3.1 with Python 2.5 on Windows: Error - could not find a local HDF5 installation

三世轮回 · Submitted on 2019-12-24 08:35:06
Question: I'm trying to install PyTables 2.3.1 on Windows XP with Python 2.5. I'm getting the following error:

    Could not find a local HDF5 installation. You may need to explicitly
    state where your local HDF5 headers and library can be found by setting
    the HDF5_DIR environment variable or by using the --hdf5 command-line
    option.

I'm a bit confused by the installation of the HDF5 library. I downloaded the Windows binary called HDF5188-win32-shared.zip from the HDF5 site and ran the .exe file in the zip
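The question is truncated, but the error itself states the fix: HDF5_DIR must point at a directory whose include and lib subdirectories come from the HDF5 distribution before the PyTables build runs. A quick post-install sanity check, assuming the build then succeeds (on the PyTables 2.x series the attribute is spelled hdf5Version; on 3.x it is hdf5_version):

    import tables
    print(tables.hdf5Version)  # e.g. '1.8.8' -- the HDF5 PyTables was linked against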

How to access an index in a pandas HDFStore (PyTables)

谁说我不能喝 · Submitted on 2019-12-24 00:51:16
Question: I have a large HDFStore with a multi-index. How can I get hold of one of the index levels? I see I can access the colindexes like so:

    store._handle.root.data.table.colindexes

but I have yet to get a list of one of the column indexes.

Answer 1: The multi-index example from the docs:

    In [21]: index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                        ['one', 'two', 'three']],
                                labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                        [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                                names=['foo', 'bar'])

    In [22]: df_mi = DataFrame(np
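The answer is cut off above. For a table-format store there is also HDFStore.select_column, which returns a single column (including index levels) as a Series without loading the whole frame; a minimal sketch, with the key and level name ('df_mi', 'foo') borrowed from the docs example rather than the original thread:

    import pandas as pd

    store = pd.HDFStore('store.h5')
    foo_level = store.select_column('df_mi', 'foo')  # one index level as a Series
    print(foo_level.unique())
    store.close()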

How to specify min_itemsize for an index column

≯℡__Kan透↙ · Submitted on 2019-12-23 20:24:44
Question: I am unable to specify the minimum size for the index in a to_hdf append operation. min_itemsize works for the data columns, so how can I get it to work for the index column? This code:

    from pandas import *

    df = DataFrame(['1', '2'], index=['a', 'b'])
    df.index.name = 'symbol'
    df.to_hdf("store.h5", 'df', append=True, format='table',
              min_itemsize={'symbol': 10})

generates this error message:

    ValueError: min_itemsize has the key [symbol] which is not an axis or data_column

Source: https:/
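The excerpt ends at the error, but the usual fix is to address the index through the reserved key 'index' rather than through its name; a minimal sketch under that assumption:

    from pandas import DataFrame

    df = DataFrame(['1', '2'], index=['a', 'b'])
    df.index.name = 'symbol'
    # 'index' is the key pandas recognizes for the index axis
    df.to_hdf("store.h5", 'df', append=True, format='table',
              min_itemsize={'index': 10})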

GIL for an IO-bound thread in a C extension (HDF5)

倖福魔咒の · Submitted on 2019-12-23 19:00:57
Question: I have a sampling application that acquires 250,000 samples per second, buffers them in memory, and eventually appends them to an HDFStore provided by pandas. In general, this is great. However, I have a thread that runs and continually empties the data acquisition device (DAQ), and it needs to run on a somewhat regular basis. A deviation of about a second tends to break things. Below is an extreme case of the timings observed. Start indicates a DAQ read starting, Finish is when it finishes, and
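The timing table itself is truncated above. Since a long HDF5 append inside a C extension can hold the GIL for its whole duration, one common mitigation (an assumption here, not from the original thread) is to move the writes into a separate process, so the DAQ thread never competes for the interpreter lock; a minimal sketch with illustrative names:

    import multiprocessing as mp
    import pandas as pd

    def writer(queue, path):
        # runs in its own process, so its GIL cannot stall the DAQ thread
        with pd.HDFStore(path) as store:
            while True:
                df = queue.get()
                if df is None:  # sentinel: shut down cleanly
                    break
                store.append('data', df)

    if __name__ == '__main__':
        q = mp.Queue()
        proc = mp.Process(target=writer, args=(q, 'samples.h5'))
        proc.start()
        q.put(pd.DataFrame({'sample': [1.0, 2.0]}))  # the DAQ thread would do this
        q.put(None)
        proc.join()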

What is the equivalent of “select max(column) from table” in PyTables

天大地大妈咪最大 · Submitted on 2019-12-22 14:03:29
Question: I have a table with a whole lot of numerical values in it. I know I could extract the column and do a max() on it, but there is probably a way to do this using the in-kernel method. I just can't seem to find it, though.

Answer 1: In the tests I've made, you can achieve results over twice as fast using the iterrows method instead of where:

    In [117]: timeit max(row['timestamp'] for row in table.iterrows(stop=1000000))
    1 loops, best of 3: 1 s per loop

    In [118]: timeit max(row['timestamp'] for row in table
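Beyond iteration, and not part of the excerpted answer: if the column fits in memory, Table.col() reads it as a single NumPy array and the reduction can be left to NumPy. A sketch assuming a table with a 'timestamp' column:

    # Table.col() materializes the whole column as a NumPy array in one read
    timestamps = table.col('timestamp')
    print(timestamps.max())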

How to efficiently rebuild a pandas HDFStore table when append fails

自作多情 · Submitted on 2019-12-22 08:53:16
Question: I am working on using the HDFStore in pandas to store data frames from an ongoing iterative process. At each iteration, I append to a table in the HDFStore. Here is a toy example:

    import pandas as pd
    from pandas import HDFStore
    import numpy as np
    from random import choice
    from string import ascii_letters

    alphanum = np.array(list(ascii_letters) + range(0, 9))

    def hdfstore_append(storefile, key, df, format="t",
                        columns=None, data_columns=None):
        if df is None:
            return
        if key[0] != '/':
            key = '/' + key
        with HDFStore
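The toy example is cut off mid-function. A typical reason such an append fails is a string column outgrowing the itemsize reserved when the table was created; one rebuild strategy (an assumption, not from the excerpt) is to read the existing table, concatenate, and rewrite the key with a larger reservation:

    import pandas as pd

    def rebuild_and_append(store, key, df, itemsize=64):
        # fallback for when store.append(key, df) raises ValueError
        if key in store:
            combined = pd.concat([store.select(key), df])
            store.remove(key)
        else:
            combined = df
        # rewrite from scratch, reserving room for longer strings;
        # the reserved key 'values' addresses all value columns at once
        store.put(key, combined, format='table', min_itemsize={'values': itemsize})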

Update a pandas DataFrame stored in a PyTable with another pandas DataFrame

流过昼夜 · Submitted on 2019-12-22 06:37:40
Question: I am trying to create a function that updates a pandas DataFrame that I have stored in a PyTable with new data from another pandas DataFrame. I want to check whether some data is missing in the PyTable for specific DatetimeIndexes (a value is NaN or a new Timestamp is available), replace this with new values from a given pandas DataFrame, and append this to the PyTable. Basically, just update a PyTable. I can get the combined DataFrame using the combine_first method in pandas. Below, the PyTable is
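The setup code is truncated above. A minimal read-combine-rewrite sketch in the spirit of the question, with illustrative names; combine_first lets the new frame's values win wherever both frames have data:

    import pandas as pd

    def update_table(path, key, new_df):
        with pd.HDFStore(path) as store:
            old_df = store.select(key)
            # new values take precedence; gaps in new_df are filled from old_df
            combined = new_df.combine_first(old_df)
            store.put(key, combined, format='table')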

How do I read/write to a subgroup within an HDFStore?

断了今生、忘了曾经 · Submitted on 2019-12-22 00:24:31
Question: I am using the HDFStore to store some of my processed results prior to analysis. Into the store I want to put 3 types of results:

- Raw results, which have not been processed at all, just read in and merged from their original CSV formats
- Processed results that are derived from the raw results, with some processing and division into more logical groupings
- Summarised results that have useful summary columns added and redundant columns removed, for easy reading

I thought an HDFStore with
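The question is cut off, but the mechanism it is reaching for exists: HDFStore keys may contain slashes, and each slash level becomes an HDF5 subgroup. A minimal sketch with illustrative key names and stand-in frames:

    import pandas as pd

    raw_df = pd.DataFrame({'a': [1, 2]})   # stand-ins for the three result types
    processed_df = raw_df * 10
    summary_df = processed_df.describe()

    with pd.HDFStore('results.h5') as store:
        # slashes in the key create nested HDF5 groups
        store.put('raw/experiment1', raw_df)
        store.put('processed/experiment1', processed_df)
        store.put('summary/experiment1', summary_df)
        print(store.keys())  # ['/processed/experiment1', '/raw/experiment1', ...]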