PyTables

NumPy efficient big matrix multiplication

限于喜欢 · Submitted on 2019-12-31 10:50:08
Question: To store big matrices on disk I use numpy.memmap. Here is some sample code to test big-matrix multiplication:

    import numpy as np
    import time

    rows = 10000  # can be much larger, e.g. 1,000,000
    cols = 1000

    # create some data in memory
    data = np.arange(rows * cols, dtype='float32')
    data.resize((rows, cols))

    # create files on disk
    fp0 = np.memmap('C:/data_0', dtype='float32', mode='w+', shape=(rows, cols))
    fp1 = np.memmap('C:/data_1', dtype='float32', mode='w+', shape=(rows, cols))
    fp0[:] = data[:]
    fp1[:] = data[:]
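The question is truncated here, but a common approach for multiplying matrices that do not fit in RAM is to compute the product block by block, writing each block into a third memmap. A minimal sketch continuing the snippet above; the output path and block size are illustrative, not from the original post:

    # fp0 (rows x cols) times fp1.T (cols x rows) gives a (rows x rows) result
    out = np.memmap('C:/data_out', dtype='float32', mode='w+', shape=(rows, rows))
    step = 1000  # rows of fp0 handled per iteration; tune to available RAM
    for i in range(0, rows, step):
        # each slice is pulled into memory, multiplied, and written back to disk
        out[i:i + step] = np.dot(fp0[i:i + step], fp1.T)
    out.flush()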

Cannot retrieve Datasets in PyTables using natural naming

半腔热情 · Submitted on 2019-12-24 11:31:57
Question: I'm new to PyTables and I want to retrieve a dataset from an HDF5 file using natural naming, but with this input:

    f = tables.open_file("filename.h5", "r")
    f.root.group-1.dataset-1.read()

I get the error "group / does not have a child named group", and if I try:

    f.root.group\-1.dataset\-1.read()

I get "unexpected character after line continuation character". I can't change the names of the groups, because it is big data from an experiment.

Answer 1: You can't use the minus sign with natural naming, because a name containing a hyphen is not a valid Python identifier.
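The excerpt cuts the answer short; the standard workaround is to bypass natural naming with File.get_node(), which takes the node path as a string and so has no trouble with hyphens. A minimal sketch using the node names from the question:

    import tables

    f = tables.open_file("filename.h5", "r")
    data = f.get_node("/group-1/dataset-1").read()
    f.close()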

PyTables 2.3.1 with Python 2.5 on Windows: Error - could not find a local HDF5 installation

三世轮回 · Submitted on 2019-12-24 08:35:06
Question: I'm trying to install PyTables 2.3.1 on Windows XP with Python 2.5. I'm getting the following error:

    Could not find a local HDF5 installation. You may need to explicitly
    state where your local HDF5 headers and library can be found by setting
    the HDF5_DIR environment variable or by using the --hdf5 command-line
    option.

I'm a bit confused by the installation of the HDF5 library. I downloaded the Windows binary called HDF5188-win32-shared.zip from the HDF5 site and ran the .exe file in the zip
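The question is truncated, but the error itself states the fix: HDF5_DIR must point at a directory whose include and lib subdirectories come from the HDF5 distribution before the PyTables build runs. A quick post-install sanity check, assuming the build then succeeds (on the PyTables 2.x series the attribute is spelled hdf5Version; on 3.x it is hdf5_version):

    import tables
    print(tables.hdf5Version)  # e.g. '1.8.8' -- the HDF5 PyTables was linked against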

How to access an index in a pandas HDFStore (PyTables)

谁说我不能喝 · Submitted on 2019-12-24 00:51:16
Question: I have a large HDFStore with a multi-index. How can I get hold of one of the index levels? I see I can access the colindexes like so:

    store._handle.root.data.table.colindexes

but I have yet to get a list of one of the column indexes.

Answer 1: The multi-index example from the docs:

    In [21]: index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                        ['one', 'two', 'three']],
                                labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                        [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                                names=['foo', 'bar'])

    In [22]: df_mi = DataFrame(np
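The answer is cut off above. For a table-format store there is also HDFStore.select_column, which returns a single column (including index levels) as a Series without loading the whole frame; a minimal sketch, with the key and level name ('df_mi', 'foo') borrowed from the docs example rather than the original thread:

    import pandas as pd

    store = pd.HDFStore('store.h5')
    foo_level = store.select_column('df_mi', 'foo')  # one index level as a Series
    print(foo_level.unique())
    store.close()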

How to specify min_itemsize for an index column

≯℡__Kan透↙ · Submitted on 2019-12-23 20:24:44
Question: I am unable to specify the minimum size for the index in a to_hdf append operation. min_itemsize works for the data columns, so how can I get it to work for the index column? This code:

    from pandas import *

    df = DataFrame(['1', '2'], index=['a', 'b'])
    df.index.name = 'symbol'
    df.to_hdf("store.h5", 'df', append=True, format='table',
              min_itemsize={'symbol': 10})

generates this error message:

    ValueError: min_itemsize has the key [symbol] which is not an axis or data_column

Source: https:/
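The excerpt ends at the error, but the usual fix is to address the index through the reserved key 'index' rather than through its name; a minimal sketch under that assumption:

    from pandas import DataFrame

    df = DataFrame(['1', '2'], index=['a', 'b'])
    df.index.name = 'symbol'
    # 'index' is the key pandas recognizes for the index axis
    df.to_hdf("store.h5", 'df', append=True, format='table',
              min_itemsize={'index': 10})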

GIL for an IO-bound thread in a C extension (HDF5)

倖福魔咒の · Submitted on 2019-12-23 19:00:57
Question: I have a sampling application that acquires 250,000 samples per second, buffers them in memory, and eventually appends them to an HDFStore provided by pandas. In general, this is great. However, I have a thread that runs and continually empties the data acquisition device (DAQ), and it needs to run on a somewhat regular basis. A deviation of about a second tends to break things. Below is an extreme case of the timings observed. Start indicates a DAQ read starting, Finish is when it finishes, and
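The timing table itself is truncated above. Since a long HDF5 append inside a C extension can hold the GIL for its whole duration, one common mitigation (an assumption here, not from the original thread) is to move the writes into a separate process, so the DAQ thread never competes for the interpreter lock; a minimal sketch with illustrative names:

    import multiprocessing as mp
    import pandas as pd

    def writer(queue, path):
        # runs in its own process, so its GIL cannot stall the DAQ thread
        with pd.HDFStore(path) as store:
            while True:
                df = queue.get()
                if df is None:  # sentinel: shut down cleanly
                    break
                store.append('data', df)

    if __name__ == '__main__':
        q = mp.Queue()
        proc = mp.Process(target=writer, args=(q, 'samples.h5'))
        proc.start()
        q.put(pd.DataFrame({'sample': [1.0, 2.0]}))  # the DAQ thread would do this
        q.put(None)
        proc.join()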

What is the equivalent of “select max(column) from table” in PyTables

天大地大妈咪最大 · Submitted on 2019-12-22 14:03:29
Question: I have a table with a whole lot of numerical values in it. I know I could extract the column and do a max() on it, but there is probably a way to do this using the in-kernel method. I just can't seem to find it, though.

Answer 1: In the tests I've made, you can achieve results over twice as fast using the iterrows method instead of where:

    In [117]: timeit max(row['timestamp'] for row in table.iterrows(stop=1000000))
    1 loops, best of 3: 1 s per loop

    In [118]: timeit max(row['timestamp'] for row in table
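Beyond iteration, and not part of the excerpted answer: if the column fits in memory, Table.col() reads it as a single NumPy array and the reduction can be left to NumPy. A sketch assuming a table with a 'timestamp' column:

    # Table.col() materializes the whole column as a NumPy array in one read
    timestamps = table.col('timestamp')
    print(timestamps.max())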

How to efficiently rebuild a pandas HDFStore table when append fails

自作多情 · Submitted on 2019-12-22 08:53:16
Question: I am working on using the HDFStore in pandas to store data frames from an ongoing iterative process. At each iteration, I append to a table in the HDFStore. Here is a toy example:

    import pandas as pd
    from pandas import HDFStore
    import numpy as np
    from random import choice
    from string import ascii_letters

    alphanum = np.array(list(ascii_letters) + range(0, 9))

    def hdfstore_append(storefile, key, df, format="t",
                        columns=None, data_columns=None):
        if df is None:
            return
        if key[0] != '/':
            key = '/' + key
        with HDFStore
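The toy example is cut off mid-function. A typical reason such an append fails is a string column outgrowing the itemsize reserved when the table was created; one rebuild strategy (an assumption, not from the excerpt) is to read the existing table, concatenate, and rewrite the key with a larger reservation:

    import pandas as pd

    def rebuild_and_append(store, key, df, itemsize=64):
        # fallback for when store.append(key, df) raises ValueError
        if key in store:
            combined = pd.concat([store.select(key), df])
            store.remove(key)
        else:
            combined = df
        # rewrite from scratch, reserving room for longer strings;
        # the reserved key 'values' addresses all value columns at once
        store.put(key, combined, format='table', min_itemsize={'values': itemsize})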

Update a pandas DataFrame stored in a PyTable with another pandas DataFrame

流过昼夜 · Submitted on 2019-12-22 06:37:40
Question: I am trying to create a function that updates a pandas DataFrame that I have stored in a PyTable with new data from another pandas DataFrame. I want to check whether some data is missing in the PyTable for specific DatetimeIndexes (a value is NaN or a new Timestamp is available), replace this with new values from a given pandas DataFrame, and append this to the PyTable. Basically, just update a PyTable. I can get the combined DataFrame using the combine_first method in pandas. Below, the PyTable is
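The setup code is truncated above. A minimal read-combine-rewrite sketch in the spirit of the question, with illustrative names; combine_first lets the new frame's values win wherever both frames have data:

    import pandas as pd

    def update_table(path, key, new_df):
        with pd.HDFStore(path) as store:
            old_df = store.select(key)
            # new values take precedence; gaps in new_df are filled from old_df
            combined = new_df.combine_first(old_df)
            store.put(key, combined, format='table')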

How do I read/write to a subgroup within an HDFStore?

断了今生、忘了曾经 · Submitted on 2019-12-22 00:24:31
Question: I am using the HDFStore to store some of my processed results prior to analysis. Into the store I want to put 3 types of results:

- Raw results, which have not been processed at all, just read in and merged from their original CSV formats
- Processed results that are derived from the raw results, with some processing and division into more logical groupings
- Summarised results that have useful summary columns added and redundant columns removed, for easy reading

I thought an HDFStore with
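The question is cut off, but the mechanism it is reaching for exists: HDFStore keys may contain slashes, and each slash level becomes an HDF5 subgroup. A minimal sketch with illustrative key names and stand-in frames:

    import pandas as pd

    raw_df = pd.DataFrame({'a': [1, 2]})   # stand-ins for the three result types
    processed_df = raw_df * 10
    summary_df = processed_df.describe()

    with pd.HDFStore('results.h5') as store:
        # slashes in the key create nested HDF5 groups
        store.put('raw/experiment1', raw_df)
        store.put('processed/experiment1', processed_df)
        store.put('summary/experiment1', summary_df)
        print(store.keys())  # ['/processed/experiment1', '/raw/experiment1', ...]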