pytables

How can I combine multiple .h5 files?

Submitted by 倾然丶 夕夏残阳落幕 on 2020-01-06 08:24:29
Question: Everything that is available online is too complicated. My database is large, so I exported it in parts. I now have three .h5 files and I would like to combine them into one .h5 file for further work. How can I do it?

Answer 1: There are at least 3 ways to combine data from individual HDF5 files into a single file:

1. Use external links to create a new file that points to the data in your other files (requires the pytables/tables module)
2. Copy the data with the HDF Group utility: h5copy.exe
3. Copy the data
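A minimal sketch of the copy approach, assuming three hypothetical part files; every top-level node of each part is copied into its own group of the combined file (PyTables' copy_node works across open files):

```python
import tables as tb

part_files = ['part1.h5', 'part2.h5', 'part3.h5']  # hypothetical names

with tb.open_file('combined.h5', mode='w') as h5out:
    for i, fname in enumerate(part_files, start=1):
        # one group per part avoids name collisions between the files
        grp = h5out.create_group('/', 'part%d' % i)
        # option 1 instead: h5out.create_external_link('/', 'link%d' % i, fname + ':/')
        with tb.open_file(fname, mode='r') as h5in:
            for node in h5in.root:
                h5in.copy_node(node, newparent=grp, recursive=True)
```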

How to de-reference a list of external links using pytables?

Submitted by 六眼飞鱼酱① on 2020-01-06 05:45:04
Question: I have created external links leading from one hdf5 file to another using pytables. My question is how to de-reference them in a loop. For example, let's assume file_name = "collection.h5" is where the external links are stored. I created the external links under the root node, and when I traverse the nodes under the root, I get the following output:

    /link1 (ExternalLink) -> /files/data1.h5:/weights/Image
    /link2 (ExternalLink) -> /files/data2.h5:/weights/Image

and so on. I know that for de-referencing a
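A minimal sketch of dereferencing in a loop, assuming the layout above: calling an ExternalLink node opens the target file and returns the node it points to, and umount() closes that file again:

```python
import tables as tb

with tb.open_file('collection.h5', mode='r') as h5f:
    for node in h5f.root:
        if isinstance(node, tb.link.ExternalLink):
            target = node()       # opens e.g. /files/data1.h5, returns /weights/Image
            print(node._v_pathname, '->', target._v_pathname)
            data = target.read()  # assuming the target is an array node
            node.umount()         # close the implicitly opened external file
```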

What is the most memory efficient way to combine read_sorted and Expr in pytables?

Submitted by 白昼怎懂夜的黑 on 2020-01-05 11:03:09
Question: I am looking for the most memory-efficient way to combine reading a PyTables table (columns: x, y, z) in sorted order (the z column has a CSI) and evaluating an expression like x + a*y + b*z, where a and b are constants. Up until now my only solution was to copy the entire table with the sortby=z flag and then evaluate the expression piece-wise on the table. Note: I want to keep the result x + a*y + b*z in memory to do some reduction operations on it which are not available directly in Pytables and
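One memory-bounded alternative sketch, avoiding the full sorted copy: read z-sorted chunks with read_sorted (which uses the CSI) and evaluate the expression chunk by chunk. File and table names are hypothetical:

```python
import numpy as np
import tables as tb

a, b = 2.0, 3.0  # the constants from the expression
chunk = 100_000

with tb.open_file('data.h5', mode='r') as h5f:
    table = h5f.root.mytable                      # columns x, y, z; CSI on z
    out = np.empty(table.nrows, dtype='float64')  # final result stays in memory
    for start in range(0, table.nrows, chunk):
        stop = min(start + chunk, table.nrows)
        rows = table.read_sorted('z', start=start, stop=stop)
        out[start:stop] = rows['x'] + a * rows['y'] + b * rows['z']

# out now holds x + a*y + b*z in z-sorted order, ready for reductions
```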

Pytables slow on query for non-matching string

Submitted by 只愿长相守 on 2020-01-05 09:14:35
Question: I'm relatively new to Python and I'm using pytables to store some genomic annotations in HDF5 for faster querying. I find that querying for a non-matching string in the table is slow, but I'm unsure how to optimize it for better performance. Below is one of the tables:

    In [5]: t
    Out[5]:
    /gene/annotation (Table(315202,), fletcher32, blosc(5)) ''
      description := {
      "name": StringCol(itemsize=36, shape=(), dflt='', pos=0),
      "track": StringCol(itemsize=12, shape=(), dflt='', pos=1),
      "etype": StringCol
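A common first optimization sketch (file and column names follow the table above, the query value is hypothetical): index the queried column and run an in-kernel query, so non-matching lookups can use the index instead of scanning every row:

```python
import tables as tb

with tb.open_file('annotations.h5', mode='a') as h5f:
    table = h5f.root.gene.annotation
    # a completely sorted index (CSI) on the queried column speeds up
    # both matching and non-matching string lookups
    if not table.cols.name.is_indexed:
        table.cols.name.create_csindex()
    rows = table.read_where('name == b"some_gene"')  # in-kernel query
```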

(In Pandas) Why is frequency information lost when storing in HDF5 as a Table?

Submitted by 给你一囗甜甜゛ on 2020-01-05 07:59:07
Question: I am storing time series data in HDF5 format with pandas. Because I want to be able to access the data directly on disk, I am using the PyTables table format (table=True) when writing. It appears that I then lose frequency information on my TimeSeries objects after writing them to HDF5. This can be seen by toggling the is_table value in the script below:

    import pandas as pd

    is_table = False
    times = pd.date_range('2000-1-1', periods=3, freq='H')
    series = pd.Series(xrange(3), index=times)
    print 'frequency
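A sketch of the round trip and a possible workaround in current pandas (the path is hypothetical): the table format does not persist index.freq, but the frequency can be re-inferred and reattached after reading:

```python
import pandas as pd

times = pd.date_range('2000-1-1', periods=3, freq='H')
series = pd.Series(range(3), index=times)

series.to_hdf('ts.h5', key='series', format='table')
loaded = pd.read_hdf('ts.h5', 'series')

print(loaded.index.freq)                         # None: freq is not round-tripped
loaded.index.freq = pd.infer_freq(loaded.index)  # reattach the frequency
print(loaded.index.freq)                         # <Hour>
```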

HDFStore start stop not working

Submitted by 两盒软妹~` on 2020-01-05 07:34:46
Question: Is it clear what I am doing wrong? I'm experimenting with the start and stop options of pandas HDFStore.select and they make no difference. The commands I'm using are:

    import pandas as pd
    hdf = pd.HDFStore(path % 'results')
    len(hdf.select('results', start=15, stop=20))

hoping to get a length of 4 or 5, however it's counted, but it gives me the whole darn dataframe. Here is a screenshot:

Answer 1: When writing the h5 file, use pandas.to_hdf(<path>, <key>, format='table'), which enables subsets of
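A sketch confirming the answer with hypothetical data: once the frame is written with format='table', start and stop select only the requested rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(100)})
df.to_hdf('results.h5', key='results', format='table')  # not the default 'fixed'

with pd.HDFStore('results.h5') as hdf:
    subset = hdf.select('results', start=15, stop=20)

print(len(subset))  # 5 -- rows 15..19
```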

Is there a way to store PyTable columns in a specific order?

Submitted by 家住魔仙堡 on 2020-01-03 13:32:29
Question: It seems that PyTables columns are alphabetically ordered when using either a dictionary or a class for the schema definition in the call to createTable(). My need is to establish a specific order and then use numpy.genfromtxt() to read and store my data from text. My text file does not have the variable names arranged alphabetically as they are in the PyTable. For example, assuming the text file is named mydata.txt and is organized as follows:

    time(row1) bVar(row1) dVar(row1) aVar(row1) cVar(row1)
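A minimal sketch in the modern PyTables 3.x spelling (file and column types are hypothetical): the pos argument of each Col pins the column order, overriding the alphabetical default:

```python
import tables as tb

class Record(tb.IsDescription):
    time = tb.Float64Col(pos=0)  # pos fixes the on-disk column order
    bVar = tb.Float64Col(pos=1)
    dVar = tb.Float64Col(pos=2)
    aVar = tb.Float64Col(pos=3)
    cVar = tb.Float64Col(pos=4)

with tb.open_file('mydata.h5', mode='w') as h5f:
    table = h5f.create_table('/', 'data', Record)
    print(table.colnames)  # ['time', 'bVar', 'dVar', 'aVar', 'cVar']
```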

pandas read_hdf with 'where' condition limitation?

Submitted by 大憨熊 on 2020-01-02 16:18:10
Question: I need to query an HDF5 file with a where clause with 3 conditions, one of which is a list with a length of 30:

    myList = list(xrange(30))
    h5DF = pd.read_hdf(h5Filename, 'df', where='index=myList & date=dateString & time=timeString')

The query above gives me ValueError: too many inputs, and the error is reproducible. If I reduce the length of the list to 29 (three conditions):

    myList = list(xrange(29))
    h5DF = pd.read_hdf(h5Filename, 'df', where='index=myList & date=dateString & time
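One workaround sketch: keep the two scalar conditions in the on-disk where clause and apply the long list-membership test in memory afterwards, which stays under the numexpr operand limit that triggers the ValueError. The file path and condition values are hypothetical stand-ins for the question's variables:

```python
import pandas as pd

h5Filename = 'data.h5'                           # hypothetical path
dateString, timeString = '2020-01-02', '09:30'   # hypothetical values
myList = list(range(30))                         # xrange in the original Python 2 code

# query only the scalar conditions on disk ...
h5DF = pd.read_hdf(h5Filename, 'df',
                   where='date=dateString & time=timeString')
# ... then filter the 30-element membership condition in memory
h5DF = h5DF[h5DF.index.isin(myList)]
```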

In PyTables, how to create nested array of variable length?

Submitted by 谁说我不能喝 on 2019-12-31 13:28:38
Question: I'm using PyTables 2.2.1 with Python 2.6, and I would like to create a table which contains nested arrays of variable length. I have searched the PyTables documentation, and the tutorial example (PyTables Tutorial 3.8) shows how to create a nested array of length 1. But for this example, how would I add a variable number of rows to data 'info2/info3/x' and 'info2/info3/y'? For a perhaps easier-to-understand table structure, here's my homegrown example:

    """Desired Pytable output: DIEM TEMPUS
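Since Table columns must be fixed-size, a common workaround is to put each variable-length field in a VLArray and let table rows refer to VLArray rows by position; a minimal sketch in the modern PyTables 3.x spelling (2.x would use createVLArray):

```python
import tables as tb

with tb.open_file('nested.h5', mode='w') as h5f:
    # one VLArray per variable-length field; each append() is one row
    x_vl = h5f.create_vlarray('/', 'x', tb.Float64Atom())
    y_vl = h5f.create_vlarray('/', 'y', tb.Float64Atom())

    x_vl.append([1.0, 2.0, 3.0])  # row 0 holds three values
    x_vl.append([4.0])            # row 1 holds one value
    y_vl.append([0.1, 0.2])
    y_vl.append([0.3, 0.4, 0.5])

    print(x_vl[0])  # -> [1. 2. 3.]
```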

Numpy efficient big matrix multiplication

Submitted by 99封情书 on 2019-12-31 10:51:59
Question: To store big matrices on disk I use numpy.memmap. Here is some sample code to test big matrix multiplication:

    import numpy as np
    import time

    rows = 10000  # it can be large, for example 1kk
    cols = 1000

    # create some data in memory
    data = np.arange(rows*cols, dtype='float32')
    data.resize((rows, cols))

    # create files on disk
    fp0 = np.memmap('C:/data_0', dtype='float32', mode='w+', shape=(rows, cols))
    fp1 = np.memmap('C:/data_1', dtype='float32', mode='w+', shape=(rows, cols))
    fp0[:] = data[:]
    fp1[:] = data[:]
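A sketch of the usual next step (paths and block size are illustrative): multiplying the memmaps block-by-block keeps only a slice of the inputs and of the result resident in RAM at a time:

```python
import numpy as np

rows, cols, block = 10000, 1000, 1000

a = np.memmap('data_0', dtype='float32', mode='r', shape=(rows, cols))
b = np.memmap('data_1', dtype='float32', mode='r', shape=(rows, cols))
out = np.memmap('data_out', dtype='float32', mode='w+', shape=(rows, rows))

for i in range(0, rows, block):
    # only `block` rows of `a` and of the result are computed per step
    out[i:i+block] = np.dot(a[i:i+block], b.T)

out.flush()  # make sure the result hits the disk
```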