pytables

pd.read_hdf throws 'cannot set WRITABLE flag to True of this array'

牧云@^-^@ submitted on 2019-11-30 11:28:01
When running pd.read_hdf('myfile.h5') I get the following traceback error:

    [[...some longer traceback]]

    ~/.local/lib/python3.6/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop)
       2487
       2488         if isinstance(node, tables.VLArray):
    -> 2489             ret = node[0][start:stop]
       2490         else:
       2491             dtype = getattr(attrs, 'value_type', None)

    ~/.local/lib/python3.6/site-packages/tables/vlarray.py in __getitem__(self, key)
    ~/.local/lib/python3.6/site-packages/tables/vlarray.py in read(self, start, stop, step)
    tables/hdf5extension.pyx in tables.hdf5extension.VLArray._read_array()

    ValueError: cannot set

ptrepack sortby needs 'full' index

↘锁芯ラ submitted on 2019-11-30 07:41:13
Question: I am trying to ptrepack an HDF file that was created with the pandas HDFStore PyTables interface. The main index of the dataframe was time, but I made some more columns data_columns so that I can filter data on disk via these data_columns. Now I would like to sort the HDF file by one of those columns (because selection is too slow for my taste on an 84 GB file), using ptrepack with the sortby option like so:

    ()[maye@luna4 .../nominal]$ ptrepack --chunkshape=auto --propindexes --complevel=9 -
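ptrepack's --sortby option requires a completely sorted ("full", a.k.a. CSI) index on the sort column, which pandas can build in place before repacking. A minimal sketch (file and column names are stand-ins, not the question's actual ones):

```python
import numpy as np
import pandas as pd

# Toy table standing in for the 84 GB store; 'temp' plays the data_column role.
df = pd.DataFrame({'time': pd.date_range('2019-01-01', periods=10),
                   'temp': np.random.rand(10)})
with pd.HDFStore('data.h5', mode='w') as store:
    store.append('df', df, data_columns=['temp'], index=False)
    # Build the completely-sorted index ptrepack --sortby needs:
    store.create_table_index('df', columns=['temp'], kind='full')
```

After this, something like `ptrepack --sortby=temp data.h5 sorted.h5` should no longer complain about the missing 'full' index.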

Matrix multiplication using hdf5

那年仲夏 submitted on 2019-11-30 04:33:28
I'm trying to multiply 2 big matrices under a memory limit using hdf5 (pytables), but the function numpy.dot gives me the error "ValueError: array is too big". Do I need to do the matrix multiplication myself, maybe blockwise, or is there another Python function similar to numpy.dot?

    import numpy as np
    import time
    import tables
    import cProfile
    import numexpr as ne

    n_row=10000
    n_col=100
    n_batch=10
    rows = n_row
    cols = n_col
    batches = n_batch
    atom = tables.UInt8Atom()  #?
    filters = tables.Filters(complevel=9, complib='blosc')  # tune parameters
    fileName_a = 'C:\carray_a.h5'
    shape_a = (rows*batches,
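The blockwise route the question asks about can be sketched as follows (sizes shrunk so it runs quickly, and float64 instead of the question's UInt8Atom): keep A on disk as a chunked CArray and multiply one row-block at a time, so only `step` rows are ever in RAM.

```python
import numpy as np
import tables

n, k, m, step = 1000, 50, 30, 100
with tables.open_file('mm.h5', mode='w') as h5:
    A = h5.create_carray(h5.root, 'A', tables.Float64Atom(), shape=(n, k),
                         filters=tables.Filters(complevel=9, complib='zlib'))
    A[:] = np.random.rand(n, k)
    B = np.random.rand(k, m)                  # small enough to stay in memory
    C = h5.create_carray(h5.root, 'C', tables.Float64Atom(), shape=(n, m))
    for i in range(0, n, step):
        C[i:i + step] = A[i:i + step] @ B     # one block in RAM at a time
    A_dense, result = A[:], C[:]              # read back (only for checking)
```

This trades one big numpy.dot for n/step small ones; the peak memory is one block of A plus all of B.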

Building a huge numpy array using pytables

旧城冷巷雨未停 submitted on 2019-11-30 04:24:27
How can I create a huge numpy array using pytables? I tried this, but it gives me the "ValueError: array is too big." error:

    import numpy as np
    import tables as tb

    ndim = 60000
    h5file = tb.openFile('test.h5', mode='w', title="Test Array")
    root = h5file.root
    h5file.createArray(root, "test", np.zeros((ndim,ndim), dtype=float))
    h5file.close()

b1r3k: You could try the tables.CArray class, as it supports compression, but... I think the question is more about numpy than pytables, because you are creating the array with numpy before storing it with pytables. That way you need a lot of RAM to execute np.zeros
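A sketch of the CArray route the answer hints at: the 60000 x 60000 array is allocated on disk in chunks, so the full np.zeros never has to exist in RAM. (Newer PyTables spells openFile/createArray as open_file/create_carray.)

```python
import tables as tb

ndim = 60000
with tb.open_file('test.h5', mode='w', title='Test Array') as h5:
    arr = h5.create_carray(h5.root, 'test', tb.Float64Atom(),
                           shape=(ndim, ndim),
                           filters=tb.Filters(complevel=5, complib='zlib'))
    arr[0, :5] = 1.0   # write pieces on demand; untouched chunks read back as 0
```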

Improve pandas (PyTables?) HDF5 table write performance

倾然丶 夕夏残阳落幕 submitted on 2019-11-29 19:44:50
I've been using pandas for research now for about two months to great effect. With large numbers of medium-sized trace event datasets, pandas + PyTables (the HDF5 interface) does a tremendous job of allowing me to process heterogeneous data using all the Python tools I know and love. Generally speaking, I use the Fixed (formerly "Storer") format in PyTables, as my workflow is write-once, read-many, and many of my datasets are sized such that I can load 50-100 of them into memory at a time with no serious disadvantages. (NB: I do much of my work on Opteron server-class machines with 128GB+
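The Fixed-vs-Table choice mentioned here can be sketched like this (toy sizes and throwaway file names): Fixed is the fast write-once path, while the table format is slower to write but queryable and appendable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100_000, 3), columns=list('ABC'))
df.to_hdf('fixed.h5', key='df', mode='w', format='fixed')  # fast, write-once
df.to_hdf('table.h5', key='df', mode='w', format='table')  # queryable, slower
back = pd.read_hdf('fixed.h5', 'df')
```

Timing the two calls on a representative dataset is usually the quickest way to see whether the table format's write cost matters for a given workflow.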

HDFStore: table.select and RAM usage

依然范特西╮ submitted on 2019-11-29 14:48:16
Question: I am trying to select random rows from an HDFStore table of about 1 GB. RAM usage explodes when I ask for about 50 random rows. I am using pandas 0.11-dev, python 2.7, linux64. In this first case, RAM usage fits the size of a chunk:

    with pd.get_store("train.h5",'r') as train:
        for chunk in train.select('train',chunksize=50):
            pass

In this second case, it seems like the whole table is loaded into RAM:

    r=random.choice(400000,size=40,replace=False)
    train.select('train',pd.Term("index",r))

In this
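One way to keep the read proportional to the sample: my understanding of the coordinate-selection API in newer pandas is that `where` can be an array of integer row locations, so only those rows are fetched. A sketch with toy data standing in for train.h5:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.arange(1000)})
with pd.HDFStore('train.h5', mode='w') as store:
    store.append('train', df)

with pd.HDFStore('train.h5', mode='r') as store:
    nrows = store.get_storer('train').nrows
    rows = np.random.choice(nrows, size=40, replace=False)
    sample = store.select('train', where=rows)   # integer row coordinates
```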

Iteratively writing to HDF5 Stores in Pandas

孤街醉人 submitted on 2019-11-28 15:51:45
Question: Pandas has the following examples for how to store Series, DataFrames and Panels in HDF5 files:

Prepare some data:

    In [1142]: store = HDFStore('store.h5')

    In [1143]: index = date_range('1/1/2000', periods=8)

    In [1144]: s = Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])

    In [1145]: df = DataFrame(randn(8, 3), index=index,
       ......:                columns=['A', 'B', 'C'])
       ......:

    In [1146]: wp = Panel(randn(2, 5, 4), items=['Item1', 'Item2'],
       ......:            major_axis=date_range('1/1/2000', periods=5),
       ......:
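The iterative-write pattern this question leads to boils down to appending chunks to a single table-format key instead of building the whole frame in memory first. A sketch with made-up chunk sizes:

```python
import numpy as np
import pandas as pd

with pd.HDFStore('store.h5', mode='w') as store:
    for i in range(5):
        chunk = pd.DataFrame(np.random.randn(100, 3), columns=list('ABC'),
                             index=pd.RangeIndex(i * 100, (i + 1) * 100))
        store.append('df', chunk)          # table format grows in place
    nrows = store.get_storer('df').nrows   # 500 after all appends
```

(Panel has since been removed from pandas, but Series and DataFrames append the same way.)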

Issue for insert using psycopg

两盒软妹~` submitted on 2019-11-28 14:44:10
I am reading data from a .mat file using the PyTables module. After reading the data, I want to insert this data into the database using psycopg. Here is a sample code piece:

    file = tables.openFile(matFile)
    x = 0
    # populate the matData list
    for var in dest:
        data = file.getNode('/' + var)[:]
        matData.append(data)
        x = x+1

    # insert into db
    for i in range(0,x):
        cur.execute("""INSERT INTO \"%s\" (%s) VALUES (%s)""" % tableName,dest[i],matData[i]) )

I am getting the following error:

    Traceback (most recent call last):
      File "./loadDBFromMAT.py", line 111, in <module>
        readInputFileAndLoad(args.matFileName
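Two separate problems hide in that execute call: the % operator binds only to tableName (the format arguments must be one tuple), and data values belong in query parameters rather than interpolated into the SQL string. A DB-free sketch of the corrected string handling (the names are stand-ins for the question's variables):

```python
# Stand-ins for tableName, dest[i], matData[i]:
tableName, col, value = 'mytable', 'mycol', 42

# Identifiers go through one % tuple; the value slot is left as a psycopg
# placeholder (%%s escapes to a literal %s in the formatted string):
query = 'INSERT INTO "%s" (%s) VALUES (%%s)' % (tableName, col)

# Then, against a live connection:
#     cur.execute(query, (value,))   # psycopg fills %s safely
```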