pytables

pd.read_hdf throws 'cannot set WRITABLE flag to True of this array'

牧云@^-^@ submitted on 2019-11-30 11:28:01
When running pd.read_hdf('myfile.h5') I get the following traceback error:

    [[...some longer traceback]]

    ~/.local/lib/python3.6/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop)
       2487
       2488         if isinstance(node, tables.VLArray):
    -> 2489             ret = node[0][start:stop]
       2490         else:
       2491             dtype = getattr(attrs, 'value_type', None)

    ~/.local/lib/python3.6/site-packages/tables/vlarray.py in __getitem__(self, key)
    ~/.local/lib/python3.6/site-packages/tables/vlarray.py in read(self, start, stop, step)
    tables/hdf5extension.pyx in tables.hdf5extension.VLArray._read_array()

    ValueError: cannot set

ptrepack sortby needs 'full' index

↘锁芯ラ submitted on 2019-11-30 07:41:13
Question: I am trying to ptrepack an HDF file that was created with the pandas HDFStore PyTables interface. The main index of the dataframe was time, but I made some more columns data_columns so that I can filter data on disk via these data_columns. Now I would like to sort the HDF file by one of those columns (because selection is too slow for my taste on an 84 GB file), using ptrepack with the sortby option like so:

    ()[maye@luna4 .../nominal]$ ptrepack --chunkshape=auto --propindexes --complevel=9 -
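ptrepack's --sortby option requires a completely sorted ("full", a.k.a. CSI) index on the sort column, which pandas can build in place before repacking. A minimal sketch (file and column names are stand-ins, not the question's actual ones):

```python
import numpy as np
import pandas as pd

# Toy table standing in for the 84 GB store; 'temp' plays the data_column role.
df = pd.DataFrame({'time': pd.date_range('2019-01-01', periods=10),
                   'temp': np.random.rand(10)})
with pd.HDFStore('data.h5', mode='w') as store:
    store.append('df', df, data_columns=['temp'], index=False)
    # Build the completely-sorted index ptrepack --sortby needs:
    store.create_table_index('df', columns=['temp'], kind='full')
```

After this, something like `ptrepack --sortby=temp data.h5 sorted.h5` should no longer complain about the missing 'full' index.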

Matrix multiplication using hdf5

那年仲夏 submitted on 2019-11-30 04:33:28
I'm trying to multiply 2 big matrices under a memory limit using hdf5 (pytables), but the function numpy.dot gives me the error "ValueError: array is too big". Do I need to do the matrix multiplication myself, maybe blockwise, or is there another Python function similar to numpy.dot?

    import numpy as np
    import time
    import tables
    import cProfile
    import numexpr as ne

    n_row=10000
    n_col=100
    n_batch=10
    rows = n_row
    cols = n_col
    batches = n_batch
    atom = tables.UInt8Atom()  #?
    filters = tables.Filters(complevel=9, complib='blosc')  # tune parameters
    fileName_a = 'C:\carray_a.h5'
    shape_a = (rows*batches,
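The blockwise route the question asks about can be sketched as follows (sizes shrunk so it runs quickly, and float64 instead of the question's UInt8Atom): keep A on disk as a chunked CArray and multiply one row-block at a time, so only `step` rows are ever in RAM.

```python
import numpy as np
import tables

n, k, m, step = 1000, 50, 30, 100
with tables.open_file('mm.h5', mode='w') as h5:
    A = h5.create_carray(h5.root, 'A', tables.Float64Atom(), shape=(n, k),
                         filters=tables.Filters(complevel=9, complib='zlib'))
    A[:] = np.random.rand(n, k)
    B = np.random.rand(k, m)                  # small enough to stay in memory
    C = h5.create_carray(h5.root, 'C', tables.Float64Atom(), shape=(n, m))
    for i in range(0, n, step):
        C[i:i + step] = A[i:i + step] @ B     # one block in RAM at a time
    A_dense, result = A[:], C[:]              # read back (only for checking)
```

This trades one big numpy.dot for n/step small ones; the peak memory is one block of A plus all of B.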

Building a huge numpy array using pytables

旧城冷巷雨未停 submitted on 2019-11-30 04:24:27
How can I create a huge numpy array using pytables? I tried this, but it gives me the "ValueError: array is too big." error:

    import numpy as np
    import tables as tb

    ndim = 60000
    h5file = tb.openFile('test.h5', mode='w', title="Test Array")
    root = h5file.root
    h5file.createArray(root, "test", np.zeros((ndim,ndim), dtype=float))
    h5file.close()

b1r3k: You could try the tables.CArray class, as it supports compression, but... I think the question is more about numpy than pytables, because you are creating the array with numpy before storing it with pytables. That way you need a lot of RAM to execute np.zeros
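A sketch of the CArray route the answer hints at: the 60000 x 60000 array is allocated on disk in chunks, so the full np.zeros never has to exist in RAM. (Newer PyTables spells openFile/createArray as open_file/create_carray.)

```python
import tables as tb

ndim = 60000
with tb.open_file('test.h5', mode='w', title='Test Array') as h5:
    arr = h5.create_carray(h5.root, 'test', tb.Float64Atom(),
                           shape=(ndim, ndim),
                           filters=tb.Filters(complevel=5, complib='zlib'))
    arr[0, :5] = 1.0   # write pieces on demand; untouched chunks read back as 0
```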

Improve pandas (PyTables?) HDF5 table write performance

倾然丶 夕夏残阳落幕 submitted on 2019-11-29 19:44:50
I've been using pandas for research now for about two months to great effect. With large numbers of medium-sized trace event datasets, pandas + PyTables (the HDF5 interface) does a tremendous job of allowing me to process heterogeneous data using all the Python tools I know and love. Generally speaking, I use the Fixed (formerly "Storer") format in PyTables, as my workflow is write-once, read-many, and many of my datasets are sized such that I can load 50-100 of them into memory at a time with no serious disadvantages. (NB: I do much of my work on Opteron server-class machines with 128GB+
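The Fixed-vs-Table choice mentioned here can be sketched like this (toy sizes and throwaway file names): Fixed is the fast write-once path, while the table format is slower to write but queryable and appendable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100_000, 3), columns=list('ABC'))
df.to_hdf('fixed.h5', key='df', mode='w', format='fixed')  # fast, write-once
df.to_hdf('table.h5', key='df', mode='w', format='table')  # queryable, slower
back = pd.read_hdf('fixed.h5', 'df')
```

Timing the two calls on a representative dataset is usually the quickest way to see whether the table format's write cost matters for a given workflow.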

HDFStore: table.select and RAM usage

依然范特西╮ submitted on 2019-11-29 14:48:16
Question: I am trying to select random rows from an HDFStore table of about 1 GB. RAM usage explodes when I ask for about 50 random rows. I am using pandas 0.11-dev, python 2.7, linux64. In this first case, RAM usage fits the size of a chunk:

    with pd.get_store("train.h5",'r') as train:
        for chunk in train.select('train',chunksize=50):
            pass

In this second case, it seems like the whole table is loaded into RAM:

    r=random.choice(400000,size=40,replace=False)
    train.select('train',pd.Term("index",r))

In this
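One way to keep the read proportional to the sample: my understanding of the coordinate-selection API in newer pandas is that `where` can be an array of integer row locations, so only those rows are fetched. A sketch with toy data standing in for train.h5:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.arange(1000)})
with pd.HDFStore('train.h5', mode='w') as store:
    store.append('train', df)

with pd.HDFStore('train.h5', mode='r') as store:
    nrows = store.get_storer('train').nrows
    rows = np.random.choice(nrows, size=40, replace=False)
    sample = store.select('train', where=rows)   # integer row coordinates
```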

Iteratively writing to HDF5 Stores in Pandas

孤街醉人 submitted on 2019-11-28 15:51:45
Question: Pandas has the following examples for how to store Series, DataFrames and Panels in HDF5 files:

Prepare some data:

    In [1142]: store = HDFStore('store.h5')

    In [1143]: index = date_range('1/1/2000', periods=8)

    In [1144]: s = Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])

    In [1145]: df = DataFrame(randn(8, 3), index=index,
       ......:                columns=['A', 'B', 'C'])
       ......:

    In [1146]: wp = Panel(randn(2, 5, 4), items=['Item1', 'Item2'],
       ......:            major_axis=date_range('1/1/2000', periods=5),
       ......:
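The iterative-write pattern this question leads to boils down to appending chunks to a single table-format key instead of building the whole frame in memory first. A sketch with made-up chunk sizes:

```python
import numpy as np
import pandas as pd

with pd.HDFStore('store.h5', mode='w') as store:
    for i in range(5):
        chunk = pd.DataFrame(np.random.randn(100, 3), columns=list('ABC'),
                             index=pd.RangeIndex(i * 100, (i + 1) * 100))
        store.append('df', chunk)          # table format grows in place
    nrows = store.get_storer('df').nrows   # 500 after all appends
```

(Panel has since been removed from pandas, but Series and DataFrames append the same way.)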

Issue for insert using psycopg

两盒软妹~` submitted on 2019-11-28 14:44:10
I am reading data from a .mat file using the PyTables module. After reading the data, I want to insert this data into the database using psycopg. Here is a sample code piece:

    file = tables.openFile(matFile)
    x = 0
    # populate the matData list
    for var in dest:
        data = file.getNode('/' + var)[:]
        matData.append(data)
        x = x+1

    # insert into db
    for i in range(0,x):
        cur.execute("""INSERT INTO \"%s\" (%s) VALUES (%s)""" % tableName,dest[i],matData[i]) )

I am getting the following error:

    Traceback (most recent call last):
      File "./loadDBFromMAT.py", line 111, in <module>
        readInputFileAndLoad(args.matFileName
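Two separate problems hide in that execute call: the % operator binds only to tableName (the format arguments must be one tuple), and data values belong in query parameters rather than interpolated into the SQL string. A DB-free sketch of the corrected string handling (the names are stand-ins for the question's variables):

```python
# Stand-ins for tableName, dest[i], matData[i]:
tableName, col, value = 'mytable', 'mycol', 42

# Identifiers go through one % tuple; the value slot is left as a psycopg
# placeholder (%%s escapes to a literal %s in the formatted string):
query = 'INSERT INTO "%s" (%s) VALUES (%%s)' % (tableName, col)

# Then, against a live connection:
#     cur.execute(query, (value,))   # psycopg fills %s safely
```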