pytables

Pip does not acknowledge Cython

Submitted by 十年热恋 on 2019-12-21 04:38:13
Question: I just installed pip and Python via Homebrew on a fresh Mac OS installation. First of all, pip is not installing dependencies at all, which forces me to re-run 'pip install tables' three times: each run reports a missing dependency, I install it, then rerun. Is this expected behavior? Second, pip does not recognize the Cython installation it performed itself moments ago: $ pip show cython --- Name: Cython Version: 0.21 Location: /usr/local/lib/python2.7/site

Release hdf5 disk memory after table or node removal with pytables or pandas

Submitted by 非 Y 不嫁゛ on 2019-12-19 08:09:09
Question: I'm using HDFStore with pandas / pytables. After removing a table or object, the hdf5 file size remains unaffected. This space seems to be reused when additional objects are added to the store, but it can be an issue if a large amount of space is wasted. I have not found any command in the pandas or pytables APIs that recovers hdf5 memory. Do you know of any mechanism to improve data management in hdf5 files? Answer 1: See here; you need to ptrepack it, which rewrites the file. ptrepack -

What is the advantage of PyTables? [closed]

Submitted by 风格不统一 on 2019-12-19 04:25:10
Question: As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 6 years ago. I have recently started learning about PyTables and found it very interesting. My question is: What are the basic advantages of

How should I use the h5py library for storing time series data?

Submitted by 假装没事ソ on 2019-12-17 17:08:12
Question: I have some time series data that I previously stored as hdf5 files using pytables. I recently tried storing the same with the h5py lib. However, since all elements of a numpy array have to be of the same dtype, I have to convert the date (which is usually the index) to 'float64' before storing it with h5py. When I use pytables, the index and its dtype are preserved, which makes it possible for me to query the time series without pulling it all into memory. I guess with
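Since h5py has no native datetime dtype, one common workaround is to store the index as int64 nanoseconds since the epoch and record the encoding in an attribute, then rebuild the `DatetimeIndex` on read. A minimal sketch, assuming h5py and pandas are available (the file and dataset names are illustrative):

```python
import numpy as np
import h5py
import pandas as pd

# A small example series with a DatetimeIndex.
idx = pd.date_range("2019-01-01", periods=5, freq="D")
values = np.arange(5.0)

with h5py.File("ts.h5", "w") as f:
    # Store the index as int64 nanoseconds since the epoch; an attribute
    # records how to decode it later.
    dset = f.create_dataset("index", data=idx.asi8)
    dset.attrs["encoding"] = "datetime64[ns]"
    f.create_dataset("values", data=values)

with h5py.File("ts.h5", "r") as f:
    # pd.DatetimeIndex interprets raw int64 values as nanoseconds.
    restored = pd.DatetimeIndex(f["index"][:])
```

This preserves full nanosecond precision, unlike a lossy round-trip through float64.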

How to get faster code than numpy.dot for matrix multiplication?

Submitted by 回眸只為那壹抹淺笑 on 2019-12-17 10:37:45
Question: Here Matrix multiplication using hdf5 I use hdf5 (pytables) for big matrix multiplication, but I was surprised that using hdf5 it works even faster than using plain numpy.dot with the matrices stored in RAM. What is the reason for this behavior? And maybe there is some faster function for matrix multiplication in Python, because I still use numpy.dot for small block matrix multiplication. Here is some code: Assume the matrices can fit in RAM: test on a matrix of 10*1000 x 1000. Using default numpy (I think
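The speedup usually comes from the blocking itself: multiplying one row block at a time keeps the working set in CPU cache, which can beat a single monolithic `numpy.dot` call on large operands. A minimal sketch of block-wise multiplication (block size and shapes are illustrative, not from the question):

```python
import numpy as np

def blocked_dot(a, b, block=256):
    """Compute a @ b one row block at a time, as one would when the
    row blocks are read from HDF5 chunks instead of living in RAM."""
    out = np.empty((a.shape[0], b.shape[1]), dtype=np.result_type(a, b))
    for i in range(0, a.shape[0], block):
        out[i:i + block] = np.dot(a[i:i + block], b)
    return out

a = np.random.rand(1000, 500)
b = np.random.rand(500, 300)
result = blocked_dot(a, b)
```

The blocked result is numerically identical (up to floating-point tolerance) to a plain `np.dot(a, b)`.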

Python: how to store a numpy multidimensional array in PyTables?

Submitted by 折月煮酒 on 2019-12-17 08:53:15
Question: How can I put a numpy multidimensional array in an HDF5 file using PyTables? From what I can tell, I can't put an array field in a pytables table. I also need to store some info about this array and be able to do mathematical computations on it. Any suggestions? Answer 1: There may be a simpler way, but this is how you'd go about doing it, as far as I know: import numpy as np import tables # Generate some data x = np.random.random((100,100,100)) # Store "x" in a chunked array... f = tables.open_file

Pandas HDF5 Select with Where on non natural-named columns

Submitted by 佐手、 on 2019-12-14 03:52:58
Question: In my continuing spree of exotic pandas/HDF5 issues, I encountered the following: I have a series of non-naturally named columns (nb: for a good reason, with negative numbers being "system" ids etc.), which normally doesn't cause an issue: fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13']) However, my select statement does fall over once a where clause is added: >>> fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'], where=[('a-6', '=', [0, 25, 28])]) blablabla File "/srv/www/li/venv/local/lib
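One workaround, since `where=` expressions require column names that are valid Python identifiers: read the needed columns without a `where` clause and filter in pandas afterwards. A sketch with hypothetical data mirroring the question's layout (the store key and values are illustrative):

```python
import pandas as pd

# Columns whose names ('a-6', 'm-13') are not valid Python identifiers.
df = pd.DataFrame({"o": [1, 2, 3], "a-6": [0, 25, 99], "m-13": [7, 8, 9]})
df.to_hdf("store.h5", key="store_0_0", format="table", mode="w")

# Column selection works, but a where= clause on 'a-6' does not, so
# filter in pandas after reading only the columns that are needed.
sub = pd.read_hdf("store.h5", "store_0_0", columns=["o", "a-6"])
sub = sub[sub["a-6"].isin([0, 25, 28])]
```

This pulls the selected columns into memory, so it trades the on-disk filtering of `where=` for compatibility with arbitrary column names.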

Store pandas DataFrame in PyTables table without storing index

Submitted by 妖精的绣舞 on 2019-12-13 18:41:40
Question: In many DataFrame.to_foo functions I can specify that I don't want to write the index: >>> help(df.to_csv) Write DataFrame to a comma-separated values (csv) file Parameters ---------- ... index : boolean, default True Write row names (index) ... Does similar functionality exist for DataFrame.to_hdf? I would like to not store the index in the PyTables table. Answer 1: You could call out to h5py and interact with HDF5 directly. data = df.values with h5py.File('data.h5','w') as f: f.create_dataset(
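The answer's truncated h5py approach, fleshed out: write only `df.values`, so the index is simply never serialized. A sketch assuming h5py and pandas are available (the dataset name and the column-name attribute are illustrative choices, not part of the question):

```python
import h5py
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 3), columns=["a", "b", "c"])

with h5py.File("data.h5", "w") as f:
    # Only the values are written; the index never reaches the file.
    dset = f.create_dataset("data", data=df.values)
    # Column names can be kept as a dataset attribute if needed later.
    dset.attrs["columns"] = np.array(df.columns, dtype="S")

with h5py.File("data.h5", "r") as f:
    values = f["data"][:]
```

Note this only works when all columns share one dtype, since `df.values` produces a single homogeneous array.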

How can the shape of a pytables table column be defined by a variable?

Submitted by  ̄綄美尐妖づ on 2019-12-13 08:46:37
Question: I'm trying to create an IsDescription subclass so that I can define the structure of a table I'm trying to create. One of the attributes of the subclass needs a shape of a certain length that is unknown until runtime (it depends on a file being parsed) but is fixed once known. Sample code: import tables class MyClass(tables.IsDescription): def __init__(self, param): var1 = tables.Float64Col(shape=(param)) MyClass1 = MyClass(12) Which returns: TypeError: object.__new__() takes no
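`IsDescription` subclasses are not meant to be instantiated, which is why the `__init__` approach fails. One way around it: PyTables also accepts a plain dict as the table description, and a dict can be built at runtime with whatever shape the parsed file dictates. A sketch, assuming PyTables is installed (names are taken from the question's sample code):

```python
import numpy as np
import tables

def make_description(param):
    # The column shape is fixed per table, but the description itself
    # can be assembled at runtime as a dict instead of a class.
    return {"var1": tables.Float64Col(shape=(param,))}

with tables.open_file("desc.h5", "w") as f:
    table = f.create_table(f.root, "t", make_description(12))
    row = table.row
    row["var1"] = np.arange(12.0)
    row.append()
    table.flush()
    stored = table[0]["var1"]
```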

more efficient solution for QTableWidget write

Submitted by 守給你的承諾、 on 2019-12-13 04:32:43
Question: I am reading a PyTable with 1,320,000 rows x 16 cols. The idea is to read the table and write its content into a QTableWidget. The way I am doing it makes the GUI freeze. I would like a clue about how to do it in an efficient way. Here is my code: # The PyTable is already opened and the reference to the desired table acquired self.ui.tableWidget.setRowCount(tab.nrows) self.ui.tableWidget.setColumnCount(len(tab.colnames)) self.ui.tableWidget.setHorizontalHeaderLabels(tab.colnames) res = []
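Populating 1.3M QTableWidget cells eagerly is the bottleneck; the usual fix is a lazy `QAbstractTableModel` whose `data()` method fetches rows on demand, combined with reading the PyTable in blocks rather than row by row. The block-reading half can be sketched without Qt, using a numpy structured array as a stand-in for the PyTables table (with a real table the slice would be `table.read(start, stop)`):

```python
import numpy as np

# Stand-in for the PyTables table: a structured array with named columns.
tab = np.zeros(10_000, dtype=[("a", "f8"), ("b", "i4")])

def iter_chunks(table, chunk=2_000):
    """Yield row blocks instead of single rows; this avoids millions of
    Python-level iterations and lets the GUI process events between
    blocks (e.g. via QApplication.processEvents or a worker thread)."""
    for start in range(0, len(table), chunk):
        yield table[start:start + chunk]

n_chunks = sum(1 for _ in iter_chunks(tab))
```

With a lazy model attached to a `QTableView`, only the visible rows are ever materialized, so the 1.3M-row table never has to be copied into widget items at all.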