PyTables

hdf5 error when format=table, pandas pytables

魔方 西西, submitted 2019-12-08 09:05:09
Question: It seems that I get an error when format='table' but no error with format='fixed'. What's weird is that the data still seems to load; I just have to figure out a way to move past this, and it would give me peace of mind not to have any error. The dataframe is preprocessed, with types set on the columns. The commands I run are:

    hdf = pd.HDFStore('path-to-file')
    hdf.put('df', df, format='table')

The error I get is:

    HDF5ExtError: HDF5 error back trace File "../../../src/H5Dio.c", ...
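For reference, the put call from the question can be run end to end like this; the dataframe and temporary file path below are illustrative stand-ins, not the asker's data:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# A small stand-in for the preprocessed dataframe from the question.
df = pd.DataFrame({"a": np.arange(5, dtype="int64"),
                   "b": list("abcde")})

path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with pd.HDFStore(path, mode="w") as store:
    store.put("df_fixed", df, format="fixed")  # fast write, not queryable
    store.put("df_table", df, format="table")  # PyTables Table: queryable, appendable

with pd.HDFStore(path, mode="r") as store:
    roundtrip = store["df_table"]

assert roundtrip.equals(df)
```

When a dataframe round-trips cleanly like this, HDF5 back-trace errors with format='table' usually point at the data (mixed-type or very wide columns) rather than the call itself.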


merging several hdf5 files into one pytable

倾然丶 夕夏残阳落幕, submitted 2019-12-07 17:29:29
Question: I have several hdf5 files, each with the same structure. I'd like to create one pytable out of them by somehow merging the hdf5 files. What I mean is that if an array in file1 has size x and the corresponding array in file2 has size y, the resulting array in the pytable will have size x+y, containing first all the entries from file1 and then all the entries from file2.

Answer 1: How you want to do this depends slightly on the data type that you have. Arrays and CArrays have a static size, so you need to ...
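A sketch of the growable-output approach the answer is heading toward, using an EArray (extendable array) as the merge target; the file names and tiny arrays below are illustrative stand-ins:

```python
import os
import tempfile

import numpy as np
import tables

tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, f"part{i}.h5") for i in range(2)]
chunks = [np.arange(3, dtype="float64"), np.arange(3, 6, dtype="float64")]

# Write two small source files, each holding one array.
for p, c in zip(paths, chunks):
    with tables.open_file(p, "w") as f:
        f.create_array(f.root, "data", c)

# Merge into a single extendable EArray: shape=(0,) means "empty,
# growable along the first axis".
out = os.path.join(tmp, "merged.h5")
with tables.open_file(out, "w") as f:
    ea = f.create_earray(f.root, "data", atom=tables.Float64Atom(), shape=(0,))
    for p in paths:
        with tables.open_file(p, "r") as src:
            ea.append(src.root.data.read())  # file1 entries first, then file2

with tables.open_file(out, "r") as f:
    merged = f.root.data.read()
```

For datasets too large to read whole, the append loop can iterate over slices of each source array instead of calling read() once.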

How does one store a Pandas DataFrame as an HDF5 PyTables table (or CArray, EArray, etc.)?

北战南征, submitted 2019-12-07 09:11:54
Question: I have the following pandas dataframe:

    import pandas as pd
    df = pd.read_csv('filename.csv')

Now I can use HDFStore to write the df object to file (like adding key-value pairs to a Python dictionary):

    store = HDFStore('store.h5')
    store['df'] = df

(See http://pandas.pydata.org/pandas-docs/stable/io.html.) When I look at the contents, this object is a frame; store outputs:

    <class 'pandas.io.pytables.HDFStore'>
    File path: store.h5
    /df    frame    (shape->[552,23252])

However, in order to use indexing, one ...
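A minimal sketch of getting a queryable PyTables table instead of a fixed frame, assuming a small stand-in dataframe; data_columns marks columns that can appear in where= queries:

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(10, dtype="int64")})

path = os.path.join(tempfile.mkdtemp(), "store.h5")
with pd.HDFStore(path, mode="w") as store:
    # format='table' stores a PyTables Table (frame_table) rather than a
    # fixed frame; data_columns=['x'] makes 'x' usable in where= clauses.
    store.put("df", df, format="table", data_columns=["x"])
    subset = store.select("df", where="x >= 7")
```

The dictionary-style store['df'] = df defaults to format='fixed', which is why the contents show up as a plain frame with no query support.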

storing 'object'

走远了吗., submitted 2019-12-06 16:21:41

Question: Does PyTables support storing Python objects? Something like this:

    dtype = np.dtype([('Name', '|S2'), ('objValue', object)])
    data = np.zeros(3, dtype)
    file.createArray(box3, 'complicated', data)

Of course I get an error when trying to do this. How do I properly store arrays of objects? Is it possible? Thanks.

Answer 1: Try the pickle module if you want to store complicated data somewhere it isn't supported by the library in question.

Answer 2: You can save generic Python objects with PyTables:

    >>> dtype = np.dtype([('Name', '|S2'), ('objValue', object)])
    >>> data = np.zeros(3, dtype)
    >>> file = tables.openFile('/tmp ...

How to read a large image in chunks in python?

给你一囗甜甜゛, submitted 2019-12-06 15:21:02

Question: I'm trying to compute the difference in pixel values of two images, but I'm running into memory problems because the images I have are quite large. Is there a way in Python to read an image in, say, 10x10 chunks at a time rather than trying to read in the whole image? I was hoping to solve the memory problem by reading the image in small chunks, assigning those chunks to numpy arrays, and then saving those numpy arrays using PyTables for further processing. Any advice would be greatly appreciated. Regards, Berk

Answer 1: You can use numpy.memmap and let the operating system decide which parts of the ...

Pandas: in memory sorting hdf5 files

China☆狼群, submitted 2019-12-06 05:47:48
Question: I have the following problem: I have several hdf5 files with similar data frames which I want to sort globally based on multiple columns. My input is the file names and an ordered list of columns to use for sorting. The output should be a single hdf5 file containing all the sorted data. Each file can contain millions of rows; I can afford to load a single file in memory, but not the entire dataset. Naively, I would first copy all the data into a single hdf5 file (which is not difficult) and then find a way to do in-memory sorting of this huge file. Is there a quick way to ...
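One standard pattern for this is an external merge sort: sort each file-sized piece on its own (that fits in memory), then k-way merge the sorted streams so the heap holds only one row per file at a time. A sketch with in-memory frames standing in for the hdf5 files:

```python
import heapq

import numpy as np
import pandas as pd

# Two frames stand in for the hdf5 files; each fits in memory on its own.
rng = np.random.default_rng(0)
frames = [pd.DataFrame({"key": rng.integers(0, 100, size=50),
                        "val": rng.random(50)}) for _ in range(2)]

sort_cols = ["key", "val"]  # the ordered list of sort columns

# Step 1: sort each file-sized piece independently.
sorted_frames = [f.sort_values(sort_cols, ignore_index=True) for f in frames]

# Step 2: k-way merge the sorted row streams with a heap.
streams = [f.itertuples(index=False) for f in sorted_frames]
merged_rows = heapq.merge(*streams, key=lambda r: (r.key, r.val))
merged = pd.DataFrame(list(merged_rows), columns=["key", "val"])
```

In the real setting, each sorted piece would be written back to disk and the merged rows appended to the output hdf5 file in batches rather than materialized at once.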

Database or Table Solution for Temporary Numpy Arrays

不想你离开。, submitted 2019-12-06 02:39:01
Question: I am creating a Python desktop application that allows users to select different distributional forms to model agricultural yield data. I have the time-series agricultural data (close to a million rows) saved in a SQLite database, although this is not set in stone if someone knows of a better choice. Once the user selects some data, say corn yields from 1990-2010 in Illinois, I want them to select a distributional form from a drop-down. Next, my function fits the distribution to the data and outputs 10,000 points drawn from that fitted distributional form in a numpy array. I would like this ...
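The fit-then-sample step described above can be sketched with scipy.stats; the normal distribution and the synthetic yield data here are stand-ins for the user's drop-down choice and database selection:

```python
import numpy as np
from scipy import stats

# Stand-in for the selected slice, e.g. corn yields 1990-2010 in Illinois.
rng = np.random.default_rng(42)
yields = rng.normal(loc=150.0, scale=20.0, size=1000)

# Fit the chosen distributional form (here: normal) to the data...
mu, sigma = stats.norm.fit(yields)

# ...and draw 10,000 points from the fitted form into a numpy array.
draws = stats.norm.rvs(loc=mu, scale=sigma, size=10_000, random_state=0)
```

Because the 10,000-point arrays are temporary per-session artifacts, an HDF5 file (PyTables) is a natural scratch store for them alongside the SQLite source data.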

Why does pandas convert unsigned int greater than 2**63-1 to objects?

馋奶兔, submitted 2019-12-06 01:51:43
Question: When I convert a numpy array to a pandas data frame, pandas changes uint64 types to object types if the integer is greater than 2^63 - 1:

    import pandas as pd
    import numpy as np
    x = np.array([('foo', 2 ** 63)], dtype=np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))
    y = np.array([('foo', 2 ** 63 - 1)], dtype=np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))
    print(pd.DataFrame(x).dtypes.unsigned)  # dtype('O')
    print(pd.DataFrame(y).dtypes.unsigned)  # dtype('uint64')

This is ...
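The behavior is easy to reproduce; note that whether the unsigned column comes back as uint64 or is upcast to object depends on the pandas version (newer releases keep uint64), so the sketch below only checks that the value itself survives the round trip:

```python
import numpy as np
import pandas as pd

dtype = np.dtype([("string", np.str_, 3), ("unsigned", np.uint64)])
x = np.array([("foo", 2 ** 63)], dtype=dtype)      # above int64's range
y = np.array([("foo", 2 ** 63 - 1)], dtype=dtype)  # fits in int64

df_x = pd.DataFrame(x)
df_y = pd.DataFrame(y)

# Depending on the pandas version, df_x['unsigned'] may be uint64 or
# object; either way the stored value is exact.
assert int(df_x["unsigned"][0]) == 2 ** 63
assert int(df_y["unsigned"][0]) == 2 ** 63 - 1
```

The cutoff at 2^63 - 1 is the maximum of int64, pandas's historical default integer type, which is why values just past it used to trigger the object fallback.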