I am looking for sample code that converts .h5 files to CSV or TSV. I need to read a .h5 file and write the output as CSV or TSV.
Sample code would be much appreciated.
import numpy as np
import h5py
# Open the file once; the with-block closes it when done
with h5py.File('chunk0003.hdf5', 'r') as hf:
    print('List of arrays in this file: \n', list(hf.keys()))
    # This lists the arrays in the file: [u'_self_key', u'chrms1', u'chrms2', u'cuts1', u'cuts2', u'misc', u'strands1', u'strands2']
    # Read each dataset into a NumPy array while the file is still open
    a = hf['chrms1'][:]
    b = hf['chrms2'][:]
    c = hf['cuts1'][:]
    d = hf['cuts2'][:]
    e = hf['strands1'][:]
    f = hf['strands2'][:]
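The snippet above reads the arrays but stops short of writing them out. A minimal sketch of the missing step follows, assuming the six datasets are one-dimensional and of equal length (the original post does not say so). The helper names write_columns and hdf5_to_csv are made up for illustration.

```python
import csv

# Dataset names taken from the listing above; adjust for your own file.
DATASETS = ["chrms1", "chrms2", "cuts1", "cuts2", "strands1", "strands2"]


def write_columns(columns, out, delimiter=","):
    """Write a mapping of name -> equal-length 1-D sequence as delimited text."""
    names = list(columns)
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(names)  # header row
    writer.writerows(zip(*(columns[n] for n in names)))  # one row per index


def hdf5_to_csv(h5_path, csv_path, names=DATASETS, delimiter=","):
    """Read the named datasets from an HDF5 file and write them as columns."""
    import h5py  # imported here so write_columns can be used without h5py

    with h5py.File(h5_path, "r") as hf:
        # Assumption: every dataset is 1-D and they all have the same length.
        columns = {n: hf[n][:].tolist() for n in names}
    with open(csv_path, "w", newline="") as out:
        write_columns(columns, out, delimiter)
```

Calling hdf5_to_csv('chunk0003.hdf5', 'chunk0003.csv') would write one column per dataset; pass delimiter='\t' for TSV.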
You can also use h5dump:
h5dump -o dset.asci -y -w 400 dset.h5
-o dset.asci
specifies the output file.
-y -w 400
-y suppresses the array indices in the output, and -w 400 sets the output line width. The width should be at least the dimension size multiplied by the number of positions and spaces needed to print each value, so pick a very large number here.
dset.h5
is of course the HDF5 file you want to convert.
This converts it to an ASCII file, which is easily imported into Excel, from where you can save it as a .csv (Save As within Excel, and specify the file format). I did it a couple of times, and it worked for me.
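If you would rather skip Excel, the ASCII dump can be turned into CSV with a few lines of Python. This is only a sketch: it assumes the dump contains nothing but comma-separated numeric values (which is what I would expect from -y together with -o, but check your own output), and that you know the number of columns. The function names are hypothetical.

```python
import csv


def dump_text_to_rows(text, ncols):
    """Split a comma-separated ASCII dump into rows of ncols values each."""
    # Treat line breaks like separators, then drop empty tokens left by
    # trailing commas and blank lines.
    tokens = [tok.strip() for tok in text.replace("\n", ",").split(",")]
    values = [tok for tok in tokens if tok]
    if len(values) % ncols:
        raise ValueError("value count is not a multiple of ncols")
    return [values[i:i + ncols] for i in range(0, len(values), ncols)]


def ascii_dump_to_csv(dump_path, csv_path, ncols):
    """Convert a dump file such as dset.asci to a CSV file."""
    with open(dump_path) as src:
        rows = dump_text_to_rows(src.read(), ncols)
    with open(csv_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)
```

For example, ascii_dump_to_csv('dset.asci', 'dset.csv', ncols=3) for a dataset with three columns. String data that itself contains commas would not survive this approach.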
An example of HDF5 to CSV conversion can be found at https://github.com/amgreenstreet/Million-Song-Dataset-HDF5-to-CSV
It uses Python and converts the Million Song Dataset from HDF5 to CSV format.
I strongly recommend using the Python(x,y) distribution, http://python-xy.github.io/, because this example uses additional Python packages such as NumPy and PyTables, and Python(x,y) includes them.
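Another Python solution, using pandas:
#!/usr/bin/env python3
import pandas as pd
import sys
fpath = sys.argv[1]
if len(sys.argv) > 2:
    # Optional second argument: the key of the object to read
    key = sys.argv[2]
    df = pd.read_hdf(fpath, key=key)
else:
    df = pd.read_hdf(fpath)
# Write the CSV to stdout so it can be redirected to a file
df.to_csv(sys.stdout, index=False)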
This script is available here
The first argument to this script is the HDF5 file. If a second argument is passed, it is used as the key, that is, the name of the object stored in the HDF5 file; otherwise pd.read_hdf is called without a key, which works only when the file contains a single pandas object. The script dumps the CSV to stdout, which you can redirect to a file.
For example, if your data is stored in an HDF5 file called data.h5
and you have saved this script as hdf2df.py
$ python3 hdf2df.py data.h5 > data.csv
will write the data to a CSV file, data.csv
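If the file holds several pandas objects, one option is to list the keys with pd.HDFStore and write one CSV per key. The following is a sketch under that assumption; the function names and the file-naming scheme are made up here, and the file must have been written by pandas (PyTables format) for this to work.

```python
import os


def csv_name_for_key(key):
    """Map an HDF5 key such as '/group/table' to 'group_table.csv'."""
    return key.strip("/").replace("/", "_") + ".csv"


def export_all_keys(h5_path, out_dir="."):
    """Write every pandas object in the store to its own CSV file."""
    import pandas as pd  # needs PyTables installed for HDF5 support

    os.makedirs(out_dir, exist_ok=True)
    written = []
    with pd.HDFStore(h5_path, mode="r") as store:
        for key in store.keys():  # keys look like '/df' or '/group/table'
            target = os.path.join(out_dir, csv_name_for_key(key))
            store.get(key).to_csv(target, index=False)
            written.append(target)
    return written
```

export_all_keys('data.h5', 'csv_out') would then return the list of files it wrote.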
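import sys
import numpy as np
import h5py
# 'dataname' is the name of a 1-D or 2-D dataset inside foo.h5
with h5py.File('foo.h5', 'r') as f:
    np.savetxt(sys.stdout, f['dataname'][...], fmt='%g', delimiter=',')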
Some notes:
For TSV output, use '\t' instead of ','
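The one-liner above loads the whole dataset into memory before writing. For a large file, a variation is to write it in blocks of rows. This is a rough sketch assuming a 1-D or 2-D numeric dataset; row_blocks, dataset_to_csv and the default block size are illustrative choices, not part of h5py or NumPy.

```python
import sys


def row_blocks(n_rows, block_size):
    """Yield (start, stop) index pairs that cover range(n_rows) in blocks."""
    for start in range(0, n_rows, block_size):
        yield start, min(start + block_size, n_rows)


def dataset_to_csv(h5_path, name, out=sys.stdout, delimiter=",",
                   block_size=100_000):
    """Stream one dataset to a text file object, block_size rows at a time."""
    import h5py
    import numpy as np

    with h5py.File(h5_path, "r") as f:
        dset = f[name]
        # Assumption: the dataset is 1-D or 2-D, as np.savetxt requires.
        for start, stop in row_blocks(dset.shape[0], block_size):
            # Slicing reads only this block of rows from disk.
            np.savetxt(out, dset[start:stop], fmt="%g", delimiter=delimiter)
```

dataset_to_csv('foo.h5', 'dataname') prints CSV to stdout; pass an open file as out, or delimiter='\t' for TSV.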