Converting hdf5 to csv or tsv files

情话喂你 2021-01-02 07:20

I am looking for sample code that converts .h5 files to CSV or TSV. I need to read an .h5 file and write its contents out as CSV or TSV.

Sample code would be much appreciated.

5 answers
  • 2021-01-02 07:31
    import numpy as np
    import h5py
    
    # List the datasets stored in the file.
    with h5py.File('chunk0003.hdf5', 'r') as hf:
        print('List of arrays in this file: \n', list(hf.keys()))
    # Output: ['_self_key', 'chrms1', 'chrms2', 'cuts1', 'cuts2', 'misc', 'strands1', 'strands2']
    
    # Read the six datasets into memory; the file is closed when the block exits.
    names = ['chrms1', 'chrms2', 'cuts1', 'cuts2', 'strands1', 'strands2']
    with h5py.File('chunk0003.hdf5', 'r') as r1:
        columns = [r1[name][:] for name in names]
    
    # Stack the arrays as rows, transpose so each dataset becomes a column,
    # and write a tab-separated text file.
    table = np.array(columns).transpose()
    np.savetxt('chunk0003.txt', table, delimiter='\t')
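
    The same steps can be wrapped in a small helper that also writes a header row. This is only a sketch: the function name datasets_to_tsv is made up for illustration, and it assumes the datasets are one-dimensional, equal in length, and integer-valued (hence the "%d" default; np.savetxt otherwise writes every value in scientific notation). A plain dict of NumPy arrays stands in for the open h5py.File so the sketch runs without an HDF5 file.

```python
import numpy as np

def datasets_to_tsv(source, names, out_path, fmt="%d", delimiter="\t"):
    """Write the named 1-D datasets from `source` as columns of a text file.

    `source` is any mapping from name to array-like, for example an open
    h5py.File. The "%d" default assumes integer-valued data.
    """
    columns = [source[name][:] for name in names]
    table = np.column_stack(columns)
    # comments="" stops savetxt from prefixing the header line with "# ".
    np.savetxt(out_path, table, fmt=fmt, delimiter=delimiter,
               header=delimiter.join(names), comments="")

# Stand-in data (hypothetical values); with h5py, pass the open file instead.
demo = {"chrms1": np.array([1, 1, 2]), "cuts1": np.array([100, 250, 40])}
datasets_to_tsv(demo, ["chrms1", "cuts1"], "demo.tsv")
```

    With a real file, the call would sit inside a with h5py.File(...) block so the datasets are read before the file closes.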
    
  • 2021-01-02 07:37

    You can also use h5dump -o dset.asci -y -w 400 dset.h5

    • -o dset.asci specifies the output file
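    • -y suppresses the array indices in the output, and -w 400 sets the output line width in characters. The width should be at least the dimension size multiplied by the number of characters (digits plus separators) needed to print each value, so pick a very large number.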
    • dset.h5 is of course the hdf5 file you want to convert

    This converts it to an ASCII file, which is easily imported into Excel, from where you can save it as a .csv (use "Save As" in Excel and specify the file format). I did it a couple of times, and it worked for me.

  • 2021-01-02 07:37

    An example of HDF5 to CSV conversion can be found at https://github.com/amgreenstreet/Million-Song-Dataset-HDF5-to-CSV

    It uses Python and converts the Million Song Dataset from HDF5 to CSV format.

    I strongly recommend using the Python(x,y) distribution (http://python-xy.github.io/), because this example depends on additional Python packages such as NumPy and PyTables, and Python(x,y) includes them.

  • 2021-01-02 07:39

    Another Python solution, using pandas:

    #!/usr/bin/env python3
    
    import pandas as pd
    import sys
    fpath = sys.argv[1]
    if len(sys.argv)>2:
        key = sys.argv[2]
        df = pd.read_hdf(fpath, key=key)
    else:
        df = pd.read_hdf(fpath)
    
    df.to_csv(sys.stdout, index=False)
    


    The first argument to this script is the HDF5 file. If a second argument is passed, it is used as the key of the object to read from the file; otherwise pandas reads the file's only object, which works only when the file contains a single pandas object. The script dumps the CSV to stdout, which you can redirect to a file.

    For example, if your data is stored in an HDF5 file called data.h5 and you have saved this script as hdf2df.py, then

    $ python3 hdf2df.py data.h5 > data.csv
    

    will write the data to a csv file data.csv.
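
    The script above writes CSV only. For TSV, DataFrame.to_csv accepts a sep argument. A minimal sketch, where the function name frame_to_delimited is hypothetical and a small hand-built DataFrame with made-up values stands in for the result of pd.read_hdf:

```python
import pandas as pd

def frame_to_delimited(df: pd.DataFrame, sep: str = "\t") -> str:
    """Return `df` as delimited text without the index column."""
    # With no path or buffer given, to_csv returns the text as a string.
    return df.to_csv(sep=sep, index=False)

# Stand-in for df = pd.read_hdf(fpath, key=key); the values are made up.
df = pd.DataFrame({"chrms1": [1, 2], "cuts1": [100, 250]})
print(frame_to_delimited(df), end="")
```

    In the script itself, the equivalent change would be df.to_csv(sys.stdout, sep='\t', index=False).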

  • 2021-01-02 07:50

    Python:

    import sys
    import numpy as np
    import h5py

    with h5py.File('foo.h5', 'r') as f:
        np.savetxt(sys.stdout, f['dataname'][:], '%g', ',')
    

    Some notes:

    1. sys.stdout can be any file, or a file name string like "out.csv".
    2. %g is used to make the formatting human-friendly.
    3. If you want TSV just use '\t' instead of ','.
    4. I've assumed you have a single dataset name within the file (dataname).
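
    The one-liner reads the whole dataset into memory before writing. For a dataset too large for that, one option is to write it a slice of rows at a time, since h5py datasets support slicing and read only the requested rows. A sketch under that assumption: the function name write_in_chunks and the chunk size are illustrative, and a NumPy array stands in for the h5py dataset so it runs as is.

```python
import io
import numpy as np

def write_in_chunks(dataset, out, chunk_rows=10000, fmt="%g", delimiter=","):
    """Write a 1-D or 2-D array-like to `out`, one slice of rows at a time."""
    n_rows = dataset.shape[0]
    for start in range(0, n_rows, chunk_rows):
        # Only this slice is held in memory at once.
        block = np.asarray(dataset[start:start + chunk_rows])
        np.savetxt(out, block, fmt=fmt, delimiter=delimiter)

# Stand-in for f['dataname'] from an open h5py.File.
data = np.arange(12).reshape(4, 3)
buf = io.StringIO()
write_in_chunks(data, buf, chunk_rows=3)
print(buf.getvalue(), end="")
```

    As in note 1, out can be sys.stdout or any open text file; pass an open file rather than a file name here, so that successive chunks are appended instead of overwriting each other.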