load csv file to numpy and access columns by name

日久生厌 2021-01-12 01:27

I have a csv file with headers like:

Given this test.csv file:

"A","B","C","D","E","F","timestamp"
611.88243,         


        
2 answers
  •  太阳男子
    2021-01-12 01:59

    Using NumPy alone, the options you show are your only options: either an ndarray of homogeneous dtype with shape (3, 7), or a structured array of (potentially) heterogeneous dtype with shape (3,).
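
To make the two shapes concrete, here is a minimal sketch that loads the same text both ways with `np.genfromtxt`. The sample data is hypothetical (three columns with an unquoted header, read from an in-memory string), not the asker's file, so the shapes are (3, 3) and (3,) rather than (3, 7) and (3,).

```python
import io
import numpy as np

# Hypothetical sample data, not the asker's file (header left unquoted for simplicity).
text = (
    "A,B,timestamp\n"
    "1.5,10.0,100\n"
    "2.5,20.0,200\n"
    "3.5,30.0,300\n"
)

# Option 1: homogeneous 2-D ndarray; the header row is skipped, so names are lost.
plain = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1)
print(plain.shape)

# Option 2: structured array; names=True takes the field names from the header.
structured = np.genfromtxt(io.StringIO(text), delimiter=",", names=True)
print(structured.shape)        # one record per row
print(structured.dtype.names)  # field names read from the header
print(structured["A"])         # access a column by name
```

The structured array gives access by name but is one-dimensional, so slicing such as `structured[:, 0]` is not available; the plain array supports 2-D slicing but has no labels.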

    If you really want a data structure with labeled columns and shape (3, 7) (plus lots of other goodies), you could use a pandas DataFrame:

    In [67]: import pandas as pd
    In [68]: df = pd.read_csv('data'); df
    Out[68]: 
               A          B     C          D           E          F     timestamp
    0  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291112e+12
    1  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291113e+12
    2  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291121e+12    
    
    In [70]: df['A']
    Out[70]: 
    0    611.88243
    1    611.88243
    2    611.88243
    Name: A, dtype: float64
    
    In [71]: df.shape
    Out[71]: (3, 7)
    

    A pure NumPy/Python alternative would be to use a dict to map the column names to indices:

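    import csv
    import numpy as np

    filename = 'test.csv'  # the CSV file from the question

    # Read the header row and map each column name to its index.
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        columns = next(reader)
        colmap = {name: i for i, name in enumerate(columns)}

    # Load the numeric rows, skipping the header.
    arr = np.matrix(np.loadtxt(filename, delimiter=",", skiprows=1))
    print(arr[:, colmap['A']])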
    

    yields

    [[ 611.88243]
     [ 611.88243]
     [ 611.88243]]
    

    This way, arr is a NumPy matrix, with columns that can be accessed by label using the syntax

    arr[:, colmap[column_name]]
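
The same name-to-index mapping also works with a plain 2-D ndarray, which may be preferable since NumPy's documentation no longer recommends `np.matrix`. The sketch below is a variation on the approach above, not the answer's original code. It uses hypothetical in-memory sample data in place of the CSV file.

```python
import csv
import io
import numpy as np

# Hypothetical sample data standing in for the CSV file (quoted header, as in the question).
text = '"A","B","timestamp"\n1.5,10.0,100\n2.5,20.0,200\n3.5,30.0,300\n'

# Read the header with the csv module so the quotes around names are removed.
columns = next(csv.reader(io.StringIO(text)))
colmap = {name: i for i, name in enumerate(columns)}

# Load the numeric body as a plain 2-D ndarray, skipping the header row.
arr = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)

col_a = arr[:, colmap["A"]]        # 1-D view of column A
col_a_2d = arr[:, [colmap["A"]]]   # 2-D column, like the matrix result
print(col_a.shape, col_a_2d.shape)
```

With an ndarray, indexing by a single integer drops the column axis; wrapping the index in a list keeps the result two-dimensional.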
    
