load csv file to numpy and access columns by name

日久生厌 2021-01-12 01:27

I have a CSV file with a header row. Given this test.csv file:

"A","B","C","D","E","F","timestamp"
611.88243,         


        
2 Answers
  • 2021-01-12 01:59

    Using NumPy alone, the options you show are your only options. Either use an ndarray of homogeneous dtype with shape (3,7), or a structured array of (potentially) heterogeneous dtype and shape (3,).
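To make the two options concrete, here is a minimal sketch. It assumes a small stand-in for test.csv with made-up values and unquoted headers (the real file in the question is truncated), so the shapes here are (3, 3) and (3,) rather than (3, 7) and (3,).

```python
import io
import numpy as np

# Illustrative stand-in for test.csv; the values are made up for this sketch.
CSV_TEXT = """A,B,timestamp
1.5,2.5,100
3.5,4.5,200
5.5,6.5,300
"""

# Option 1: a plain 2-D ndarray with one dtype; columns are addressed by position.
plain = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1)

# Option 2: a 1-D structured array, one record per row; fields are addressed by name.
structured = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True)

print(plain.shape)             # two dimensions: rows x columns
print(structured.shape)        # one dimension: rows only
print(structured.dtype.names)  # field names taken from the header row
print(structured["B"])         # access by name works only on the structured array
```

The structured array gives you names but loses the second axis; the plain array keeps both axes but loses the names.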

    If you really want a data structure with labeled columns and shape (3,7) (and lots of other goodies), you could use a pandas DataFrame:

    In [67]: import pandas as pd
    In [68]: df = pd.read_csv('data'); df
    Out[68]: 
               A          B     C          D           E          F     timestamp
    0  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291112e+12
    1  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291113e+12
    2  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291121e+12    
    
    In [70]: df['A']
    Out[70]: 
    0    611.88243
    1    611.88243
    2    611.88243
    Name: A, dtype: float64
    
    In [71]: df.shape
    Out[71]: (3, 7)
    

    A pure NumPy/Python alternative would be to use a dict to map the column names to indices:

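    import csv
    import numpy as np

    filename = "test.csv"  # the file shown in the question

    # Read only the header row and map each column name to its index.
    with open(filename, newline="") as f:
        columns = next(csv.reader(f))
    colmap = {name: i for i, name in enumerate(columns)}

    # np.matrix keeps the result 2-D, so a column slice prints as a column.
    arr = np.matrix(np.loadtxt(filename, delimiter=",", skiprows=1))
    print(arr[:, colmap['A']])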
    

    yields

    [[ 611.88243]
     [ 611.88243]
     [ 611.88243]]
    

    This way, arr is a NumPy matrix, with columns that can be accessed by label using the syntax

    arr[:, colmap[column_name]]
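The same name-to-index mapping also works with a plain ndarray instead of np.matrix, which may be preferable since the NumPy documentation no longer recommends np.matrix for new code. The following is only a sketch: the CSV text is a made-up stand-in for test.csv, and `column` is a hypothetical helper, not part of NumPy.

```python
import csv
import io
import numpy as np

# Illustrative stand-in for test.csv; the values are made up for this sketch.
CSV_TEXT = '''"A","B","timestamp"
1.5,2.5,100
3.5,4.5,200
'''

# csv.reader strips the quotes around the header names.
header = next(csv.reader(io.StringIO(CSV_TEXT)))
colmap = {name: i for i, name in enumerate(header)}

# Plain 2-D float array; the header row is skipped.
arr = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1)

def column(name):
    """Return the 1-D column labelled `name` (hypothetical helper)."""
    return arr[:, colmap[name]]

print(column("B"))
# Several columns at once, still selected by label.
print(arr[:, [colmap["A"], colmap["timestamp"]]])
```

Note that with an ndarray a single-column slice is 1-D, whereas the np.matrix version above keeps it as a 2-D column.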
    
  • 2021-01-12 02:02

    Because your data is homogeneous--all the elements are floating point values--you can create a view of the data returned by genfromtxt that is a 2D array. For example,

    In [42]: r = np.genfromtxt("test.csv", delimiter=',', names=True)
    

    Create a numpy array that is a "view" of r. This is a regular numpy array, but it is created using the data in r:

    In [43]: a = r.view(np.float64).reshape(len(r), -1)
    
    In [44]: a.shape
    Out[44]: (3, 7)
    
    In [45]: a[:, 0]
    Out[45]: array([ 611.88243,  611.88243,  611.88243])
    
    In [46]: r['A']
    Out[46]: array([ 611.88243,  611.88243,  611.88243])
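As an aside, NumPy also ships a helper for this conversion, `numpy.lib.recfunctions.structured_to_unstructured`, which avoids reinterpreting the buffer by hand. The sketch below uses a made-up stand-in for test.csv and its own variable names (`rec`, `table`); whether the result is a view or a copy depends on the dtype and memory layout, so it should not be relied on for in-place edits without checking.

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative stand-in for test.csv; the values are made up for this sketch.
CSV_TEXT = """A,B,timestamp
1.5,2.5,100
3.5,4.5,200
"""

# Structured array: one record per row, one named field per column.
rec = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True)

# Convert the records to a regular 2-D array with one column per field.
table = rfn.structured_to_unstructured(rec)

print(table.shape)
print(table[:, 0])  # same values as rec["A"]
```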
    

    r and a refer to the same block of memory:

    In [47]: a[0, 0] = -1
    
    In [48]: r['A']
    Out[48]: array([  -1.     ,  611.88243,  611.88243])
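The sharing can also be checked explicitly rather than inferred from a write. This is a minimal sketch on a made-up stand-in for test.csv; it first asserts that every field is float64, which is the condition that makes the `view(np.float64)` reinterpretation valid.

```python
import io
import numpy as np

# Illustrative stand-in for test.csv; the values are made up for this sketch.
CSV_TEXT = """A,B,timestamp
1.5,2.5,100
3.5,4.5,200
"""

r = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True)

# The reinterpretation is only meaningful if every field is float64.
assert all(r.dtype[name] == np.float64 for name in r.dtype.names)

# Reinterpret the records as a flat float64 buffer, then give it one row per record.
a = r.view(np.float64).reshape(len(r), -1)

print(np.shares_memory(a, r))  # reports whether the two arrays overlap in memory

# A write through the 2-D view shows up in the structured array.
a[0, 0] = -1.0
print(r["A"])
```

If the file had a non-float column (a string, say), the fields would no longer share one dtype and this trick would not apply.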
    