I have a CSV file with headers like:
Given this test.csv file:
"A","B","C","D","E","F","timestamp"
611.88243,
Using NumPy alone, the options you show are your only options: either an ndarray of homogeneous dtype with shape (3,7), or a structured array of (potentially) heterogeneous dtype with shape (3,).
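To make the two shapes concrete, here is a minimal sketch. It reads a small in-memory CSV instead of the real file; the three-column data and the names csv_text, plain and structured are made up for illustration, and the header is left unquoted to keep the example simple.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file (values are made up for illustration).
csv_text = "A,B,C\n1.5,2.5,3.5\n4.5,5.5,6.5\n7.5,8.5,9.5\n"

# Option 1: plain 2-D array with one dtype; the header row is skipped,
# so the column labels are lost.
plain = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# Option 2: 1-D structured array; each record has named fields taken
# from the header row.
structured = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

print(plain.shape)             # 2-D: (rows, columns)
print(structured.shape)        # 1-D: (rows,)
print(structured.dtype.names)  # field names from the header
print(structured["A"])         # access a column by label
```

The first form supports 2-D slicing but not labels; the second supports labels but is one-dimensional.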
If you really want a data structure with labeled columns and shape (3,7) (plus lots of other goodies), you could use a pandas DataFrame:
In [67]: import pandas as pd
In [68]: df = pd.read_csv('data'); df
Out[68]:
           A          B     C          D           E          F     timestamp
0  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291112e+12
1  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291113e+12
2  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291121e+12
In [70]: df['A']
Out[70]:
0 611.88243
1 611.88243
2 611.88243
Name: A, dtype: float64
In [71]: df.shape
Out[71]: (3, 7)
A pure NumPy/Python alternative would be to use a dict to map the column names to indices:
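import csv
import numpy as np

filename = "test.csv"  # path to the CSV file

# Read only the header row to build a name -> column index map.
with open(filename, newline="") as f:
    reader = csv.reader(f)
    columns = next(reader)
colmap = dict(zip(columns, range(len(columns))))

# Load the numeric rows, skipping the header line.
arr = np.matrix(np.loadtxt(filename, delimiter=",", skiprows=1))
print(arr[:, colmap['A']])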
yields
[[ 611.88243]
[ 611.88243]
[ 611.88243]]
This way, arr is a NumPy matrix, with columns that can be accessed by label using the syntax
arr[:, colmap[column_name]]
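The same dict lookup also works with a plain ndarray, which may be preferable since the NumPy documentation no longer recommends np.matrix. The sketch below is one possible variant, not the only way to do it: it uses made-up in-memory data, and the helper cols is a hypothetical name introduced here to select several columns at once.

```python
import csv
import io
import numpy as np

# Hypothetical stand-in for the CSV file (values are made up for illustration).
csv_text = '"A","B","C"\n1.5,2.5,3.5\n4.5,5.5,6.5\n'

# csv.reader strips the quotes around the header names.
reader = csv.reader(io.StringIO(csv_text))
columns = next(reader)
colmap = {name: i for i, name in enumerate(columns)}

# Plain 2-D ndarray instead of np.matrix.
arr = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

def cols(*names):
    """Hypothetical helper: return the named columns as a 2-D array."""
    return arr[:, [colmap[n] for n in names]]

single = arr[:, colmap["B"]]  # one column, returned as a 1-D array
pair = cols("C", "A")         # two columns, in the order requested
print(single)
print(pair)
```

Note that with an ndarray a single column comes back 1-D, whereas np.matrix always keeps it as a 2-D column.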
Because your data is homogeneous (all the elements are floating point values), you can create a view of the data returned by genfromtxt that is a 2D array. For example,
In [42]: r = np.genfromtxt("test.csv", delimiter=',', names=True)
Create a NumPy array that is a "view" of r. This is a regular NumPy array, but it is created using the data in r:
In [43]: a = r.view(np.float64).reshape(len(r), -1)
In [44]: a.shape
Out[44]: (3, 7)
In [45]: a[:, 0]
Out[45]: array([ 611.88243, 611.88243, 611.88243])
In [46]: r['A']
Out[46]: array([ 611.88243, 611.88243, 611.88243])
r and a refer to the same block of memory:
In [47]: a[0, 0] = -1
In [48]: r['A']
Out[48]: array([ -1. , 611.88243, 611.88243])
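The view trick relies on every field being float64. As a rough sketch (again with made-up in-memory data rather than the real file), you can confirm the sharing with np.shares_memory, and, if the fields were not all the same dtype, numpy.lib.recfunctions.structured_to_unstructured is one alternative for getting a 2D array. Whether its result shares memory with r is not something this example assumes, so check before relying on in-place modification.

```python
import io
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Hypothetical stand-in for the CSV file (values are made up for illustration).
csv_text = "A,B,C\n1.5,2.5,3.5\n4.5,5.5,6.5\n"
r = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# All fields are float64, so the record buffer can be reinterpreted as 2-D.
a = r.view(np.float64).reshape(len(r), -1)
shares = np.shares_memory(a, r)  # True when a and r overlap in memory
print(a.shape, shares)

# Alternative that does not depend on every field having the same dtype.
u = structured_to_unstructured(r)
print(u.shape)
```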