I wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that R's read.table(), read.delim(), a
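I tried both NumPy and pandas. In the test below, pandas was faster (about 3 s versus 24 s elapsed), used less CPU time, and peaked at about a third of the memory of NumPy's genfromtxt.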
This is my test run (test_pandas.py first, then test_numpy_csv.py):
$ for f in test_pandas.py test_numpy_csv.py ; do /usr/bin/time python $f; done
2.94user 0.41system 0:03.05elapsed 109%CPU (0avgtext+0avgdata 502068maxresident)k
0inputs+24outputs (0major+107147minor)pagefaults 0swaps
23.29user 0.72system 0:23.72elapsed 101%CPU (0avgtext+0avgdata 1680888maxresident)k
0inputs+0outputs (0major+416145minor)pagefaults 0swaps
test_numpy_csv.py:
from numpy import genfromtxt
train = genfromtxt('/home/hvn/me/notebook/train.csv', delimiter=',')
test_pandas.py:
from pandas import read_csv
df = read_csv('/home/hvn/me/notebook/train.csv')
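The question asks for a record array, while read_csv returns a DataFrame. One way to bridge the two is DataFrame.to_records. This is a minimal sketch, not part of the timed test: the in-memory CSV and its column names a and b are made up for illustration and stand in for train.csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-column CSV held in memory; stands in for a real file path.
csv_text = "a,b\n1,2.5\n3,4.5\n"

# Parse the CSV into a DataFrame, using the first row as column names.
df = pd.read_csv(io.StringIO(csv_text))

# Convert to a NumPy record array; index=False leaves out the row index.
rec = df.to_records(index=False)

print(rec.dtype.names)  # field names taken from the CSV header
print(rec["a"])         # access a column by field name
print(rec.b)            # record arrays also allow attribute access
```

The conversion copies the data, so on a large file it adds time and memory on top of the read_csv figures above.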
Data file:
$ du -h ~/me/notebook/train.csv
59M /home/hvn/me/notebook/train.csv
The NumPy and pandas versions used:
$ pip freeze | egrep -i 'pandas|numpy'
numpy==1.13.3
pandas==0.20.2
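To try a similar comparison without the original train.csv, something like the following could be used. It is only a sketch under assumptions: the synthetic file, its size, the column names c0..c9, and the timed helper are all made up here, and it measures wall-clock time inside one process rather than using /usr/bin/time, so the numbers will not match the ones above.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd

# Write a small synthetic numeric CSV (hypothetical stand-in for train.csv).
rows, cols = 5000, 10
data = np.arange(rows * cols, dtype=float).reshape(rows, cols)
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
header = ",".join(f"c{i}" for i in range(cols))
# comments="" writes the header line without a leading "# ".
np.savetxt(path, data, delimiter=",", header=header, comments="")


def timed(loader):
    """Run loader() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = loader()
    return result, time.perf_counter() - start


df, t_pandas = timed(lambda: pd.read_csv(path))
# skip_header=1 so the header row is not parsed as data.
arr, t_numpy = timed(lambda: np.genfromtxt(path, delimiter=",", skip_header=1))

print(f"pandas read_csv:  {t_pandas:.3f}s, shape {df.shape}")
print(f"numpy genfromtxt: {t_numpy:.3f}s, shape {arr.shape}")

os.remove(path)
```

Increasing rows makes the difference easier to see. Peak memory is not measured here; running each loader in its own process under /usr/bin/time, as in the test above, is one way to get it.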