I wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that R's read.table(), read.delim(), a
This is the easiest way:
import csv
with open('testfile.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))
Now each entry in data is a record, represented as a list of strings, so data is a 2D list (a list of rows). It saved me so much time.
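Since csv.reader yields every field as a string, numeric data usually needs an explicit conversion, and a header row has to be skipped by hand. Here is a minimal sketch of that, assuming a purely numeric file with one header line. The sample file contents and the names header and rows are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample file, written here only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "testfile.csv")
with open(path, "w", newline="") as f:
    f.write("x,y\n1,2.5\n3,4.5\n")

with open(path, newline="") as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader)  # first row holds the column names
    # Convert every remaining field from str to float.
    rows = [[float(value) for value in row] for row in reader]

print(header)
print(rows)
```

If the file mixes text and numbers, float() will raise ValueError on the text fields, so this only fits all-numeric data.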
I would suggest using tables (pip3 install tables). You can save your .csv file to .h5 using pandas (pip3 install pandas):
import pandas as pd
data = pd.read_csv("dataset.csv")
store = pd.HDFStore('dataset.h5')
store['mydata'] = data
store.close()
You can then load your data into a NumPy array easily, and in less time, even for huge amounts of data.
import pandas as pd
store = pd.HDFStore('dataset.h5')
data = store['mydata']
store.close()
# Data in NumPy format
data = data.values
This works like a charm...
import csv
with open("data.csv", 'r') as f:
    data = list(csv.reader(f, delimiter=";"))
import numpy as np
data = np.array(data, dtype=float)  # np.float was removed in NumPy 1.24; use the built-in float
I timed the
from numpy import genfromtxt
genfromtxt(fname = dest_file, dtype = (<whatever options>))
versus
import csv
import numpy as np
with open(dest_file,'r') as dest_f:
    data_iter = csv.reader(dest_f,
                           delimiter = delimiter,
                           quotechar = '"')
    data = [row for row in data_iter]
data_array = np.asarray(data, dtype = <whatever options>)
on 4.6 million rows with about 70 columns and found that the NumPy path took 2 min 16 secs and the csv-list comprehension method took 13 seconds.
I would recommend the csv list-comprehension method, as it most likely relies on pre-compiled libraries and less on the interpreter than NumPy's genfromtxt does. I suspect the pandas method would have similar interpreter overhead.
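If you want to repeat this comparison on your own data, something like the following could serve as a starting point. It is only a sketch: the synthetic file, its size, and the names t_genfromtxt and t_csv are assumptions, and the file is far smaller than the one timed above, so the absolute numbers will differ.

```python
import csv
import os
import tempfile
import time

import numpy as np

# Hypothetical synthetic file: 2000 rows by 10 numeric columns.
n_rows, n_cols = 2000, 10
dest_file = os.path.join(tempfile.mkdtemp(), "timing.csv")
with open(dest_file, "w", newline="") as f:
    writer = csv.writer(f, lineterminator="\n")
    for i in range(n_rows):
        writer.writerow([i + j * 0.5 for j in range(n_cols)])

# Path 1: let NumPy parse the file.
start = time.perf_counter()
via_genfromtxt = np.genfromtxt(dest_file, delimiter=",")
t_genfromtxt = time.perf_counter() - start

# Path 2: parse with the csv module, then convert the list of rows.
start = time.perf_counter()
with open(dest_file, "r", newline="") as dest_f:
    rows = list(csv.reader(dest_f, delimiter=",", quotechar='"'))
via_csv = np.asarray(rows, dtype=float)
t_csv = time.perf_counter() - start

print(f"genfromtxt: {t_genfromtxt:.4f} s, csv + asarray: {t_csv:.4f} s")
```

Both paths should produce the same array, so it is worth checking np.allclose(via_genfromtxt, via_csv) before trusting the timings.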
You can use this code to send CSV file data into an array:
import numpy as np
data = np.genfromtxt('test.csv', delimiter=",")  # named data so it does not shadow the csv module
print(data)
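The question asks for a record array specifically. As far as I know, genfromtxt can get close to that if you pass names=True (take field names from the header row) and dtype=None (infer a type per column). The sketch below assumes a file with a header line; the sample contents and the field names label, age and score are invented for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample file with a header row and mixed column types.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "w") as f:
    f.write("label,age,score\nalice,30,1.5\nbob,25,2.5\n")

# names=True reads field names from the first row;
# dtype=None lets NumPy infer a separate type for each column.
records = np.genfromtxt(path, delimiter=",", names=True,
                        dtype=None, encoding="utf-8")

# A structured array can be viewed as a record array for attribute access.
rec = records.view(np.recarray)

print(records.dtype.names)
print(rec.label, rec.age, rec.score)
```

Fields are then available either as records["age"] or, on the recarray view, as rec.age.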