How do I read CSV data into a record array in NumPy?

广开言路 2020-11-22 02:55

I wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that R's read.table(), read.delim(), a

11 answers
  • 2020-11-22 03:58

    This is the easiest way:

    import csv
    with open('testfile.csv', newline='') as csvfile:
        data = list(csv.reader(csvfile))
    

    Now each entry in data is a record, represented as a list of strings, so data is a list of lists (2D, but not yet a NumPy array). It saved me so much time.
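
The rows returned by csv.reader are plain lists of strings. A minimal sketch of turning them into a NumPy record array follows. It assumes the file has a header row, and the sample data, column names, and column types are hypothetical stand-ins for testfile.csv.

```python
import csv
import io

import numpy as np

# Hypothetical sample data standing in for testfile.csv
sample = io.StringIO("name,x,y\na,1,2.5\nb,3,4.5\n")

reader = csv.reader(sample)
header = next(reader)  # first row holds the column names

# Convert each field explicitly; the column types here are assumptions
rows = [(name, int(x), float(y)) for name, x, y in reader]

# One (field name, type) pair per column
dtype = [(header[0], "U10"), (header[1], "i8"), (header[2], "f8")]

# Build a structured array, then view it as a record array
records = np.array(rows, dtype=dtype).view(np.recarray)

print(records.x)     # columns are available as attributes
print(records["y"])  # or by field name
```

For a real file, replace the StringIO object with open('testfile.csv', newline='').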

  • 2020-11-22 03:58

    I would suggest using PyTables (pip3 install tables). You can save your .csv file as an .h5 file using pandas (pip3 install pandas):

    import pandas as pd
    data = pd.read_csv("dataset.csv")
    store = pd.HDFStore('dataset.h5')
    store['mydata'] = data
    store.close()
    

    You can then load the data into a NumPy array easily, and in less time even for large amounts of data:

    import pandas as pd
    store = pd.HDFStore('dataset.h5')
    data = store['mydata']
    store.close()
    
    # Data in NumPy format
    data = data.values
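
Note that data.values returns a plain ndarray, which loses the column names (and falls back to object dtype when column types are mixed). If a record array is the goal, DataFrame.to_records is one option. The sketch below uses hypothetical in-memory sample data in place of dataset.h5.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the stored DataFrame
sample = io.StringIO("name,x,y\na,1,2.5\nb,3,4.5\n")
df = pd.read_csv(sample)

# index=False leaves the DataFrame index out of the result
records = df.to_records(index=False)

print(records.dtype.names)  # one field per column
print(records.x)            # columns are available as attributes
```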
    
  • 2020-11-22 03:58

    This works like a charm:

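    import csv
    # newline="" is what the csv module recommends when opening files
    with open("data.csv", newline="") as f:
        data = list(csv.reader(f, delimiter=";"))
    
    import numpy as np
    # np.float was removed in NumPy 1.24; the builtin float is equivalent
    data = np.array(data, dtype=float)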
    
  • 2020-11-22 04:01

    I timed the

    from numpy import genfromtxt
    genfromtxt(fname = dest_file, dtype = (<whatever options>))
    

    versus

    import csv
    import numpy as np
    with open(dest_file,'r') as dest_f:
        data_iter = csv.reader(dest_f,
                               delimiter = delimiter,
                               quotechar = '"')
        data = list(data_iter)
    data_array = np.asarray(data, dtype = <whatever options>)
    

    on 4.6 million rows with about 70 columns and found that the NumPy path took 2 min 16 secs and the csv-list comprehension method took 13 seconds.

    I would recommend the csv-list comprehension method, as it most likely relies on pre-compiled libraries and not on the interpreter as much as the NumPy path does. I suspect the pandas method would have similar interpreter overhead, though I have not measured it.
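
To repeat a comparison like this on your own machine, a rough timing sketch could look as follows. The synthetic data and the two helper functions are hypothetical, and the results will depend on the NumPy version, the data, and the hardware, so they may not match the figures above.

```python
import csv
import io
import time

import numpy as np

# Hypothetical synthetic data: 1,000 rows x 5 numeric columns
rng = np.random.default_rng(0)
text = "\n".join(
    ",".join(f"{value:.6f}" for value in row) for row in rng.random((1000, 5))
)


def load_genfromtxt(s):
    """Parse the CSV text with numpy.genfromtxt."""
    return np.genfromtxt(io.StringIO(s), delimiter=",")


def load_csv_list(s):
    """Parse the CSV text with csv.reader, then convert to an array."""
    rows = list(csv.reader(io.StringIO(s)))
    return np.asarray(rows, dtype=float)


timings = {}
for loader in (load_genfromtxt, load_csv_list):
    start = time.perf_counter()
    result = loader(text)
    timings[loader.__name__] = time.perf_counter() - start
    print(f"{loader.__name__}: {timings[loader.__name__]:.4f} s, shape {result.shape}")
```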

  • 2020-11-22 04:01

    You can use this code to send CSV file data into an array:

    import numpy as np
    # Named data rather than csv to avoid shadowing the standard csv module
    data = np.genfromtxt('test.csv', delimiter=",")
    print(data)
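
With the default settings, genfromtxt returns a plain float array, so a header row or text column comes back as nan. To get named, per-column typed fields, which is closer to what the question asks for, a minimal sketch is shown here. It assumes the first row holds the column names, and the sample data is hypothetical.

```python
import io

import numpy as np

# Hypothetical sample data standing in for test.csv
sample = io.StringIO("name,x,y\na,1,2.5\nb,3,4.5\n")

data = np.genfromtxt(
    sample,
    delimiter=",",
    names=True,        # take field names from the first row
    dtype=None,        # infer a type for each column
    encoding="utf-8",  # read text columns as str rather than bytes
)

print(data.dtype.names)  # field names taken from the header
print(data["x"])         # structured array: access by field name

# View as a record array to get attribute access as well
records = data.view(np.recarray)
print(records.y)
```

For a real file, pass the filename in place of the StringIO object.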
    