How do I read CSV data into a record array in NumPy?

广开言路 2020-11-22 02:55

I wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that R's read.table(), read.delim(), a

11 Answers
  • 2020-11-22 03:37

    Using numpy.loadtxt

    This is quite a simple method, but with the default dtype it requires every element to be numeric (float, int, and so on).

    import numpy as np
    data = np.loadtxt('c:\\1.csv', delimiter=',', skiprows=0)
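
    If the columns have different types, loadtxt can also take a structured dtype, which gives a record-style result. A minimal sketch, assuming an in-memory stand-in for the file and made-up field names (id, price, qty):

```python
import io
import numpy as np

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = io.StringIO("1,2.5,10\n2,3.5,20\n")

# One (name, type) pair per column; the field names are invented for this sketch.
dtype = [("id", "i8"), ("price", "f8"), ("qty", "i8")]
records = np.loadtxt(csv_text, delimiter=",", dtype=dtype)

print(records["price"])  # one column, selected by field name
print(records[0])        # one row, as a single record
```

    The result is a 1-D array with one record per CSV row rather than a 2-D float array.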
    
  • 2020-11-22 03:39

    I tried this:

    import pandas as pd

    # Read only the fifth column (index 4) and parse it as float
    closing_value = pd.read_csv("<FILENAME>", usecols=[4], dtype=float)
    print(closing_value)
    
  • 2020-11-22 03:42

    You can use NumPy's genfromtxt() function to do so, by setting the delimiter kwarg to a comma.

    from numpy import genfromtxt
    my_data = genfromtxt('my_file.csv', delimiter=',')
    

    More information on the function can be found in its documentation.
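
    As a quick illustration of the call above, here is a small sketch with a header row and a missing field. The data and the in-memory file are assumptions made for the example:

```python
import io
import numpy as np

# Hypothetical CSV text with a header line and one empty field.
csv_text = io.StringIO("a,b,c\n1,2,3\n4,,6\n")

# With the default float dtype, skip_header drops the header line
# and the empty field is filled with nan.
data = np.genfromtxt(csv_text, delimiter=",", skip_header=1)

print(data)
print(np.isnan(data[1, 1]))
```

    This returns a plain 2-D float array, so it suits purely numeric files.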

  • 2020-11-22 03:47

    I would recommend the read_csv function from the pandas library:

    import pandas as pd
    df = pd.read_csv('myfile.csv', sep=',', header=None)
    df.values
    # array([[ 1. ,  2. ,  3. ],
    #        [ 4. ,  5.5,  6. ]])
    

    This gives a pandas DataFrame, which allows many useful data manipulation functions that are not directly available with NumPy record arrays.

    DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table...
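
    If a record array is still needed after loading with pandas, a DataFrame can be converted with DataFrame.to_records(). A minimal sketch, assuming made-up column names (a, b, c) and an in-memory stand-in for myfile.csv:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV; the column names are invented for this sketch.
csv_text = io.StringIO("1.0,2,3\n4,5.5,6\n")
df = pd.read_csv(csv_text, header=None, names=["a", "b", "c"])

# index=False leaves the DataFrame index out of the resulting records.
rec = df.to_records(index=False)

print(rec.dtype.names)
print(rec.a)  # fields are available as attributes
```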


    I would also recommend genfromtxt. However, since the question asks for a record array, as opposed to a normal array, the dtype=None parameter needs to be added to the genfromtxt call:

    Given an input file, myfile.csv:

    1.0, 2, 3
    4, 5.5, 6
    
    import numpy as np
    np.genfromtxt('myfile.csv', delimiter=',')
    

    gives an array:

    array([[ 1. ,  2. ,  3. ],
           [ 4. ,  5.5,  6. ]])
    

    and

    np.genfromtxt('myfile.csv', delimiter=',', dtype=None)
    

    gives a record array (in NumPy terms, a structured array with one named field per column):

    array([(1.0, 2.0, 3), (4.0, 5.5, 6)], 
          dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])
    

    This has the advantage that files with multiple data types (including strings) can be easily imported.
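
    To illustrate the string case, here is a small sketch with one text column. The data is invented, and encoding="utf-8" is passed explicitly so that strings come back as str rather than bytes:

```python
import io
import numpy as np

# Hypothetical CSV text mixing a string, an integer and a float column.
csv_text = io.StringIO("alice,30,1.5\nbob,25,2.0\n")

# dtype=None lets genfromtxt infer a type for each column separately.
mixed = np.genfromtxt(csv_text, delimiter=",", dtype=None, encoding="utf-8")

print(mixed.dtype)   # one field per column: f0, f1, f2
print(mixed["f0"])   # the string column
```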

  • 2020-11-22 03:57

    You can also try recfromcsv() which can guess data types and return a properly formatted record array.
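
    A roughly similar result can be sketched with genfromtxt plus a recarray view. This is not recfromcsv itself; treating the two as comparable is an assumption, and the data is invented:

```python
import io
import numpy as np

# Hypothetical CSV text whose first line holds the column names.
csv_text = io.StringIO("Name,Age,Score\nalice,30,1.5\nbob,25,2.0\n")

# names=True takes field names from the header; dtype=None infers column types.
arr = np.genfromtxt(csv_text, delimiter=",", names=True, dtype=None,
                    encoding="utf-8")

# Viewing the structured array as a recarray adds attribute access to fields.
rec = arr.view(np.recarray)

print(rec.Age)
print(rec.Name[0])
```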

  • 2020-11-22 03:58

    Having tried both NumPy and pandas, I found that pandas has several advantages:

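    • Faster (about 3 s elapsed versus about 24 s in the run below)
    • Less CPU time (2.94 s user versus 23.29 s)
    • About 1/3 of the RAM used by NumPy genfromtxt (502068 kB versus 1680888 kB maximum resident size)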

    This is my test code:

    $ for f in test_pandas.py test_numpy_csv.py ; do  /usr/bin/time python $f; done
    2.94user 0.41system 0:03.05elapsed 109%CPU (0avgtext+0avgdata 502068maxresident)k
    0inputs+24outputs (0major+107147minor)pagefaults 0swaps
    
    23.29user 0.72system 0:23.72elapsed 101%CPU (0avgtext+0avgdata 1680888maxresident)k
    0inputs+0outputs (0major+416145minor)pagefaults 0swaps
    

    test_numpy_csv.py

    from numpy import genfromtxt
    train = genfromtxt('/home/hvn/me/notebook/train.csv', delimiter=',')
    

    test_pandas.py

    from pandas import read_csv
    df = read_csv('/home/hvn/me/notebook/train.csv')
    

    Data file:

    $ du -h ~/me/notebook/train.csv
     59M    /home/hvn/me/notebook/train.csv
    

    The NumPy and pandas versions used:

    $ pip freeze | egrep -i 'pandas|numpy'
    numpy==1.13.3
    pandas==0.20.2
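
    For anyone who wants to repeat the comparison without the original train.csv, here is a rough, self-contained sketch. It is not the benchmark above: the data is synthetic, the size is arbitrary, and it measures wall-clock time only, not memory. Results will vary by machine and library version.

```python
import io
import time
import numpy as np
import pandas as pd

# Synthetic numeric CSV text (the 5000 x 10 size is an arbitrary choice).
rng = np.random.default_rng(0)
values = rng.random((5000, 10))
text = "\n".join(",".join(repr(float(v)) for v in row) for row in values)

def timed(loader):
    """Run loader on a fresh in-memory file; return (result, seconds)."""
    start = time.perf_counter()
    result = loader(io.StringIO(text))
    return result, time.perf_counter() - start

np_data, np_seconds = timed(lambda f: np.genfromtxt(f, delimiter=","))
pd_data, pd_seconds = timed(lambda f: pd.read_csv(f, header=None).to_numpy())

print(f"genfromtxt: {np_seconds:.3f} s")
print(f"read_csv:   {pd_seconds:.3f} s")
```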
    