Convert pandas dataframe to NumPy array

别那么骄傲 2020-11-21 23:57

I am interested in knowing how to convert a pandas dataframe into a NumPy array.

dataframe:

import numpy as np
import pandas as pd

index = [1, 2, 3,         


        
  • 2020-11-22 00:31

    A simpler way, for an example DataFrame:

    df
    
             gbm       nnet        reg
    0  12.097439  12.047437  12.100953
    1  12.109811  12.070209  12.095288
    2  11.720734  11.622139  11.740523
    3  11.824557  11.926414  11.926527
    4  11.800868  11.727730  11.729737
    5  12.490984  12.502440  12.530894
    

    USE:

    np.array(df.to_records().view(type=np.matrix))
    

    GET:

    array([[(0, 12.097439  , 12.047437, 12.10095324),
            (1, 12.10981081, 12.070209, 12.09528824),
            (2, 11.72073428, 11.622139, 11.74052253),
            (3, 11.82455653, 11.926414, 11.92652727),
            (4, 11.80086775, 11.72773 , 11.72973699),
            (5, 12.49098389, 12.50244 , 12.53089367)]],
    dtype=(numpy.record, [('index', '<i8'), ('gbm', '<f8'), ('nnet', '<f4'),
           ('reg', '<f8')]))
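    Because np.matrix is always 2-D, this route yields a structured array of shape (1, n) (a single row of records) rather than a 2-D float array, and it goes through np.matrix, which NumPy no longer recommends. If what you want is a plain 2-D numeric array that keeps the index as its first column, a minimal sketch follows. It swaps in a different approach (reset_index plus to_numpy, assuming pandas 0.24 or later), and the column values are made up.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the gbm/nnet/reg frame above (values are made up).
df = pd.DataFrame(
    {"gbm": [12.1, 12.2, 11.7], "nnet": [12.0, 12.1, 11.6], "reg": [12.1, 12.0, 11.7]}
)

# reset_index() turns the index into an ordinary "index" column, so
# to_numpy() returns it as the first column of a plain 2-D array.
arr = df.reset_index().to_numpy(dtype=np.float64)

print(arr.shape)  # (3, 4): index plus three value columns
```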
    
  • 2020-11-22 00:34

    Try this:

    np.array(df) 
    
    array([['ID', nan, nan, nan],
       ['1', nan, 0.2, nan],
       ['2', nan, nan, 0.5],
       ['3', nan, 0.2, 0.5],
       ['4', 0.1, 0.2, nan],
       ['5', 0.1, 0.2, 0.5],
       ['6', 0.1, nan, 0.5],
       ['7', 0.1, nan, nan]], dtype=object)
    

    More information: [https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html]. This works with numpy 1.16.5 and pandas 0.25.2.
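    The dtype of np.array(df) depends on the columns: when every column is numeric you get a float array (the index is not included), while a single string column, like the ID strings in the output above, forces dtype=object for the whole array. A small sketch with two made-up frames:

```python
import numpy as np
import pandas as pd

# Made-up frames: one all-float, one with a string column.
numeric = pd.DataFrame({"A": [np.nan, 0.1], "B": [0.2, np.nan]})
mixed = pd.DataFrame({"ID": ["1", "2"], "A": [np.nan, 0.1]})

# All-float columns: np.array returns a 2-D float64 array (index dropped).
a = np.array(numeric)
# A string column forces a common object dtype for the whole array.
b = np.array(mixed)

print(a.dtype, a.shape)  # float64 (2, 2)
print(b.dtype, b.shape)  # object (2, 2)
```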

  • 2020-11-22 00:37

    Note: The .as_matrix() method used in this answer is deprecated and was removed in pandas 1.0. Pandas 0.23.4 warns:

    Method .as_matrix will be removed in a future version. Use .values instead.


    Pandas has a built-in method for this:

    numpy_matrix = df.as_matrix()
    

    gives

    array([[nan, 0.2, nan],
           [nan, nan, 0.5],
           [nan, 0.2, 0.5],
           [0.1, 0.2, nan],
           [0.1, 0.2, 0.5],
           [0.1, nan, 0.5],
           [0.1, nan, nan]])
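    On pandas 1.0 and later, where .as_matrix() no longer exists, a minimal sketch of the replacements might look like this: .values (as the warning suggests) and .to_numpy() (pandas 0.24 and later). as_matrix also took a columns argument; selecting the columns first covers that case. The frame is made up.

```python
import numpy as np
import pandas as pd

# Made-up frame with missing values.
df = pd.DataFrame({"A": [np.nan, 0.1], "B": [0.2, 0.2], "C": [np.nan, 0.5]})

# Replacement suggested by the deprecation warning:
via_values = df.values
# Preferred spelling since pandas 0.24:
via_to_numpy = df.to_numpy()
# as_matrix(columns=[...]) returned a subset; select the columns first instead.
subset = df[["B", "C"]].to_numpy()

# equal_nan=True because both arrays have NaN in the same positions.
print(np.array_equal(via_values, via_to_numpy, equal_nan=True))  # True
```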
    
  • 2020-11-22 00:37

    It seems like df.to_records() will work for you. The exact feature you're looking for was requested, and to_records was pointed to as an alternative.

    I tried this out locally using your example, and that call yields something very similar to the output you were looking for:

    rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
           (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
           (7, 0.1, nan, nan)],
          dtype=[(u'ID', '<i8'), (u'A', '<f8'), (u'B', '<f8'), (u'C', '<f8')])
    

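    Note that this is a recarray rather than a plain ndarray. You can convert the result into a regular NumPy array by calling its constructor, np.array(df.to_records()); the result is still a one-dimensional structured array with one record per row, not a 2-D numeric array.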
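    To illustrate, here is a sketch with a made-up frame. np.array keeps the record layout, fields are accessed by name, and numpy.lib.recfunctions.structured_to_unstructured (NumPy 1.16 or later) is one way to flatten the records into a 2-D array with a common dtype.

```python
import numpy as np
import pandas as pd
from numpy.lib import recfunctions as rfn

# Made-up frame; the index is unnamed, so to_records() calls that field "index".
df = pd.DataFrame({"A": [np.nan, 0.1, 0.1], "B": [0.2, np.nan, 0.2]})

rec = df.to_records()  # np.recarray
arr = np.array(rec)    # plain np.ndarray, but still 1-D with a structured dtype

print(type(arr).__name__, arr.shape, arr.dtype.names)  # ndarray (3,) ('index', 'A', 'B')

# Each field is a 1-D array holding one column.
col_a = arr["A"]

# Flatten the records into a 2-D array; int64 index and float64 columns -> float64.
flat = rfn.structured_to_unstructured(arr)
print(flat.shape)  # (3, 3)
```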

  • 2020-11-22 00:38

    A simple way to convert a DataFrame to a NumPy array:

    import pandas as pd
    
    df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
    df_to_array = df.to_numpy()
    # df_to_array is:
    # array([[1, 3],
    #        [2, 4]])
    

    The pandas documentation recommends to_numpy() (added in pandas 0.24) over .values for consistency.

    Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html
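    to_numpy() also accepts dtype= and, from pandas 1.1 on, na_value=. A sketch assuming a made-up frame with one missing value:

```python
import numpy as np
import pandas as pd

# Made-up frame: an int column and a float column with a missing value.
df = pd.DataFrame({"A": [1, 2], "B": [3.5, None]})

# Default: common dtype of the columns (float64 here); missing values stay NaN.
default = df.to_numpy()

# dtype= casts the result; na_value= (pandas >= 1.1) replaces missing values.
filled = df.to_numpy(dtype="float32", na_value=0.0)

print(default.dtype, np.isnan(default[1, 1]))  # float64 True
print(filled.dtype, filled[1, 1])              # float32 0.0
```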

  • 2020-11-22 00:39

    You can use the to_records method, but you may have to play around a bit with the dtypes if they are not what you want from the get-go. In my case, having copied your DF from a string, the index type is string (represented by an object dtype in pandas):

    In [102]: df
    Out[102]: 
    label    A    B    C
    ID                  
    1      NaN  0.2  NaN
    2      NaN  NaN  0.5
    3      NaN  0.2  0.5
    4      0.1  0.2  NaN
    5      0.1  0.2  0.5
    6      0.1  NaN  0.5
    7      0.1  NaN  NaN
    
    In [103]: df.index.dtype
    Out[103]: dtype('object')
    In [104]: df.to_records()
    Out[104]: 
    rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
           (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
           (7, 0.1, nan, nan)], 
          dtype=[('index', '|O8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
    In [106]: df.to_records().dtype
    Out[106]: dtype([('index', '|O8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
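    The object-dtype index can be reproduced and fixed in a short sketch. It assumes the index labels are strings, as in this answer, and parses them with Index.map(int) rather than the astype call below. In recent pandas versions a named index (here ID) is used as the record field name.

```python
import numpy as np
import pandas as pd

# Assumption: index labels were read as strings, so the index has object dtype.
idx = pd.Index(["1", "2", "3"], dtype=object, name="ID")
df = pd.DataFrame({"A": [np.nan, np.nan, 0.1], "B": [0.2, np.nan, 0.2]}, index=idx)

before = df.to_records().dtype["ID"]
print(before)  # object

# Parse the labels as integers and keep the index name "ID".
df.index = df.index.map(int).rename("ID")
rec = df.to_records()
print(rec.dtype.names)  # ('ID', 'A', 'B')
```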
    

    Converting the recarray dtype directly does not work for me, but you can do the conversion in pandas first:

    In [109]: df.index = df.index.astype('i8')
    In [111]: df.to_records().view([('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
    Out[111]:
    rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
           (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
           (7, 0.1, nan, nan)], 
          dtype=[('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
    

    Note that pandas does not set the name of the index (ID) in the exported record array (a bug?), so we use the dtype passed to view() to correct the field name as well.

    At the time of writing, pandas had only 8-byte integers (i8) and floats (f8) (see this issue).
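    In more recent pandas versions (0.24 and later), to_records also accepts index_dtypes and column_dtypes, which set the record field types directly, including types narrower than 8 bytes. A sketch with a made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up frame with a named integer index.
df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, np.nan]},
    index=pd.Index([1, 2], name="ID"),
)

# index_dtypes / column_dtypes set the field types of the exported records.
rec = df.to_records(index_dtypes="<i4", column_dtypes={"A": "<f4"})

print(rec.dtype)  # ID -> int32, A -> float32, B unchanged (float64)
```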
