Store numpy.array in cells of a Pandas.DataFrame

小鲜肉 2020-12-02 12:41

I have a dataframe in which I would like to store 'raw' numpy.array:

df['COL_ARRAY'] = df.apply(lambda r: np.array(do_something_with_r), axi

5 Answers
  • 2020-12-02 12:53

    Wrap the value you want to store in a list in a first apply, then extract it with index 0 of that list in a second apply:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'id': [1, 2, 3, 4],
                       'a': ['on', 'on', 'off', 'off'],
                       'b': ['on', 'off', 'on', 'off']})
    
    
    df['new'] = df.apply(lambda x: [np.array(x)], axis=1).apply(lambda x: x[0])
    
    df
    

    Output:

        id  a       b       new
    0   1   on      on      [1, on, on]
    1   2   on      off     [2, on, off]
    2   3   off     on      [3, off, on]
    3   4   off     off     [4, off, off]
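
As an alternative to the double apply, a list of per-row arrays can be built first and then assigned as a column. This is a minimal sketch, not the answer's own method; it assumes the same example frame, and the name `rows` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'a': ['on', 'on', 'off', 'off'],
                   'b': ['on', 'off', 'on', 'off']})

# One 1-D array per row; mixed int/str columns produce object-dtype arrays.
rows = list(df.to_numpy())

# Wrapping the list in a Series keeps each array as a single cell value.
df['new'] = pd.Series(rows, index=df.index)

first_cell = df['new'].iloc[0]
print(type(first_cell), first_cell)
```

The column ends up with dtype object, and each cell can be read back as an ndarray with `.iloc` or `.at`.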
    
  • 2020-12-02 13:03

    Suppose you have a DataFrame ds with a column named 'class'. If ds['class'] contains strings or numbers and you want to replace some of them with numpy.ndarrays or lists, the following code does that. Here class2vector is a numpy.ndarray or list, and ds_class is the string value to match.

    ds['class'] = ds['class'].map(lambda x: class2vector if (isinstance(x, str) and (x == ds_class)) else x)
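
The one-liner above can be tried end to end with concrete values. The frame contents, the label 'cat', and the two-element vector below are hypothetical placeholders chosen for this sketch, not values from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
ds = pd.DataFrame({'class': ['cat', 'dog', 'cat']})
ds_class = 'cat'                      # value to match (assumed)
class2vector = np.array([1.0, 0.0])   # replacement vector (assumed)

# Replace matching strings with the array; leave everything else untouched.
ds['class'] = ds['class'].map(
    lambda x: class2vector if (isinstance(x, str) and x == ds_class) else x
)

print(ds)
```

Every matching cell then refers to the same array object, so returning `class2vector.copy()` from the lambda may be preferable if the cells should be independent.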

  • 2020-12-02 13:05

    Wrap the numpy array in a list when building the DataFrame:

    a = np.array([5, 6, 7, 8])
    df = pd.DataFrame({"a": [a]})
    

    Output:

                 a
    0  [5, 6, 7, 8]
    

    Alternatively, convert each row to a tuple and then call apply(np.array). For example, given this DataFrame:

    df = pd.DataFrame({'id': [1, 2, 3, 4],
                       'a': ['on', 'on', 'off', 'off'],
                       'b': ['on', 'off', 'on', 'off']})
    
    df['new'] = df.apply(lambda r: tuple(r), axis=1).apply(np.array)
    

    Output:

         a    b  id            new
    0   on   on   1    [on, on, 1]
    1   on  off   2   [on, off, 2]
    2  off   on   3   [off, on, 3]
    3  off  off   4  [off, off, 4]
    
    df['new'][0]
    

    Output:

    array(['on', 'on', '1'], dtype='<U2')
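
Note that the output above has a string dtype: np.array converts a tuple of mixed ints and strings to a single string type, so the id becomes '1'. If the original types matter, passing dtype=object is one way to keep them. A small sketch using one row as a plain tuple (the variable names are illustrative):

```python
import numpy as np

row = (1, 'on', 'on')

# Default inference picks a common string dtype, so the int becomes '1'.
as_strings = np.array(row)

# dtype=object stores the original Python objects unchanged.
as_objects = np.array(row, dtype=object)

print(as_strings.dtype.kind, as_objects.dtype)
```

With the frame from this answer, the same idea would be `.apply(lambda t: np.array(t, dtype=object))` in place of `.apply(np.array)`.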
    
  • 2020-12-02 13:06

    If you first set a column to have type object, you can insert an array without any wrapping:

    df = pd.DataFrame(columns=[1])
    df[1] = df[1].astype(object)
    df.loc[1, 1] = np.array([5, 6, 7, 8])
    df
    

    Output:

        1
    1   [5, 6, 7, 8]
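
The same idea can be applied to a frame that already has rows. The sketch below assumes a hypothetical column named 'arr' created with dtype=object, and uses `.at` for single-cell assignment; behavior may differ for columns that are not object dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2]})

# Hypothetical column name; dtype=object lets each cell hold any Python object.
df['arr'] = pd.Series([None, None], index=df.index, dtype=object)

# .at sets exactly one cell, so the whole array is stored as that cell's value.
df.at[0, 'arr'] = np.array([5, 6, 7, 8])
df.at[1, 'arr'] = np.array([9, 10])

print(df)
```

Arrays of different lengths can share the column because pandas only sees opaque objects there.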
    
  • 2020-12-02 13:06

    You can wrap each array passed to the DataFrame constructor in square brackets to keep it as an np.array in its cell:

    one_d_array = np.array([1,2,3])
    two_d_array = one_d_array*one_d_array[:,np.newaxis]
    two_d_array
    
    array([[1, 2, 3],
           [2, 4, 6],
           [3, 6, 9]])
    
    
    pd.DataFrame([
        [one_d_array],
        [two_d_array] ])
    
                                       0
    0                          [1, 2, 3]
    1  [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
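
To check that the cells really hold arrays and not flattened or stringified data, the shapes can be read back. A short sketch reusing the answer's arrays; `frame`, `cell_0`, and `cell_1` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

one_d_array = np.array([1, 2, 3])
two_d_array = one_d_array * one_d_array[:, np.newaxis]

# Each inner list is one row with a single cell holding the whole array.
frame = pd.DataFrame([[one_d_array], [two_d_array]])

cell_0 = frame.iloc[0, 0]
cell_1 = frame.iloc[1, 0]
print(cell_0.shape, cell_1.shape)
```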
    