Add column in dataframe from list

死守一世寂寞 2020-12-04 07:16

I have a dataframe with some columns like this:

A   B   C  
0   
4
5
6
7
7
6
5

The values in A can only range from 0 to 7.

5 answers
  • 2020-12-04 07:57

    A solution improving on the great one from @sparrow.

    Let df be your dataset and mylist the list of values you want to add to the dataframe.

    Suppose you want to call the new column new_column.

    First make the list into a Series:

    column_values = pd.Series(mylist)
    

    Then use the insert function to add the column. Its advantage is that it lets you choose the position of the new column. In the following example the new column becomes the first column from the left (loc=0):

    df.insert(loc=0, column='new_column', value=column_values)
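
    One caveat worth checking: a Series is aligned on index labels when it is inserted, so the approach above relies on df having the default 0..n-1 index. A minimal sketch with made-up data (the frame, its index and the list values are assumptions for illustration, not from the question):

```python
import pandas as pd

# Hypothetical data for illustration; df has a non-default index.
df = pd.DataFrame({"A": [0, 4, 5]}, index=[10, 11, 12])
mylist = [2, 12, 16]

# A Series built from a plain list gets the default index 0, 1, 2.
column_values = pd.Series(mylist)

# insert() aligns the Series on index labels; none match 10, 11, 12,
# so the new column is filled with NaN.
misaligned = df.copy()
misaligned.insert(loc=0, column="new_column", value=column_values)

# Giving the Series df's own index keeps the values in list order.
aligned = df.copy()
aligned.insert(loc=0, column="new_column",
               value=pd.Series(mylist, index=df.index))

print(misaligned)
print(aligned)
```

    Passing the list itself as value also avoids the alignment step, because a list carries no index.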
    
  • 2020-12-04 07:58

    Just assign the list directly:

    df['new_col'] = mylist
    

    Alternative
    Convert the list to a series or array and then assign:

    se = pd.Series(mylist)
    df['new_col'] = se.values
    

    or

    df['new_col'] = np.array(mylist)
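
    All three variants assign by position, so the list must have exactly as many elements as the dataframe has rows. A small sketch of what happens otherwise, using made-up data and hypothetical column names:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": [0, 4, 5]})
mylist = [2, 12, 16]

# Same length as the frame: both forms assign position by position.
df["from_list"] = mylist
df["from_array"] = np.array(mylist)

# A list of the wrong length is rejected with a ValueError
# and the column is not created.
try:
    df["too_short"] = [1, 2]
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(df)
print(length_error)
```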
    
  • 2020-12-04 08:01

    First, let's create the dataframe you had. I'll ignore columns B and C as they are not relevant.

    import pandas as pd
    df = pd.DataFrame({'A': [0, 4, 5, 6, 7, 7, 6, 5]})
    

    And the mapping that you desire:

    mapping = dict(enumerate([2,5,6,8,12,16,26,32]))
    
    df['D'] = df['A'].map(mapping)
    

    Done!

    print(df)
    

    Output:

       A   D
    0  0   2
    1  4  12
    2  5  16
    3  6  26
    4  7  32
    5  7  32
    6  6  26
    7  5  16
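
    One thing to keep in mind with map: a value in A that has no key in the mapping does not raise an error, it yields NaN. A short sketch, where the 9 is a made-up out-of-range value added only to show the effect:

```python
import pandas as pd

# Same mapping as above: position in the list -> value.
mapping = dict(enumerate([2, 5, 6, 8, 12, 16, 26, 32]))

# 9 is a hypothetical out-of-range value; the question says A stays in 0..7.
df = pd.DataFrame({"A": [0, 4, 9]})
df["D"] = df["A"].map(mapping)

# Rows whose key is missing from the mapping come back as NaN.
print(df)
```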
    
  • 2020-12-04 08:12

    IIUC, if you make your (unfortunately named) List into an ndarray, you can simply index into it naturally.

    >>> import numpy as np
    >>> m = np.arange(16)*10
    >>> m[df.A]
    array([  0,  40,  50,  60, 150, 150, 140, 130])
    >>> df["D"] = m[df.A]
    >>> df
        A   B   C    D
    0   0 NaN NaN    0
    1   4 NaN NaN   40
    2   5 NaN NaN   50
    3   6 NaN NaN   60
    4  15 NaN NaN  150
    5  15 NaN NaN  150
    6  14 NaN NaN  140
    7  13 NaN NaN  130
    

    Here I built a new m, but if you use m = np.asarray(List), the same thing should work: the values in df.A will pick out the appropriate elements of m.


    Note that if you're using an old version of numpy, you might have to use m[df.A.values] instead. In the past, numpy didn't play well with others, and some refactoring in pandas caused some headaches. Things have improved now.
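
    As a sketch of the same idea with an actual list: the values below are the ones used in the mapping answer in this thread, and the name values is a stand-in for the question's List, so treat both as assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})

# Assumed list contents, borrowed from the mapping answer in this thread.
values = [2, 5, 6, 8, 12, 16, 26, 32]
m = np.asarray(values)

# Each entry of A is used as a position in m (integer array indexing).
# to_numpy() passes a plain ndarray, in the spirit of m[df.A.values].
df["D"] = m[df["A"].to_numpy()]

print(df)
```

    Unlike map, this raises an IndexError when A contains a value beyond the end of the array, and negative values would index from the end.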

  • 2020-12-04 08:17

    Old question, but I always try to use the fastest code!

    I had a huge list of 69 million uint64 values. np.array() was the fastest for me.

    df['hashes'] = hashes
    # Time spent: 17.034842014312744

    df['hashes'] = pd.Series(hashes).values
    # Time spent: 17.141014337539673

    df['key'] = np.array(hashes)
    # Time spent: 10.724546194076538
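
    The timing code itself is not shown, so here is one way such a comparison could be reproduced. This is only a sketch: the helper time_assignment, the size N and the stand-in data are all assumptions, and the results will differ by machine and by pandas/numpy version.

```python
import time

import numpy as np
import pandas as pd

N = 100_000  # hypothetical size, far smaller than the 69 million above
hashes = list(range(N))  # stand-in data; the real hashes are not shown


def time_assignment(make_value):
    """Return the seconds taken to build the value and assign it as a column."""
    df = pd.DataFrame(index=range(N))
    start = time.perf_counter()
    df["hashes"] = make_value(hashes)
    return time.perf_counter() - start


# The conversion is included in the timed region, as in the statements above.
timings = {
    "list": time_assignment(lambda h: h),
    "series_values": time_assignment(lambda h: pd.Series(h).values),
    "ndarray": time_assignment(lambda h: np.array(h, dtype=np.uint64)),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```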
    