Adding new column to existing DataFrame in Python pandas

你的背包 2020-11-22 01:15

I have the following indexed DataFrame with named columns and non-continuous row numbers:

          a         b         c         d
2  0.671399  0.101208 -         


        
25 Answers
  • 2020-11-22 01:41

    Foolproof:

    df.loc[:, 'NewCol'] = 'New_Val'
    

    Example:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(data=np.random.randn(20, 4), columns=['A', 'B', 'C', 'D'])
    
    df
    
               A         B         C         D
    0  -0.761269  0.477348  1.170614  0.752714
    1   1.217250 -0.930860 -0.769324 -0.408642
    2  -0.619679 -1.227659 -0.259135  1.700294
    3  -0.147354  0.778707  0.479145  2.284143
    4  -0.529529  0.000571  0.913779  1.395894
    5   2.592400  0.637253  1.441096 -0.631468
    6   0.757178  0.240012 -0.553820  1.177202
    7  -0.986128 -1.313843  0.788589 -0.707836
    8   0.606985 -2.232903 -1.358107 -2.855494
    9  -0.692013  0.671866  1.179466 -1.180351
    10 -1.093707 -0.530600  0.182926 -1.296494
    11 -0.143273 -0.503199 -1.328728  0.610552
    12 -0.923110 -1.365890 -1.366202 -1.185999
    13 -2.026832  0.273593 -0.440426 -0.627423
    14 -0.054503 -0.788866 -0.228088 -0.404783
    15  0.955298 -1.430019  1.434071 -0.088215
    16 -0.227946  0.047462  0.373573 -0.111675
    17  1.627912  0.043611  1.743403 -0.012714
    18  0.693458  0.144327  0.329500 -0.655045
    19  0.104425  0.037412  0.450598 -0.923387
    
    
    df.drop([3, 5, 8, 10, 18], inplace=True)
    
    df
    
               A         B         C         D
    0  -0.761269  0.477348  1.170614  0.752714
    1   1.217250 -0.930860 -0.769324 -0.408642
    2  -0.619679 -1.227659 -0.259135  1.700294
    4  -0.529529  0.000571  0.913779  1.395894
    6   0.757178  0.240012 -0.553820  1.177202
    7  -0.986128 -1.313843  0.788589 -0.707836
    9  -0.692013  0.671866  1.179466 -1.180351
    11 -0.143273 -0.503199 -1.328728  0.610552
    12 -0.923110 -1.365890 -1.366202 -1.185999
    13 -2.026832  0.273593 -0.440426 -0.627423
    14 -0.054503 -0.788866 -0.228088 -0.404783
    15  0.955298 -1.430019  1.434071 -0.088215
    16 -0.227946  0.047462  0.373573 -0.111675
    17  1.627912  0.043611  1.743403 -0.012714
    19  0.104425  0.037412  0.450598 -0.923387
    
    df.loc[:, 'NewCol'] = 0
    
    df
               A         B         C         D  NewCol
    0  -0.761269  0.477348  1.170614  0.752714       0
    1   1.217250 -0.930860 -0.769324 -0.408642       0
    2  -0.619679 -1.227659 -0.259135  1.700294       0
    4  -0.529529  0.000571  0.913779  1.395894       0
    6   0.757178  0.240012 -0.553820  1.177202       0
    7  -0.986128 -1.313843  0.788589 -0.707836       0
    9  -0.692013  0.671866  1.179466 -1.180351       0
    11 -0.143273 -0.503199 -1.328728  0.610552       0
    12 -0.923110 -1.365890 -1.366202 -1.185999       0
    13 -2.026832  0.273593 -0.440426 -0.627423       0
    14 -0.054503 -0.788866 -0.228088 -0.404783       0
    15  0.955298 -1.430019  1.434071 -0.088215       0
    16 -0.227946  0.047462  0.373573 -0.111675       0
    17  1.627912  0.043611  1.743403 -0.012714       0
    19  0.104425  0.037412  0.450598 -0.923387       0
    
  • 2020-11-22 01:42

    This is the simple way of adding a new column: df['e'] = e
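
    A minimal sketch of this assignment, assuming e is a pandas Series; the frame, index labels and values below are made up for illustration. A Series is matched to the frame by index label, so a row with no matching label ends up as NaN, while a plain list is assigned by position.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-continuous index, for illustration only.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[2, 5, 9])

# A Series is aligned on index labels, not on position.
e = pd.Series([10.0, 20.0, 30.0], index=[2, 5, 7])
df["e"] = e  # label 9 has no match in e, so that row becomes NaN

# A plain list (or NumPy array) is assigned by position and must match len(df).
df["f"] = [100, 200, 300]
```

    If the values should go in by position regardless of labels, passing e.values instead of e skips the alignment step.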

  • 2020-11-22 01:46

    Doing this directly via NumPy will be the most efficient:

    df1['e'] = np.random.randn(sLength)
    

    Note: my original (very old) suggestion was to use map, which is much slower:

    df1['e'] = df1['a'].map(lambda x: np.random.random())
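
    The two variants can be put side by side in a small sketch. The frame below is a made-up stand-in for df1, and sLength is assumed to be the number of rows in it; neither definition appears in the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1; sLength is assumed to be its row count.
df1 = pd.DataFrame({"a": [0.5, 1.5, 2.5, 3.5]}, index=[2, 3, 5, 8])
sLength = len(df1)

# Vectorised: a single NumPy call produces the whole column at once.
df1["e"] = np.random.randn(sLength)

# Element-wise: map calls the lambda once per row, which costs more on large frames.
df1["e_map"] = df1["a"].map(lambda x: np.random.random())
```

    Both columns keep the frame's existing index. The NumPy array is assigned by position, so its length has to equal the number of rows.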
    
  • 2020-11-22 01:46

    I got the dreaded SettingWithCopyWarning, and it wasn't fixed by using the iloc syntax. My DataFrame was created by read_sql from an ODBC source. Using a suggestion by lowtech above, the following worked for me:

    df.insert(len(df.columns), 'e', pd.Series(np.random.randn(sLength),  index=df.index))
    

    This worked fine to insert the column at the end. I don't know if it is the most efficient, but I don't like warning messages. I think there is a better solution, but I can't find it, and I think it depends on some aspect of the index.
    Note that this only works once: it raises an error if you try to overwrite an existing column.
    Note: as mentioned above, from pandas 0.16.0 onward assign is the best solution. See the documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html#pandas.DataFrame.assign It works well for data-flow style code where you don't overwrite your intermediate values.
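
    A small sketch of both behaviours, using a made-up frame: insert modifies the frame in place and refuses a duplicate column name by default, while assign returns a new frame and leaves the original alone.

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame({"a": [1, 2, 3]}, index=[2, 5, 9])

# insert() adds the column in place; here at the last position.
df.insert(len(df.columns), "e", pd.Series([0.1, 0.2, 0.3], index=df.index))

# A second insert() with the same column name raises ValueError by default.
try:
    df.insert(len(df.columns), "e", 0)
    raised = False
except ValueError:
    raised = True

# assign() returns a new DataFrame; df itself is not modified.
df2 = df.assign(f=df["a"] * 2)
```

    Because assign does not mutate its input, it chains well with other methods that return a DataFrame.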

  • 2020-11-22 01:46

    The following is what I did... But I'm pretty new to pandas and really Python in general, so no promises.

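    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=list('AB'))

    newCol = [3, 5, 7]
    newName = 'C'

    # Append newCol as the last column of the underlying NumPy array
    values = np.insert(df.values, df.shape[1], newCol, axis=1)
    # Extend the existing column names with the new name
    header = df.columns.values.tolist()
    header.append(newName)

    # Rebuild the DataFrame from the combined array and header
    df = pd.DataFrame(values, columns=header)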
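
    One caveat with this round trip through NumPy, sketched below with a made-up non-continuous index: rebuilding the frame from df.values discards the original index labels, whereas direct assignment keeps them. The names rebuilt and direct are only for illustration.

```python
import numpy as np
import pandas as pd

# Same data as above, but with a hypothetical non-continuous index.
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=list("AB"), index=[2, 5, 9])
new_col = [3, 5, 7]

# Round trip through NumPy: the values survive, the index labels do not.
values = np.insert(df.values, df.shape[1], new_col, axis=1)
rebuilt = pd.DataFrame(values, columns=list(df.columns) + ["C"])

# Direct assignment on a copy keeps the existing index.
direct = df.copy()
direct["C"] = new_col
```

    Passing index=df.index when rebuilding would restore the labels, if the NumPy route is preferred.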
    
  • 2020-11-22 01:47

    If you want to set the whole new column to an initial base value (e.g. None), you can do this: df1['e'] = None

    This gives the new column the object dtype, so later you are free to put complex data types, such as lists, into individual cells.
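
    A minimal sketch of that idea, with a made-up df1; it uses .at to write a list into a single cell, which is one way to do it and not necessarily what the answer's author had in mind.

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[2, 5, 9])

# Filling the column with None gives it the object dtype.
df1["e"] = None

# .at addresses one cell by row label and column name, so the list is stored whole.
df1.at[5, "e"] = [1, 2, 3]
```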
