Adding new column to existing DataFrame in Python pandas

你的背包 2020-11-22 01:15

I have the following indexed DataFrame with named columns and row labels that are not continuous numbers:

          a         b         c         d
2  0.671399  0.101208 -         


        
25 Answers
  •  小蘑菇 (original poster)
    2020-11-22 01:29

    Use the original df1 index to create the Series:

    df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
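
The reason for passing index=df1.index can be seen in a small sketch. The frame, the index labels and the values below are made up for illustration and are not the asker's data. pandas aligns a Series to the DataFrame by index label, so a Series with the default 0, 1, 2 index only fills the rows whose labels happen to match.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-contiguous integer index, similar in shape to df1.
df1 = pd.DataFrame({"a": [0.1, 0.2, 0.3]}, index=[2, 5, 9])
values = np.array([10.0, 20.0, 30.0])

# Series with the default index (0, 1, 2): only label 2 exists in df1,
# so the rows labelled 5 and 9 end up as NaN.
misaligned = df1.assign(e=pd.Series(values))

# Series built on df1.index: every row receives its value.
aligned = df1.assign(e=pd.Series(values, index=df1.index))

print(misaligned)
print(aligned)
```

Here assign is used only so that both variants can be compared side by side without modifying df1; plain df1['e'] = ... aligns in the same way.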
    

    Edit 2015
    Some reported getting the SettingWithCopyWarning with this code.
    However, the code still runs perfectly with the current pandas version 0.16.1.

    >>> sLength = len(df1['a'])
    >>> df1
              a         b         c         d
    6 -0.269221 -0.026476  0.997517  1.294385
    8  0.917438  0.847941  0.034235 -0.448948
    
    >>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
    >>> df1
              a         b         c         d         e
    6 -0.269221 -0.026476  0.997517  1.294385  1.757167
    8  0.917438  0.847941  0.034235 -0.448948  2.228131
    
    >>> pd.__version__
    '0.16.1'
    

    The SettingWithCopyWarning aims to warn about a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily mean you did something wrong (it can trigger false positives), but since 0.13.0 it lets you know that there are more appropriate methods for the same purpose. So, if you get the warning, just follow its advice: Try using .loc[row_index,col_indexer] = value instead

    >>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
    >>> df1
              a         b         c         d         e         f
    6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
    8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
    >>> 
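
As a rough illustration of when the warning tends to appear: in pandas versions that emit it, a common trigger is adding a column to a DataFrame that was itself obtained by filtering or slicing another one. The sketch below uses made-up data and a hypothetical variable name (positive), and shows one way to make the intent explicit by taking a copy before assigning.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the asker's frame.
df = pd.DataFrame({"a": [1.0, -2.0, 3.0, -4.0]}, index=[6, 8, 11, 15])

# Filtering returns an object derived from df. Adding a column to it directly
# is the kind of assignment the warning is about, so take an explicit copy.
positive = df[df["a"] > 0].copy()

# Add the new column through .loc, aligned on the subset's own index.
positive.loc[:, "e"] = pd.Series(
    np.arange(len(positive), dtype=float), index=positive.index
)

print(positive)
```

The original df is left without the new column; only the copy is changed.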
    

    In fact, this is currently the more efficient method, as described in the pandas docs.


    Edit 2017

    As indicated in the comments and by @Alexander, currently the best way to add the values of a Series as a new column of a DataFrame is probably assign:

    df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
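
Two details of this approach are worth spelling out, shown here as a minimal sketch with made-up values and a hypothetical name (df2) for the result. Taking .values (or .to_numpy()) drops the Series index, so the numbers are placed by position rather than by label, and assign returns a new DataFrame instead of modifying the one it is called on.

```python
import pandas as pd

# Hypothetical frame with the same style of index as df1.
df1 = pd.DataFrame({"a": [0.5, 1.5]}, index=[6, 8])
new_values = pd.Series([100.0, 200.0])  # default index 0, 1

# The underlying array has no index, so values are assigned by position.
df2 = df1.assign(e=new_values.to_numpy())

print(df2)
print(df1.columns.tolist())  # df1 itself is unchanged
```

This is why the answer rebinds the result with df1 = df1.assign(...): without the reassignment the new column would exist only in the returned object.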
    
