Adding new column to existing DataFrame in Python pandas

你的背包 2020-11-22 01:15

I have the following indexed DataFrame with named columns, whose row labels are non-consecutive numbers:

          a         b         c         d
2  0.671399  0.101208 -         


        
25 answers
  • 2020-11-22 01:28

    It seems that in recent Pandas versions the way to go is to use df.assign:

    df1 = df1.assign(e=np.random.randn(len(df1)))  # one random value per row

    It doesn't produce a SettingWithCopyWarning.
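    A minimal, self-contained sketch of the assign approach, assuming a small illustrative df1 with a non-consecutive index like the one in the question (the values are made up). assign returns a new DataFrame and leaves the original object untouched, which is why the result is reassigned in the line above.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a non-consecutive index, as in the question
df1 = pd.DataFrame(
    {"a": [0.67, 0.92, -0.27], "b": [0.10, 0.85, -0.03]},
    index=[2, 6, 8],
)

# assign() builds and returns a new DataFrame; df1 itself is not modified
with_e = df1.assign(e=np.arange(len(df1), dtype=float))
print(with_e)

print("e" in df1.columns)  # the original frame has no column "e"
```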

  • 2020-11-22 01:29

    Use the original df1 indexes to create the series:

    df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
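    Why the index=df1.index part matters: assigning a Series aligns it on index labels, not on position. A minimal sketch with an illustrative frame whose index is 2, 6, 8 (the column name "misaligned" is just for the demonstration):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[2, 6, 8])
values = np.array([10.0, 20.0, 30.0])

# Without an explicit index the Series gets the default labels 0, 1, 2.
# Only label 2 exists in both, so row 2 receives the third value (30.0)
# and rows 6 and 8 become NaN.
df1["misaligned"] = pd.Series(values)

# Building the Series on df1.index makes every label match
df1["e"] = pd.Series(values, index=df1.index)
print(df1)
```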
    

    Edit 2015
    Some users reported getting a SettingWithCopyWarning with this code.
    However, the code still runs correctly with pandas 0.16.1, the current version at the time of writing.

    >>> import numpy as np
    >>> import pandas as pd
    >>> sLength = len(df1['a'])
    >>> df1
              a         b         c         d
    6 -0.269221 -0.026476  0.997517  1.294385
    8  0.917438  0.847941  0.034235 -0.448948
    
    >>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
    >>> df1
              a         b         c         d         e
    6 -0.269221 -0.026476  0.997517  1.294385  1.757167
    8  0.917438  0.847941  0.034235 -0.448948  2.228131
    
    >>> pd.__version__
    '0.16.1'
    

    The SettingWithCopyWarning is meant to flag a possibly invalid assignment to a copy of the DataFrame. It doesn't necessarily mean you did something wrong (it can produce false positives), but since 0.13.0 it tells you that more appropriate methods exist for the same purpose. So if you get the warning, just follow its advice: "Try using .loc[row_index,col_indexer] = value instead".

    >>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
    >>> df1
              a         b         c         d         e         f
    6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
    8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
    >>> 
    

    In fact, this is currently the more efficient method, as described in the pandas docs.
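    In practice the warning most often shows up when the new column is assigned to a filtered subset of another DataFrame. A minimal sketch, assuming an illustrative df1 and a made-up filter on column a: taking an explicit .copy() of the subset, or adding the column to the original frame through .loc, avoids the ambiguous chained assignment. Exactly when the warning fires has changed across pandas versions, and Copy-on-Write mode changes this behaviour further.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [-0.27, 0.92, 0.50], "b": [1.0, 2.0, 3.0]}, index=[6, 8, 9])

# Option 1: make the filtered subset an explicit, independent copy first
positive = df1[df1["a"] > 0].copy()
positive["e"] = positive["a"] * 2

# Option 2: add the column to the original frame through .loc
df1.loc[:, "f"] = pd.Series(np.ones(len(df1)), index=df1.index)

print(positive)
print(df1)
```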


    Edit 2017

    As indicated in the comments and by @Alexander, the best current way to add the values of a Series as a new column of a DataFrame may be to use assign:

    df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
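    The .values in that line is doing real work: assign aligns a Series on the index just like df1['e'] = ... does, whereas the plain NumPy array returned by .values is placed by position. A minimal sketch with an illustrative frame indexed 6 and 8:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0]}, index=[6, 8])
s = pd.Series([0.5, 1.5])  # default labels 0 and 1, no overlap with 6 and 8

aligned = df1.assign(e=s)            # aligned on labels, so every value is NaN
positional = df1.assign(e=s.values)  # ndarray, so values are matched by position
print(aligned)
print(positional)
```

    In current pandas, s.to_numpy() is the recommended spelling of s.values.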
    
  • 2020-11-22 01:29

    Before assigning a new column to indexed data, you may need to sort the index. At least in my case I had to:

    import numpy as np

    data.set_index(['index_column'], inplace=True)
    # In my case, assigning a new column failed while the index was unsorted
    data.sort_index(inplace=True)
    data.loc['index_value1', 'column_y'] = np.random.randn(data.loc['index_value1', 'column_x'].shape[0])
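    A self-contained version of these steps, assuming a tiny made-up frame; index_column, column_x, column_y and the label values are hypothetical placeholders standing in for the names in the answer, and deterministic values replace the random ones. Whether sorting is strictly required depends on the kind of index and the pandas version; this only mirrors the steps described.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an unsorted key column with repeated labels
data = pd.DataFrame({
    "index_column": ["index_value2", "index_value1", "index_value2", "index_value1"],
    "column_x": [1.0, 2.0, 3.0, 4.0],
})

data.set_index(["index_column"], inplace=True)
data.sort_index(inplace=True)  # puts equal labels next to each other

# Fill column_y only for rows labelled "index_value1";
# the other rows of the new column are left as NaN
n_rows = data.loc["index_value1", "column_x"].shape[0]
data.loc["index_value1", "column_y"] = np.arange(n_rows, dtype=float)
print(data)
```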
    
  • 2020-11-22 01:30

    For the sake of completeness, here is yet another solution using the DataFrame.eval() method:

    Data:

    In [44]: e
    Out[44]:
    0    1.225506
    1   -1.033944
    2   -0.498953
    3   -0.373332
    4    0.615030
    5   -0.622436
    dtype: float64
    
    In [45]: df1
    Out[45]:
              a         b         c         d
    0 -0.634222 -0.103264  0.745069  0.801288
    4  0.782387 -0.090279  0.757662 -0.602408
    5 -0.117456  2.124496  1.057301  0.765466
    7  0.767532  0.104304 -0.586850  1.051297
    8 -0.103272  0.958334  1.163092  1.182315
    9 -0.616254  0.296678 -0.112027  0.679112
    

    Solution:

    In [46]: df1.eval("e = @e.values", inplace=True)
    
    In [47]: df1
    Out[47]:
              a         b         c         d         e
    0 -0.634222 -0.103264  0.745069  0.801288  1.225506
    4  0.782387 -0.090279  0.757662 -0.602408 -1.033944
    5 -0.117456  2.124496  1.057301  0.765466 -0.498953
    7  0.767532  0.104304 -0.586850  1.051297 -0.373332
    8 -0.103272  0.958334  1.163092  1.182315  0.615030
    9 -0.616254  0.296678 -0.112027  0.679112 -0.622436
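    DataFrame.eval can also compute the new column from existing columns and local variables (referenced with @). A minimal sketch with illustrative data, where factor is a hypothetical local variable; without inplace=True, eval returns a new DataFrame instead of modifying df1.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}, index=[0, 4, 5])
factor = 10.0

# Assigns the expression's result to a new column "e"; @factor reads the local variable
result = df1.eval("e = a + b * @factor")
print(result)
```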
    
  • 2020-11-22 01:31

    If the column you want to add is already a Series variable, just assign it:

    df["new_column_name"] = series_variable_name  # adds the column, or overwrites it if it already exists
    

    This also works if you are replacing an existing column: just use the name of the column you want to replace as new_column_name, and its data will be overwritten with the new Series data.
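    A minimal sketch of both cases, assuming an illustrative df; series_variable_name and new_column_name are the placeholder names from the answer. Both objects use the default 0..n-1 index here, so the Series lines up row for row (assignment aligns on index labels).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
series_variable_name = pd.Series([10, 20, 30])

df["new_column_name"] = series_variable_name  # adds a new column
df["a"] = series_variable_name * 2            # overwrites the existing column "a"
print(df)
```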

  • 2020-11-22 01:31

    To insert a new column at a given position (0 <= loc <= number of columns) in a DataFrame, use DataFrame.insert:

    DataFrame.insert(loc, column, value)
    

    Therefore, if you want to add the column e at the end of a data frame called df, you can use:

    e = [-0.335485, -1.166658, -0.385571]
    df.insert(loc=len(df.columns), column='e', value=e)  # modifies df in place
    

    value can be a scalar such as an integer (in which case every cell is filled with that one value), a Series, or an array-like structure.

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html
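    A short sketch showing insert at the end and at the front, assuming an illustrative df; the column name "flag" is made up for the example. insert works in place and returns None, so its result should not be assigned back to df. By default it raises a ValueError if the column name already exists.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
e = [-0.335485, -1.166658, -0.385571]

df.insert(loc=len(df.columns), column="e", value=e)  # append as the last column
df.insert(loc=0, column="flag", value=1)             # scalar is broadcast to every row
print(df)
```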
