Adding a new column to an existing DataFrame in Python pandas

你的背包 2020-11-22 01:15

I have the following indexed DataFrame with named columns and non-contiguous row numbers:

          a         b         c         d
2  0.671399  0.101208 -         


        
25 answers
  • 2020-11-22 01:32

    I would like to add a new column, 'e', to the existing DataFrame without changing anything else in it. (The series always has the same length as the DataFrame.)

    I assume that the index values in e match those in df1.

    The easiest way to create a new column named e and assign it the values from your series e is:

    df['e'] = e.values
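
    One point the line above relies on: .values assigns by position and ignores the index of e, whereas assigning the Series itself aligns on index labels. The sketch below uses made-up data (the frame and series are illustrative, not the asker's) to show how the two differ when the row labels are not contiguous.

```python
import pandas as pd

# Illustrative data: a frame whose row labels are not contiguous.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0]}, index=[0, 2, 7])
e = pd.Series([10.0, 20.0, 30.0])   # default index 0, 1, 2

# Positional assignment: the index of e is ignored.
by_position = df.copy()
by_position['e'] = e.values

# Label-aligned assignment: only labels present in both get a value.
by_label = df.copy()
by_label['e'] = e

print(by_position)
print(by_label)
```

    With label alignment, row 7 has no counterpart in e and ends up as NaN, which is why .values is the safer choice when only the order of the values matters.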
    

    assign (Pandas 0.16.0+)

    As of Pandas 0.16.0, you can also use assign, which assigns new columns to a DataFrame and returns a new object (a copy) with all the original columns in addition to the new ones.

    df1 = df1.assign(e=e.values)
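
    Because assign returns a copy, the result has to be bound to a name (as above) for the new column to persist. A minimal sketch with illustrative data, also showing that a callable can be passed instead of a ready-made value:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# assign returns a new DataFrame; df itself is left untouched.
with_e = df.assign(e=[10, 20])

# A callable is evaluated against the frame being assigned to.
with_sum = df.assign(total=lambda d: d['a'] + d['b'])

print(list(df.columns))
print(list(with_e.columns))
print(with_sum['total'].tolist())
```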
    

    As per this example (which also includes the source code of the assign function), you can also include more than one column:

    >>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
    >>> df.assign(mean_a=df.a.mean(), mean_b=df.b.mean())
       a  b  mean_a  mean_b
    0  1  3     1.5     3.5
    1  2  4     1.5     3.5
    

    In context with your example:

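    import numpy as np
    import pandas as pd

    np.random.seed(0)
    df1 = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
    mask = df1 < -0.7                # True wherever a value is below -0.7
    df1 = df1[~mask.any(axis=1)]     # keep rows with no such value (~ negates a boolean mask)
    sLength = len(df1['a'])
    e = pd.Series(np.random.randn(sLength))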
    
    >>> df1
              a         b         c         d
    0  1.764052  0.400157  0.978738  2.240893
    2 -0.103219  0.410599  0.144044  1.454274
    3  0.761038  0.121675  0.443863  0.333674
    7  1.532779  1.469359  0.154947  0.378163
    9  1.230291  1.202380 -0.387327 -0.302303
    
    >>> e
    0   -1.048553
    1   -1.420018
    2   -1.706270
    3    1.950775
    4   -0.509652
    dtype: float64
    
    df1 = df1.assign(e=e.values)
    
    >>> df1
              a         b         c         d         e
    0  1.764052  0.400157  0.978738  2.240893 -1.048553
    2 -0.103219  0.410599  0.144044  1.454274 -1.420018
    3  0.761038  0.121675  0.443863  0.333674 -1.706270
    7  1.532779  1.469359  0.154947  0.378163  1.950775
    9  1.230291  1.202380 -0.387327 -0.302303 -0.509652
    

    The description of this new feature when it was first introduced can be found here.

  • 2020-11-22 01:36
    1. First create a Python list, list_of_e, that holds the relevant data.
    2. Then assign it: df['e'] = list_of_e
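
    The two steps above can be sketched as follows, with illustrative data. A list has no index, so its values are assigned in row order, and the list must have exactly one value per row.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]}, index=[0, 2, 7])

list_of_e = [0.5, 1.5, 2.5]   # one value per row, in row order
df['e'] = list_of_e           # assigned by position

# A list of the wrong length is rejected rather than padded.
try:
    df['f'] = [1, 2]
    length_error = None
except ValueError as exc:
    length_error = exc

print(df)
print(type(length_error).__name__)
```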
  • 2020-11-22 01:36

    Let me just add that, just as for hum3, .loc didn't solve the SettingWithCopyWarning and I had to resort to df.insert(). In my case the false positive was generated by "fake" chained indexing dict['a']['e'], where 'e' is the new column and dict['a'] is a DataFrame coming from a dictionary.

    Also note that if you know what you are doing, you can switch off the warning using pd.options.mode.chained_assignment = None and then use one of the other solutions given here.
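
    For reference, a minimal sketch of the df.insert() route. The dictionary frames and its contents are hypothetical stand-ins for the dictionary described above.

```python
import pandas as pd

# Hypothetical stand-in for a dictionary that holds DataFrames.
frames = {'a': pd.DataFrame({'x': [1, 2, 3]})}

df = frames['a']
# insert(loc, column, value) adds the column in place at position loc;
# len(df.columns) appends it after the existing columns.
df.insert(len(df.columns), 'e', [10, 20, 30])

print(frames['a'])
```

    insert modifies the frame in place and returns None, so its result should not be assigned back to df.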

  • 2020-11-22 01:37
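    import pandas as pd

    x = pd.DataFrame([1, 2, 3, 4, 5])
    y = pd.DataFrame([5, 4, 3, 2, 1])

    # Concatenate side by side (axis=1); rows are matched on the index.
    z = pd.concat([x, y], axis=1)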
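
    Note that concat with axis=1 matches rows by index label, not by position. A small sketch with illustrative frames whose indexes differ, assuming positional pairing is what is wanted:

```python
import pandas as pd

x = pd.DataFrame({'x': [1, 2, 3]}, index=[0, 2, 7])
y = pd.DataFrame({'y': [5, 4, 3]})   # default index 0, 1, 2

# Rows are matched by index label, so mismatched labels produce NaN.
aligned = pd.concat([x, y], axis=1)

# Resetting both indexes first pairs the rows by position instead.
positional = pd.concat(
    [x.reset_index(drop=True), y.reset_index(drop=True)], axis=1
)

print(aligned)
print(positional)
```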
    

  • 2020-11-22 01:40

    One thing to note, though, is that if you do

    df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
    

    this is effectively a left join on df1.index. If you want an outer-join effect, my probably imperfect solution is to create a DataFrame whose index values cover the universe of your data, and then use the code above. For example,

    data = pd.DataFrame(index=all_possible_values)
    df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
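
    One way to read the left-join versus outer-join distinction, sketched with illustrative data. The names s and all_labels are assumptions made for this example, and reindexing to the union of labels is one possible approach rather than necessarily what the answer intended.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1.0, 2.0]}, index=[0, 2])
s = pd.Series([10.0, 20.0, 30.0], index=[0, 2, 5])   # label 5 is not in df1

# Plain assignment behaves like a left join: label 5 is dropped.
left = df1.copy()
left['e'] = s

# Reindexing to the union of labels first keeps every row (outer join).
all_labels = df1.index.union(s.index)
outer = df1.reindex(all_labels)
outer['e'] = s

print(left)
print(outer)
```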
    
  • 2020-11-22 01:41

    Easiest ways:

    data['new_col'] = list_of_values
    
    data.loc[ : , 'new_col'] = list_of_values
    

    This way you avoid what is called chained indexing when setting new values in a pandas object. Click here to read further.
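
    A short sketch of the difference, with illustrative data: each assignment below is a single indexing operation on data, whereas the chained form mentioned in the comment would index twice and might write to a temporary copy.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3]})
list_of_values = [10, 20, 30]

# Both forms add the column in a single indexing operation on data.
data['new_col'] = list_of_values
data.loc[:, 'other_col'] = list_of_values

# Updating part of a column: one .loc call with row and column selectors,
# instead of the chained form data[data['a'] > 1]['new_col'] = 0.
data.loc[data['a'] > 1, 'new_col'] = 0

print(data)
```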
