Pandas update multiple columns at once

青春惊慌失措 2021-02-05 04:55

I'm trying to update a couple of fields at once. I have two data sources and I'm trying to reconcile them. I know I could do some ugly merging and then delete columns, but was …
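The question's DataFrame is cut off above. Judging from the printed output in the answers below, it looks roughly like the following. This is a hypothetical reconstruction (an assumption, not the asker's actual data): Col1 to Col3 are missing in rows 2 and 3, and the *_v2 columns from the second source hold the values that should fill them.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data, inferred from the
# answers' output: the *_v2 columns come from the second data source.
df = pd.DataFrame({
    'Col1': ['A', 'D', np.nan, np.nan],
    'Col2': ['B', 'E', np.nan, np.nan],
    'Col3': ['C', 'F', np.nan, np.nan],
    'col1_v2': [np.nan, np.nan, 'a', 'd'],
    'col2_v2': [np.nan, np.nan, 'b', 'e'],
    'col3_v2': [np.nan, np.nan, 'd', 'f'],
})
print(df)
```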

2 Answers
  • 2021-02-05 05:08

    In the "take the hill" spirit, I offer the solution below, which yields the requested result.

    I realize this is not exactly what you are after, since I am not slicing the df in the reasonable (but non-functional) way you propose.

    import numpy as np
    import pandas as pd
    
    # This does not work when indexing on np.nan, so temporarily fill NaNs with an arbitrary placeholder.
    df = df.fillna('AAA')
    
    # Mask of the rows to update (rows where Col1 was originally NaN).
    mask = df['Col1'] == 'AAA'
    
    # Map each column to update to the column holding its replacement values.
    mp = {'Col1': 'col1_v2', 'Col2': 'col2_v2', 'Col3': 'col3_v2'}
    
    # Update the masked rows of each target column from its source column.
    for target, source in mp.items():
        df.loc[mask, target] = df[source]
    
    # Swap the placeholder back to np.nan.
    df = df.replace('AAA', np.nan)
    

    Output:

    Col1    Col2    Col3    col1_v2     col2_v2     col3_v2
    A       B       C       NaN         NaN         NaN
    D       E       F       NaN         NaN         NaN
    a       b       d       a           b           d
    d       e       f       d           e           f
    

    If I don't replace the NaNs first, I get the error below. I'm going to research exactly where it stems from.

    ValueError: array is not broadcastable to correct shape
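For comparison, here is a sketch of the same loop without the 'AAA' placeholder, building the mask with isnull() directly. It assumes a reasonably recent pandas release (the ValueError above may depend on the version used at the time; that is not confirmed here) and uses the hypothetical sample data reconstructed from the output above. The mask is computed once, before the loop, so updating Col1 does not change which rows the later columns are filled in.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data reconstructed from the output above.
df = pd.DataFrame({
    'Col1': ['A', 'D', np.nan, np.nan],
    'Col2': ['B', 'E', np.nan, np.nan],
    'Col3': ['C', 'F', np.nan, np.nan],
    'col1_v2': [np.nan, np.nan, 'a', 'd'],
    'col2_v2': [np.nan, np.nan, 'b', 'e'],
    'col3_v2': [np.nan, np.nan, 'd', 'f'],
})

# Compute the mask once, before any column is overwritten.
mask = df['Col1'].isnull()

mp = {'Col1': 'col1_v2', 'Col2': 'col2_v2', 'Col3': 'col3_v2'}
for target, source in mp.items():
    # The right-hand Series is aligned on the row index, so only the
    # masked rows receive values from the source column.
    df.loc[mask, target] = df[source]

print(df)
```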
    
  • 2021-02-05 05:14

    You want to replace

    print(df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']])
    
      Col1 Col2 Col3
    2  NaN  NaN  NaN
    3  NaN  NaN  NaN
    

    With:

    replace_with_this = df.loc[df['Col1'].isnull(), ['col1_v2', 'col2_v2', 'col3_v2']]
    print(replace_with_this)
    
      col1_v2 col2_v2 col3_v2
    2       a       b       d
    3       d       e       f
    

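    Seems reasonable. However, when you do the assignment, you need to account for index alignment, which includes columns: pandas aligns the right-hand side on both the row index and the column labels, and since col1_v2, col2_v2 and col3_v2 don't match Col1, Col2 and Col3, assigning replace_with_this directly would just fill those cells with NaN.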

    So, this should work:

    df.loc[df['Col1'].isnull(),['Col1','Col2', 'Col3']] = replace_with_this.values
    
    print(df)
    
      Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
    0    A    B    C     NaN     NaN     NaN
    1    D    E    F     NaN     NaN     NaN
    2    a    b    d       a       b       d
    3    d    e    f       d       e       f
    

    I accounted for the columns by using .values at the end. This strips the column labels from the replace_with_this DataFrame, so the values are assigned by position.
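To make the alignment point concrete, the sketch below runs the assignment both ways on hypothetical sample data reconstructed from the output above. Assigning the labelled DataFrame leaves the target cells as NaN, because its column labels don't match; stripping the labels assigns by position. It uses .to_numpy(), which newer pandas documentation recommends over .values; for this data the two behave the same.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data reconstructed from the output above.
df = pd.DataFrame({
    'Col1': ['A', 'D', np.nan, np.nan],
    'Col2': ['B', 'E', np.nan, np.nan],
    'Col3': ['C', 'F', np.nan, np.nan],
    'col1_v2': [np.nan, np.nan, 'a', 'd'],
    'col2_v2': [np.nan, np.nan, 'b', 'e'],
    'col3_v2': [np.nan, np.nan, 'd', 'f'],
})

targets = ['Col1', 'Col2', 'Col3']
sources = ['col1_v2', 'col2_v2', 'col3_v2']
mask = df['Col1'].isnull()

# With labels: the right-hand DataFrame is aligned to the target columns,
# none of which exist in it, so the selected cells are set to NaN.
aligned = df.copy()
aligned.loc[mask, targets] = aligned.loc[mask, sources]

# Without labels: a plain array is assigned by position.
positional = df.copy()
positional.loc[mask, targets] = positional.loc[mask, sources].to_numpy()

print(aligned[targets])
print(positional[targets])
```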
