Python pandas: replace values in multiple columns by matching multiple columns from another dataframe

清歌不尽 2020-12-31 20:42

I searched a lot for an answer; the closest question was "Compare 2 columns of 2 different pandas dataframes, if the same insert 1 into the other in Python", but the answer to

2 answers
  • 2020-12-31 21:01

    Start by renaming the columns you want to merge on in df2

    df2.rename(columns={'OCHR':'chr','OSTOP':'pos'},inplace=True)
    

    Now merge on these columns

    df_merged = pd.merge(df1, df2, how='inner', on=['chr', 'pos']) # you might have to preserve the df1 index at this stage, not sure
    

    Next, you want to build the frame that holds the replacement values

    updater = df_merged[['ID','CHR','STOP']].copy() # this will be your update frame; copy to avoid SettingWithCopyWarning
    updater.rename(columns={'ID':'snp','CHR':'chr','STOP':'pos'}, inplace=True) # rename columns to match df1
    

    Finally, update df1 with the new values:

    df1.update(updater) # updates in place
    #  chr          snp  x    pos a1 a2
    #0   1  rs376643643  0  10040  G  A
    #1   1  rs373328635  0  10066  C  G
    #2   1   rs62651026  0  10208  C  G
    #3   1  rs376007522  0  10209  C  G
    #4   3  rs368469931  0  30247  C  T
    

    update aligns on index and column labels, so you may have to carry df1's index through the whole process, then call updater.reindex(... before df1.update(updater)
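
    The index caveat above can be sketched as follows. This is a minimal sketch, not the answerer's code: it assumes df1 has a default, unnamed index (so reset_index() creates a column called "index"), borrows sample values from the other answer on this page (with one deliberately unmatched row), merges with left_on/right_on instead of renaming df2, and works on a copy named result (a name chosen for illustration).

```python
import pandas as pd

# Sample data (assumed for illustration; row 0 of df1 has no match in df2).
df1 = pd.DataFrame({
    "chr": [1, 1, 1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108", "1-10109", "1-10139"],
    "x": [0, 0, 0, 0, 0],
    "pos": [10010, 10056, 10108, 10109, 10139],
    "a1": ["G", "C", "C", "C", "C"],
    "a2": ["A", "G", "G", "G", "T"],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026", "rs376007522", "rs368469931"],
    "CHR": [1, 1, 1, 1, 3],
    "STOP": [10040, 10066, 10208, 10209, 30247],
    "OCHR": [1, 1, 1, 1, 1],
    "OSTOP": [10020, 10056, 10108, 10109, 10139],
})

# Expose df1's index as a regular column ("index") so it survives the merge.
left = df1.reset_index()
merged = left.merge(
    df2, how="inner", left_on=["chr", "pos"], right_on=["OCHR", "OSTOP"]
)

# Restore df1's original row labels and rename the df2 columns to df1's names.
updater = (
    merged.set_index("index")[["ID", "CHR", "STOP"]]
    .rename(columns={"ID": "snp", "CHR": "chr", "STOP": "pos"})
)

# update() aligns on row labels and column names; unmatched rows stay as they were.
result = df1.copy()
result.update(updater)
print(result)
```

    Without carrying the index along, the inner merge would renumber the four matched rows 0 to 3, and update would write them onto the wrong rows of df1.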

  • 2020-12-31 21:02

    You can use the update function, which requires setting the matching columns as the index. I've modified your sample data to introduce a mismatch.

    # your data
    # =====================
    # df1 pos is modified from 10020 to 10010
    print(df1)
    
       chr      snp  x    pos a1 a2
    0    1  1-10020  0  10010  G  A
    1    1  1-10056  0  10056  C  G
    2    1  1-10108  0  10108  C  G
    3    1  1-10109  0  10109  C  G
    4    1  1-10139  0  10139  C  T
    
    print(df2)
    
                ID  CHR   STOP  OCHR  OSTOP
    0  rs376643643    1  10040     1  10020
    1  rs373328635    1  10066     1  10056
    2   rs62651026    1  10208     1  10108
    3  rs376007522    1  10209     1  10109
    4  rs368469931    3  30247     1  10139
    
    # processing
    # ==========================
    # set matching columns to multi-level index
    x1 = df1.set_index(['chr', 'pos'])['snp']
    x2 = df2.set_index(['OCHR', 'OSTOP'])['ID']
    # call update function, this is inplace
    x1.update(x2)
    # replace the values in original df1
    df1['snp'] = x1.values
    print(df1)
    
       chr          snp  x    pos a1 a2
    0    1      1-10020  0  10010  G  A
    1    1  rs373328635  0  10056  C  G
    2    1   rs62651026  0  10108  C  G
    3    1  rs376007522  0  10109  C  G
    4    1  rs368469931  0  10139  C  T
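
    For comparison, the same replacement can be written with a left merge instead of update. This is a different technique from the one in this answer, shown only as a hedged sketch: the frames are rebuilt from the sample values printed above, result is a name chosen for illustration, and validate="many_to_one" is an added guard that assumes each (OCHR, OSTOP) pair should appear at most once in df2.

```python
import pandas as pd

# Sample data rebuilt from the printed frames above.
df1 = pd.DataFrame({
    "chr": [1, 1, 1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108", "1-10109", "1-10139"],
    "x": [0, 0, 0, 0, 0],
    "pos": [10010, 10056, 10108, 10109, 10139],
    "a1": ["G", "C", "C", "C", "C"],
    "a2": ["A", "G", "G", "G", "T"],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026", "rs376007522", "rs368469931"],
    "CHR": [1, 1, 1, 1, 3],
    "STOP": [10040, 10066, 10208, 10209, 30247],
    "OCHR": [1, 1, 1, 1, 1],
    "OSTOP": [10020, 10056, 10108, 10109, 10139],
})

# A left merge keeps every df1 row in its original order; rows without a
# match get NaN in "ID". validate raises if a df2 key pair is duplicated.
merged = df1.merge(
    df2[["ID", "OCHR", "OSTOP"]],
    how="left",
    left_on=["chr", "pos"],
    right_on=["OCHR", "OSTOP"],
    validate="many_to_one",
)

# Take the new ID where a match was found, otherwise keep the old snp.
# to_numpy() assigns by position, so df1's own index labels do not matter.
result = df1.copy()
result["snp"] = merged["ID"].fillna(merged["snp"]).to_numpy()
print(result)
```

    The merge version avoids building a MultiIndex, at the cost of materialising an intermediate frame; which one reads better is largely a matter of taste.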
    