Merge items on dataframes with duplicate values

孤街浪徒 2021-01-19 03:02

I have a dataframe (or series) in which each value of column 'A' always occurs exactly 4 times, like this:

df = pd.DataFrame([['foo'],
                   [         


        
2 Answers
  • 2021-01-19 03:23

    You'll need to create surrogate columns with groupby + cumcount to deduplicate your rows, then include those columns when calling merge:

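    # Number each row within its 'A' group (0, 1, 2, ...) so duplicates become distinguishable.
    a = df.assign(D=df.groupby('A').cumcount())
    b = df_key.assign(D=df_key.groupby('A').cumcount())

    # Merge on the key plus the counter, then drop the helper column.
    # (The positional form drop('D', 1) no longer works in pandas 2.0+.)
    a.merge(b, on=['A', 'D'], how='left').drop(columns='D')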
    
         A    B    C
    0  foo  1.0  2.0
    1  foo  3.0  4.0
    2  foo  NaN  NaN
    3  foo  NaN  NaN
    4  bar  5.0  9.0
    5  bar  2.0  4.0
    6  bar  1.0  9.0
    7  bar  NaN  NaN
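
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same groupby + cumcount idea. The question's data is truncated, so `df` and `df_key` below are assumptions reconstructed from the outputs shown in the answers; the helper name `merge_by_occurrence` and the temporary column `_n` are hypothetical and only used for illustration.

```python
import pandas as pd

# Assumed inputs, reconstructed from the outputs shown in this thread
# (the data in the original question is cut off).
df = pd.DataFrame({'A': ['foo'] * 4 + ['bar'] * 4})
df_key = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar', 'bar'],
    'B': [1, 3, 5, 2, 1],
    'C': [2, 4, 9, 4, 9],
})


def merge_by_occurrence(left, right, key):
    """Left-merge on `key`, pairing the n-th occurrence of each value on both sides.

    Hypothetical helper; '_n' is a temporary column assumed not to exist already.
    """
    # cumcount numbers the rows inside each group: 0, 1, 2, ...
    left_n = left.assign(_n=left.groupby(key).cumcount())
    right_n = right.assign(_n=right.groupby(key).cumcount())
    # (key, _n) is unique on each side here, so the merge cannot fan out.
    return left_n.merge(right_n, on=[key, '_n'], how='left').drop(columns='_n')


result = merge_by_occurrence(df, df_key, 'A')
print(result)
```

Because `how='left'` keeps the row order of `df`, occurrences that have no counterpart in `df_key` simply get NaN in `B` and `C`.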
    
  • 2021-01-19 03:29

    Alternatively, you can pad df_key directly: for each value of column A, append as many A-only rows as df has beyond what df_key already contains.

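    # Number of rows each value of A is still missing in df_key.
    s = df.A.value_counts() - df_key.A.value_counts()

    # Append one A-only row per missing occurrence, then group the rows by A.
    pd.concat([df_key, pd.DataFrame({'A': s.index.repeat(s)})]).sort_values('A')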
    Out[469]: 
         A    B    C
    2  bar  5.0  9.0
    3  bar  2.0  4.0
    4  bar  1.0  9.0
    0  bar  NaN  NaN
    0  foo  1.0  2.0
    1  foo  3.0  4.0
    1  foo  NaN  NaN
    2  foo  NaN  NaN
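
A slightly more defensive sketch of the same padding approach is shown below. As before, `df` and `df_key` are assumptions reconstructed from the outputs in this thread, and the names `missing`, `padding` and `padded` are illustrative. It differs from the snippet above in three ways that are my own additions: `fill_value=0` handles values of `A` that do not appear in `df_key` at all, `clip(lower=0)` ignores values that `df_key` has more of than `df`, and a stable sort keeps the real rows ahead of the padding rows.

```python
import pandas as pd

# Assumed inputs, reconstructed from the outputs shown in this thread.
df = pd.DataFrame({'A': ['foo'] * 4 + ['bar'] * 4})
df_key = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar', 'bar'],
    'B': [1, 3, 5, 2, 1],
    'C': [2, 4, 9, 4, 9],
})

# How many rows each value of A still lacks in df_key.
missing = (
    df['A'].value_counts()
    .sub(df_key['A'].value_counts(), fill_value=0)  # absent in df_key -> count 0
    .clip(lower=0)                                   # never a negative repeat count
    .astype(int)
)

# One padding row per missing occurrence; B and C are absent, so concat fills NaN.
padding = pd.DataFrame({'A': missing.index.repeat(missing.to_numpy())})

padded = (
    pd.concat([df_key, padding], ignore_index=True)
    .sort_values('A', kind='stable')  # stable: real rows stay ahead of padding rows
    .reset_index(drop=True)
)
print(padded)
```

Note that this returns the rows sorted by `A` rather than in the order of `df`, so it matches the merge-based answer only up to row order.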
    