Pandas: get the first occurrence grouping by keys

深忆病人 2021-01-12 12:45

If I have the following dataframe:

| id | timestamp           | code | id2
| 10 | 2017-07-12 13:37:00 | 206  | a1
| 10 | 2017-07-12 13:40:00 | 206  | a1
| 10 | 20         


        
2 Answers
  • 2021-01-12 13:00

    I think you need GroupBy.first, which returns the first non-null value of each column within each group:

    df.groupby(["id", "id2"])["timestamp"].first()
    

    Or drop_duplicates, which keeps the first row for each combination of the subset columns:

    df.drop_duplicates(subset=['id','id2'])
    

    For the same output from both approaches:

    df1 = df.groupby(["id", "id2"], as_index=False)["timestamp"].first()
    print(df1)
       id id2            timestamp
    0  10  a1  2017-07-12 13:37:00
    1  10  a2  2017-07-12 19:00:00
    2  11  a1  2017-07-12 13:37:00
    
    df1 = df.drop_duplicates(subset=['id','id2'])[['id','id2','timestamp']]
    print(df1)
       id id2            timestamp
    0  10  a1  2017-07-12 13:37:00
    1  10  a2  2017-07-12 19:00:00
    2  11  a1  2017-07-12 13:37:00
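
The two approaches agree on the data above, but they can differ when the first row of a group contains missing values or when the rows are not already in time order. The sketch below illustrates this under stated assumptions: the sample rows are hypothetical (the question's table is truncated), and the NaN in `code` is added only to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question's table is truncated, so these rows are assumed.
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 11],
    "timestamp": pd.to_datetime([
        "2017-07-12 13:40:00",
        "2017-07-12 13:37:00",
        "2017-07-12 13:45:00",
        "2017-07-12 19:00:00",
        "2017-07-12 13:37:00",
    ]),
    "code": [206, np.nan, 206, 206, 206],  # NaN added for illustration only
    "id2": ["a1", "a1", "a1", "a2", "a1"],
})

# Sort so that "first" means "earliest timestamp" rather than "first row as stored".
ordered = df.sort_values("timestamp", kind="stable")

# drop_duplicates keeps the entire first row of each (id, id2) pair, NaN included.
first_rows = (
    ordered.drop_duplicates(subset=["id", "id2"])
    .sort_values(["id", "id2"])
    .reset_index(drop=True)
)

# GroupBy.first takes the first non-null value per column,
# so "code" for (10, a1) is filled from a later row.
first_values = ordered.groupby(["id", "id2"], as_index=False).first()

print(first_rows)
print(first_values)
```

If the goal is the earliest complete row per key, `drop_duplicates` after sorting is probably the safer choice; `GroupBy.first` may combine values from different rows of the same group.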
    
  • 2021-01-12 13:15

    One can create a new column by concatenating the id and id2 values as strings, then remove the rows where that column is duplicated:

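    # Build a combined key; the separator avoids collisions such as (1, "1a") vs (11, "a").
    df['newcol'] = df['id'].astype(str) + '_' + df['id2'].astype(str)
    # Keep the first row per key, then drop the helper column by name.
    df = df[~df['newcol'].duplicated()].drop(columns='newcol')
    print(df)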
    

    Output:

       id              timestamp  code  id2
    0  10   2017-07-12 13:37:00    206   a1
    3  10   2017-07-12 19:00:00    206   a2
    4  11   2017-07-12 13:37:00    206   a1
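
The same filtering can likely be done without a helper column, since `DataFrame.duplicated` accepts a `subset` argument. This is a minimal sketch rather than the answer's own method; the sample rows are hypothetical (the third timestamp in particular is assumed, because the question's table is truncated).

```python
import pandas as pd

# Hypothetical rows shaped like the question's table; the original is truncated.
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 11],
    "timestamp": [
        "2017-07-12 13:37:00",
        "2017-07-12 13:40:00",
        "2017-07-12 13:45:00",  # assumed value
        "2017-07-12 19:00:00",
        "2017-07-12 13:37:00",
    ],
    "code": [206, 206, 206, 206, 206],
    "id2": ["a1", "a1", "a1", "a2", "a1"],
})

# duplicated() marks every repeat of an (id, id2) pair after its first appearance.
is_repeat = df.duplicated(subset=["id", "id2"], keep="first")

# Inverting the mask keeps the first row per pair and preserves the original index.
first_rows = df[~is_repeat]
print(first_rows)
```

Comparing the columns directly avoids the string conversion and any risk of two different key pairs producing the same concatenated string.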
    