pandas merge dataframe with NaN (or “unknown”) for missing values

梦如初夏 2021-02-05 01:27

I have 2 dataframes, one of which has supplemental information for some (but not all) of the rows in the other.

names = df({'names':['bob','frank','james ...
4 answers
  •  时光取名叫无心
    2021-02-05 02:07

    Think of it as an SQL join operation. You need a left-outer join[1].

    import pandas as pd

    names = pd.DataFrame({'names':['bob','frank','james','tim','ricardo','mike','mark','joan','joe'],'position':['dev','dev','dev','sys','sys','sys','sup','sup','sup']})

    info = pd.DataFrame({'names':['joe','mark','tim','frank'],'classification':['thief','thief','good','thief']})

    Since there are names for which there is no classification, a left-outer join will do the job.

    a = pd.merge(names, info, how='left', on='names')

    The result is ...

    >>> a
         names position classification
    0      bob      dev            NaN
    1    frank      dev          thief
    2    james      dev            NaN
    3      tim      sys           good
    4  ricardo      sys            NaN
    5     mike      sys            NaN
    6     mark      sup          thief
    7     joan      sup            NaN
    8      joe      sup          thief
    

    ... which is fine. Every NaN is expected: bob, james, ricardo, mike and joan have no row in info, so a left join leaves their classification empty.
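If you would rather see "unknown" than NaN, as the question title suggests, one option is to fill the joined column after the merge. The following is only a minimal sketch on a cut-down copy of the data: the variable name `merged` is made up for illustration, and `indicator=True` is included only to show which rows found no match.

```python
import pandas as pd

# Reduced copies of the frames above, to keep the example short.
names = pd.DataFrame({
    'names': ['bob', 'frank', 'james', 'tim'],
    'position': ['dev', 'dev', 'dev', 'sys'],
})
info = pd.DataFrame({
    'names': ['tim', 'frank'],
    'classification': ['good', 'thief'],
})

# A left join keeps every row of `names`; rows without a match get NaN.
# indicator=True adds a `_merge` column ('both' or 'left_only' here).
merged = pd.merge(names, info, how='left', on='names', indicator=True)

# Replace NaN only in the column that came from `info`.
merged['classification'] = merged['classification'].fillna('unknown')

print(merged)
```

Filling just the one column, rather than calling fillna on the whole frame, avoids writing "unknown" over genuinely missing values in other columns.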

    Cheers!

    [1] - http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging
