Replace column values in one DataFrame with values from another DataFrame

梦如初夏 2020-11-27 16:49

I have two DataFrames. The first one has 1000 rows and looks like this:

Date            Group         Family       Bonus
2011-06-09      tri23_1       Laavin             


        
5 answers
  • 2020-11-27 17:29

    You could also create a dictionary and use apply:

    # Select the 'Hotel' column so to_dict() returns a flat {group: hotel} mapping
    hotel_dict = df2.set_index('Group')['Hotel'].to_dict()
    df1['Group'] = df1['Group'].apply(lambda x: hotel_dict[x])
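
    A minimal, self-contained sketch of the dictionary approach. The sample rows are made up for illustration, and the `dict.get(x, x)` fallback (which keeps the original value when a group has no match) is an addition of this sketch, not part of the answer:

```python
import pandas as pd

# Hypothetical sample data modelled on the frames shown in this thread.
df1 = pd.DataFrame({
    "Date": ["2011-06-09", "2011-07-09"],
    "Group": ["tri23_1", "unknown_id"],  # "unknown_id" has no match in df2
})
df2 = pd.DataFrame({
    "Group": ["tri23_1", "hsgç_T2"],
    "Hotel": ["Jamel", "Frank"],
})

# Selecting the 'Hotel' column first gives a flat {group: hotel} dict.
hotel_dict = df2.set_index("Group")["Hotel"].to_dict()

# dict.get(x, x) keeps the original value when a group is missing,
# whereas hotel_dict[x] would raise KeyError.
df1["Group"] = df1["Group"].apply(lambda x: hotel_dict.get(x, x))
```

    With plain `hotel_dict[x]`, the second row would raise a KeyError instead of being left unchanged.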
    
  • 2020-11-27 17:46

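    Just use pandas join. Note that join matches the 'Group' column of df1 against the index of the other frame, so df2 needs 'Group' as its index. For details, see: http://pandas.pydata.org/pandas-docs/stable/merging.html

    df1.join(df2.set_index('Group'), on='Group')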
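
    A small sketch of how the join could be used to actually replace the 'Group' values, assuming the lookup frame has the columns 'Group' and 'Hotel' as in the other answers. The sample rows and the names `joined` and `result` are illustrative only:

```python
import pandas as pd

# Hypothetical sample data modelled on the frames shown in this thread.
df1 = pd.DataFrame({
    "Date": ["2011-06-09", "2011-07-09"],
    "Group": ["tri23_1", "hsgç_T2"],
})
df2 = pd.DataFrame({
    "Group": ["tri23_1", "hsgç_T2", "mlkl_781"],
    "Hotel": ["Jamel", "Frank", "Grand Hotel"],
})

# join() matches df1's 'Group' column against the index of the other frame,
# so the lookup frame is indexed by 'Group' first. The default is a left join.
joined = df1.join(df2.set_index("Group"), on="Group")

# Overwrite 'Group' with the looked-up names and drop the helper column.
result = joined.assign(Group=joined["Hotel"]).drop(columns="Hotel")
```

    The join itself only adds a 'Hotel' column; the last line is what turns it into a replacement.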
    
  • 2020-11-27 17:48

    This is an old question, but here is another way to do it. It is not the idiomatic pandas way, but it is fast.

    Reproducing dataframe 1, the one to be updated:

    df_1
    
             Date       Group  Family  Bonus
    0  2011-06-09     tri23_1  Laavin    456
    1  2011-07-09     hsgç_T2  Grendy    679
    2  2011-09-10  bbbj-1Y_jn  Fantol    431
    3  2011-11-02     hsgç_T2  Gondow    569
    

    Reproducing dataframe 2, the lookup table:

    df_2
    
            Group        Hotel
    0     tri23_1        Jamel
    1     hsgç_T2        Frank
    2  bbbj-1Y_jn         Luxy
    3    mlkl_781  Grand Hotel
    4     vchs_94    Vancouver
    

    Get all the keys (the 'Group' column) from dataframe 1 as a list:

    key_list = list(df_1['Group'])
    
    ['tri23_1', 'hsgç_T2', 'bbbj-1Y_jn', 'hsgç_T2']
    

    Create a dictionary from the lookup dataframe that maps the key column to the value column:

    dict_lookup = dict(zip(df_2['Group'], df_2['Hotel']))
    
    {'bbbj-1Y_jn': 'Luxy',
     'hsgç_T2': 'Frank',
     'mlkl_781': 'Grand Hotel',
     'tri23_1': 'Jamel',
     'vchs_94': 'Vancouver'}
    

    Replace the values by building a list of looked-up values and assigning it to the dataframe 1 column:

    df_1['Group'] = [dict_lookup[item] for item in key_list]
    

    Updated dataframe 1

             Date  Group  Family  Bonus
    0  2011-06-09  Jamel  Laavin    456
    1  2011-07-09  Frank  Grendy    679
    2  2011-09-10   Luxy  Fantol    431
    3  2011-11-02  Frank  Gondow    569
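
    The steps above can be put together into one runnable sketch. The frames are rebuilt from the tables shown in this answer; storing Bonus as integers is an assumption:

```python
import pandas as pd

# Dataframe 1, the one to be updated.
df_1 = pd.DataFrame({
    "Date": ["2011-06-09", "2011-07-09", "2011-09-10", "2011-11-02"],
    "Group": ["tri23_1", "hsgç_T2", "bbbj-1Y_jn", "hsgç_T2"],
    "Family": ["Laavin", "Grendy", "Fantol", "Gondow"],
    "Bonus": [456, 679, 431, 569],  # assumed to be integers
})

# Dataframe 2, the lookup table.
df_2 = pd.DataFrame({
    "Group": ["tri23_1", "hsgç_T2", "bbbj-1Y_jn", "mlkl_781", "vchs_94"],
    "Hotel": ["Jamel", "Frank", "Luxy", "Grand Hotel", "Vancouver"],
})

# Build the {group: hotel} dictionary and look every key up in plain Python.
dict_lookup = dict(zip(df_2["Group"], df_2["Hotel"]))
df_1["Group"] = [dict_lookup[item] for item in df_1["Group"]]
```

    Note that `dict_lookup[item]` raises a KeyError if dataframe 1 contains a group that is missing from the lookup table.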
    
  • 2020-11-27 17:49

    If you set the index of the other df to its 'Group' column, you can replace the values by calling map on the 'Group' column of your original df (here df is the original frame and df1 is the lookup frame):

    In [36]:
    df['Group'] = df['Group'].map(df1.set_index('Group')['Hotel'])
    df
    
    Out[36]:
             Date  Group  Family  Bonus
    0  2011-06-09  Jamel  Laavin    456
    1  2011-07-09  Frank  Grendy    679
    2  2011-09-10   Luxy  Fantol    431
    3  2011-11-02  Frank  Gondow    569
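
    One thing worth knowing about map: a group with no entry in the lookup Series becomes NaN. A minimal sketch of that behaviour, with made-up rows and a hypothetical frame name `lookup`; the `fillna` step that restores the original value is an addition of this sketch:

```python
import pandas as pd

# Hypothetical sample data; "no_match" is absent from the lookup frame.
df = pd.DataFrame({"Group": ["tri23_1", "hsgç_T2", "no_match"]})
lookup = pd.DataFrame({
    "Group": ["tri23_1", "hsgç_T2"],
    "Hotel": ["Jamel", "Frank"],
})

# map() with a Series looks each value up in that Series' index.
mapped = df["Group"].map(lookup.set_index("Group")["Hotel"])

# Unmatched groups come back as NaN; fall back to the original value for those.
df["Group"] = mapped.fillna(df["Group"])
```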
    
  • 2020-11-27 17:51

    Columns in pandas DataFrames are just Series. Make the DataFrames (or DataFrame and Series, as shown here) share the same index so that assignment can occur from the Series to the DataFrame:

    **In:**
    
    import pandas as pd

    df = pd.DataFrame(data=
    {'date': ['2011-06-09', '2011-07-09', '2011-09-10', '2011-11-02'], 
    'family': ['Laavin', 'Grendy', 'Fantol', 'Gondow'], 
    'bonus': ['456', '679', '431', '569']}, 
    index=pd.Index(name='Group', data=['tri23_1', 'hsgç_T2', 'bbbj-1Y_jn', 'hsgç_T2']))
    
    **Out:**
                      date  family bonus
    Group
    tri23_1     2011-06-09  Laavin   456
    hsgç_T2     2011-07-09  Grendy   679
    bbbj-1Y_jn  2011-09-10  Fantol   431
    hsgç_T2     2011-11-02  Gondow   569
    
    **In:**
    
    hotel_groups = pd.Series(['Jamel', 'Frank', 'Luxy', 'Grand Hotel', 'Vancouver'], 
    index=pd.Index(name='Group', data=['tri23_1', 'hsgç_T2', 'bbbj-1Y_jn', 'mlkl_781', 'vchs_94']))
    
    **Out:**
    
    Group
    tri23_1             Jamel
    hsgç_T2             Frank
    bbbj-1Y_jn           Luxy
    mlkl_781      Grand Hotel
    vchs_94         Vancouver
    dtype: object
    
    **In:**
    
    df['hotel'] = hotel_groups
    
    **Out:**
    
                      date  family bonus  hotel
    Group
    tri23_1     2011-06-09  Laavin   456  Jamel
    hsgç_T2     2011-07-09  Grendy   679  Frank
    bbbj-1Y_jn  2011-09-10  Fantol   431   Luxy
    hsgç_T2     2011-11-02  Gondow   569  Frank
    

    Notice that the index of both is 'Group', which allows the assignment.

    If you assign a like-indexed Series to a DataFrame column, the assignment works. Notice that this works despite there being duplicate group values in df. It would not work if there were duplicate index values in the hotel_groups Series (e.g., two entries for the index value hsgç_T2, the first with the data value Frank and the second with the data value Luxy), not that this would ever occur in your example. The assignment would fail because there would be no way to know which value to assign to the like-indexed DataFrame column.
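
    A short sketch of the duplicate-index case described above. The conflicting hsgç_T2 entry is hypothetical, and keeping the first entry per label is only one possible way to resolve it:

```python
import pandas as pd

df = pd.DataFrame(
    {"family": ["Laavin", "Grendy", "Gondow"]},
    index=pd.Index(["tri23_1", "hsgç_T2", "hsgç_T2"], name="Group"),
)

# Hypothetical lookup Series with two conflicting entries for hsgç_T2.
hotel_groups = pd.Series(
    ["Jamel", "Frank", "Luxy", "Grand Hotel"],
    index=pd.Index(["tri23_1", "hsgç_T2", "hsgç_T2", "mlkl_781"], name="Group"),
)

# pandas cannot align a Series whose index has duplicate labels.
assignment_failed = False
try:
    df["hotel"] = hotel_groups
except ValueError:
    assignment_failed = True

# One possible fix (an assumption, not from the answer): keep the first
# entry per label, after which the index-aligned assignment works again.
deduplicated = hotel_groups[~hotel_groups.index.duplicated(keep="first")]
df["hotel"] = deduplicated
```

    Duplicate labels in df itself are fine, as the answer notes; only the Series being assigned needs a unique index.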
