Remap values in pandas column with a dict

囚心锁ツ 2020-11-21 05:14

I have a dictionary which looks like this: di = {1: "A", 2: "B"}

I would like to apply it to the "col1" column of a dataframe similar to:

10 answers
  •  伪装坚强ぢ
    2020-11-21 05:31

    There is a bit of ambiguity in your question. There are at least three interpretations:

    1. the keys in di refer to index values
    2. the keys in di refer to df['col1'] values
    3. the keys in di refer to index locations (not the OP's question, but thrown in for fun.)

    Below is a solution for each case.


    Case 1: If the keys of di are meant to refer to index values, then you could use the update method:

    df.update(pd.Series(di, name='col1'))
    

    For example,

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'col1':['w', 10, 20],
                       'col2': ['a', 30, np.nan]},
                      index=[1,2,0])
    #   col1 col2
    # 1    w    a
    # 2   10   30
    # 0   20  NaN
    
    di = {0: "A", 2: "B"}
    
    # The value at the 0-index is mapped to 'A', the value at the 2-index is mapped to 'B'.
    # Calling update on the DataFrame (with a Series named after the target column)
    # avoids modifying a temporary df['col1'] object, which may not write back to df.
    df.update(pd.Series(di, name='col1'))
    print(df)
    

    yields

      col1 col2
    1    w    a
    2    B   30
    0    A  NaN
    

    I've modified the values from your original post so it is clearer what update is doing. Note how the keys in di are associated with index values. The order of the index values -- that is, the index locations -- does not matter.
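
    To make the point about index values versus index order concrete, here is a minimal sketch. It uses label-based assignment with .loc instead of update (a swapped-in technique, not part of the original answer), and the names by_label and by_label_sorted are made up for illustration. The same labels are hit whether or not the rows are sorted:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
di = {0: "A", 2: "B"}

# Label-based assignment: rows are selected by index value, not by position.
by_label = df.copy()
by_label.loc[list(di), 'col1'] = list(di.values())

# Apply the same mapping to a copy whose rows are sorted by index value.
by_label_sorted = df.sort_index()
by_label_sorted.loc[list(di), 'col1'] = list(di.values())

# Both frames end up with the same label -> value pairs in col1.
print(by_label['col1'].to_dict())
print(by_label_sorted['col1'].to_dict())
```

    In both frames the row labelled 0 receives 'A' and the row labelled 2 receives 'B'; only the display order of the rows differs.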


    Case 2: If the keys in di refer to df['col1'] values, then @DanAllan and @DSM show how to achieve this with replace:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'col1':['w', 10, 20],
                       'col2': ['a', 30, np.nan]},
                      index=[1,2,0])
    print(df)
    #   col1 col2
    # 1    w    a
    # 2   10   30
    # 0   20  NaN
    
    di = {10: "A", 20: "B"}
    
    # The values 10 and 20 are replaced by 'A' and 'B'
    # Assign the result back; an in-place replace on df['col1'] may act on a temporary copy
    df['col1'] = df['col1'].replace(di)
    print(df)
    

    yields

      col1 col2
    1    w    a
    2    A   30
    0    B  NaN
    

    Note how in this case the keys in di were changed to match values in df['col1'].
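
    As a side note on Case 2, Series.map is another way to look values up in a dict, but it behaves differently for values that have no key in di. The sketch below is only an illustration under that assumption; col, replaced, mapped and mapped_filled are names invented here, not part of the original answer:

```python
import pandas as pd

col = pd.Series(['w', 10, 20], index=[1, 2, 0], name='col1')
di = {10: "A", 20: "B"}

# replace leaves values that have no key in di untouched.
replaced = col.replace(di)

# map looks every value up in di; values without a key become NaN ...
mapped = col.map(di)

# ... so fall back to the original value wherever the lookup missed.
mapped_filled = col.map(di).fillna(col)

print(replaced.tolist())       # ['w', 'A', 'B']
print(mapped.isna().tolist())  # [True, False, False]
print(mapped_filled.tolist())  # ['w', 'A', 'B']
```

    If every value in the column is guaranteed to be a key of di, map alone is enough; otherwise replace, or map followed by fillna, keeps the unmatched values.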


    Case 3: If the keys in di refer to index locations, then you could use

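    df.iloc[list(di.keys()), df.columns.get_loc('col1')] = list(di.values())
    

    since

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'col1':['w', 10, 20],
                       'col2': ['a', 30, np.nan]},
                      index=[1,2,0])
    di = {0: "A", 2: "B"}
    
    # The values at the 0 and 2 index locations are replaced by 'A' and 'B'.
    # iloc selects rows and columns purely by position.
    df.iloc[list(di.keys()), df.columns.get_loc('col1')] = list(di.values())
    print(df)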
    

    yields

      col1 col2
    1    A    a
    2   10   30
    0    B  NaN
    

    Here, the first and third rows were altered, because the keys in di are 0 and 2, which with Python's 0-based indexing refer to the first and third locations.
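
    The difference between index locations and index values can be checked directly. The following sketch (the variable names positions, labels, by_position and by_label are hypothetical) translates the positions in di into the labels stored at those positions, then confirms that assigning by position and assigning by those labels change the same rows:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
di = {0: "A", 2: "B"}

positions = list(di.keys())

# Index labels stored at positions 0 and 2 of this particular frame.
labels = df.index[positions].tolist()
print(labels)  # [1, 0]

# Assign by position ...
by_position = df.copy()
by_position.iloc[positions, by_position.columns.get_loc('col1')] = list(di.values())

# ... and by the labels found at those positions.
by_label = df.copy()
by_label.loc[labels, 'col1'] = list(di.values())

# Both routes modify the same rows.
print(by_position.equals(by_label))  # True
```

    Here position 0 holds label 1 and position 2 holds label 0, which is why the rows labelled 1 and 0 are the ones that change.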
