Remap values in pandas column with a dict

囚心锁ツ 2020-11-21 05:14

I have a dictionary which looks like this: di = {1: "A", 2: "B"}

I would like to apply it to the "col1" column of a dataframe similar to:

10 answers
  •  暗喜 (OP)
    2020-11-21 05:47

    map can be much faster than replace

    If your dictionary has more than a couple of keys, using map can be much faster than replace. There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values (and also whether you want non-matches to keep their values or be converted to NaNs):

    Exhaustive Mapping

    In this case, the form is very simple:

    df['col1'].map(di)       # note: if the dictionary does not exhaustively map all
                             # entries then non-matched entries are changed to NaNs
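
    A minimal runnable sketch of what the comment above means, using a small hypothetical frame in which the value 3 has no key in di, so the NaN behaviour is visible. Since map returns a new Series rather than modifying the column in place, the result is assigned back here:

```python
import pandas as pd

di = {1: "A", 2: "B"}

# Hypothetical example data: 3 has no key in di
df = pd.DataFrame({"col1": [1, 2, 3, 1]})

# map returns a new Series; assign it back to overwrite the column
df["col1"] = df["col1"].map(di)

print(df["col1"].tolist())  # ['A', 'B', nan, 'A']
```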
    

    Although map most commonly takes a function as its argument, it can alternatively take a dictionary or a Series; see the documentation for pandas.Series.map.
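
    To illustrate the three argument types, the sketch below maps the same hypothetical Series with a dict, a Series built from that dict, and a lambda (the lambda using di.get is just one illustrative choice of function). When every value has a key, all three should give the same values:

```python
import pandas as pd

di = {1: "A", 2: "B"}
s = pd.Series([1, 2, 2, 1])  # hypothetical data; every value has a key in di

by_dict = s.map(di)                   # dict: each value is looked up as a key
by_series = s.map(pd.Series(di))      # Series: each value is looked up in its index
by_func = s.map(lambda v: di.get(v))  # callable: applied element by element

# Compare as plain lists so the check does not depend on the result dtype
print(by_dict.tolist() == by_series.tolist() == by_func.tolist())  # True
```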

    Non-Exhaustive Mapping

    If you have a non-exhaustive mapping and wish to retain the existing values for non-matches, you can add fillna:

    df['col1'].map(di).fillna(df['col1'])
    

    as in @jpp's answer here: Replace values in a pandas series via dictionary efficiently
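
    A small sketch of the non-exhaustive case, assuming hypothetical data where 3 and 4 have no key in di. Because Series.replace leaves non-matching values untouched by default, the map plus fillna result should contain the same values as replace:

```python
import pandas as pd

di = {1: "A", 2: "B"}

# Hypothetical data: 3 and 4 have no key in di
s = pd.Series([1, 2, 3, 4])

# map turns non-matches into NaN; fillna puts the original values back
mapped = s.map(di).fillna(s)

# replace keeps non-matches by default, so the values should agree
replaced = s.replace(di)

print(mapped.tolist() == replaced.tolist())  # True
```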

    Benchmarks

    Using the following data with pandas version 0.23.1:

    import numpy as np
    import pandas as pd
    di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
    df = pd.DataFrame({'col1': np.random.choice(range(1, 9), 100000)})
    

    and testing with %timeit, it appears that map is approximately 10x faster than replace.

    Note that your speedup with map will vary with your data. The largest speedup appears to come with large dictionaries and exhaustive replaces. See @jpp's answer (linked above) for more extensive benchmarks and discussion.
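
    %timeit is an IPython magic. Outside IPython, a rough equivalent using the standard-library timeit module might look like the sketch below. It rebuilds the benchmark data described above. The repetition count is an arbitrary choice, and the timings it prints depend on hardware and pandas version, so it does not imply any particular speedup:

```python
import timeit

import numpy as np
import pandas as pd

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
df = pd.DataFrame({"col1": np.random.choice(range(1, 9), 100000)})

# di is exhaustive for this data, so both approaches should give the same values
same_values = df["col1"].map(di).tolist() == df["col1"].replace(di).tolist()
assert same_values

# Average seconds per call over a few runs (n is an arbitrary choice)
n = 5
t_map = timeit.timeit(lambda: df["col1"].map(di), number=n) / n
t_replace = timeit.timeit(lambda: df["col1"].replace(di), number=n) / n

print(f"map:     {t_map * 1e3:.2f} ms per call")
print(f"replace: {t_replace * 1e3:.2f} ms per call")
print(f"replace / map: {t_replace / t_map:.1f}x")
```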
