I have a dictionary which looks like this: di = {1: \"A\", 2: \"B\"}
I would like to apply it to the \"col1\" column of a dataframe similar to:
map
can be much faster than replace
If your dictionary has more than a couple of keys, using map
can be much faster than replace
. There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values (and also whether you want non-matches to keep their values or be converted to NaNs):
In this case, the form is very simple:
df['col1'].map(di) # note: if the dictionary does not exhaustively map all
# entries then non-matched entries are changed to NaNs
Although map
most commonly takes a function as its argument, it can alternatively take a dictionary or series: Documentation for Pandas.series.map
If you have a non-exhaustive mapping and wish to retain the existing variables for non-matches, you can add fillna
:
df['col1'].map(di).fillna(df['col1'])
as in @jpp's answer here: Replace values in a pandas series via dictionary efficiently
Using the following data with pandas version 0.23.1:
di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H" }
df = pd.DataFrame({ 'col1': np.random.choice( range(1,9), 100000 ) })
and testing with %timeit
, it appears that map
is approximately 10x faster than replace
.
Note that your speedup with map
will vary with your data. The largest speedup appears to be with large dictionaries and exhaustive replaces. See @jpp answer (linked above) for more extensive benchmarks and discussion.