Remap values in pandas column with a dict

后端未结

关注

 10  1121

囚心锁ツ 2020-11-21 05:14

I have a dictionary which looks like this: di = {1: \"A\", 2: \"B\"}

I would like to apply it to the \"col1\" column of a dataframe similar to:

10条回答

暗喜 (楼主)

2020-11-21 05:47
map can be much faster than replace

If your dictionary has more than a couple of keys, using map can be much faster than replace. There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values (and also whether you want non-matches to keep their values or be converted to NaNs):

Exhaustive Mapping

In this case, the form is very simple:
```
df['col1'].map(di)       # note: if the dictionary does not exhaustively map all
                         # entries then non-matched entries are changed to NaNs
```
Although map most commonly takes a function as its argument, it can alternatively take a dictionary or series: Documentation for Pandas.series.map

Non-Exhaustive Mapping

If you have a non-exhaustive mapping and wish to retain the existing variables for non-matches, you can add fillna:
```
df['col1'].map(di).fillna(df['col1'])
```
as in @jpp's answer here: Replace values in a pandas series via dictionary efficiently

Benchmarks

Using the following data with pandas version 0.23.1:
```
di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H" }
df = pd.DataFrame({ 'col1': np.random.choice( range(1,9), 100000 ) })
```
and testing with %timeit, it appears that map is approximately 10x faster than replace.

Note that your speedup with map will vary with your data. The largest speedup appears to be with large dictionaries and exhaustive replaces. See @jpp answer (linked above) for more extensive benchmarks and discussion.
0 讨论(0)

查看其它10个回答
发布评论:

提交评论
- 加载中...

Remap values in pandas column with a dict

map can be much faster than replace

Exhaustive Mapping

Non-Exhaustive Mapping

Benchmarks

`map` can be much faster than `replace`