Replace multiple substrings in a Pandas series with a value

蓝咒 submitted on 2020-06-10 05:56:49

Question


All,

To replace one string in one particular column I have done this and it worked fine:

dataUS['sec_type'].str.strip().str.replace("LOCAL","CORP")

I would now like to replace multiple strings with a single string, e.g. replace ["LOCAL", "FOREIGN", "HELLO"] with "CORP".

How can I make this work? The code below didn't work:

dataUS['sec_type'].str.strip().str.replace(["LOCAL", "FOREIGN", "HELLO"], "CORP")

Answer 1:


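You can perform this task by forming a |-separated string. This works because pd.Series.str.replace accepts a regex pattern when regex=True is passed (since pandas 2.0 the default is regex=False, so pass it explicitly):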

Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().

This avoids the need to create a dictionary.

import pandas as pd

df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'ANOTHER HELLO', 'NOTHING']})

pattern = '|'.join(['LOCAL', 'FOREIGN', 'HELLO'])

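# regex=True is required for '|' to act as alternation (the default is regex=False since pandas 2.0)
df['A'] = df['A'].str.replace(pattern, 'CORP', regex=True)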

#               A
# 0     CORP TEST
# 1     TEST CORP
# 2  ANOTHER CORP
# 3       NOTHING
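
One caveat with the joined pattern: if any of the substrings contains regex metacharacters, it should be escaped first. A minimal sketch, assuming a hypothetical list of literals (the '(USD)' entry and the sample data are made up for illustration):

```python
import re
import pandas as pd

# Hypothetical sample data; '(USD)' contains regex metacharacters.
df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'PRICE (USD)', 'NOTHING']})
targets = ['LOCAL', 'FOREIGN', '(USD)']

# Escape each literal so it is matched verbatim, then join into one alternation.
pattern = '|'.join(re.escape(t) for t in targets)

# Pass regex=True explicitly so the pattern is treated as a regular expression.
result = df['A'].str.replace(pattern, 'CORP', regex=True)
print(result.tolist())
# ['CORP TEST', 'TEST CORP', 'PRICE CORP', 'NOTHING']
```

Without re.escape, the parentheses in '(USD)' would form a regex group and the literal brackets would be left in the output.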



Answer 2:


replace can accept a dict, so we just create a dict for the values that need to be replaced:

dataUS['sec_type'].str.strip().replace(dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3)),regex=True)

Contents of the dict:

dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3))
Out[585]: {'FOREIGN': 'CORP', 'HELLO': 'CORP', 'LOCAL': 'CORP'}

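The reason you receive the error is that str.replace is different from replace: Series.str.replace expects a single string or regex pattern, whereas Series.replace also accepts a list or a dict of values to replace.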
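
Since Series.replace accepts a list, the dict is not strictly required when every value maps to the same replacement. A small sketch of that variant, using made-up sample data rather than the asker's dataUS frame:

```python
import pandas as pd

# Hypothetical data standing in for dataUS['sec_type'].
s = pd.Series(["LOCAL ", " FOREIGN", "HELLO WORLD", "OTHER"])

# Series.replace takes a list of patterns and a single replacement value;
# with regex=True each pattern is applied as a regex, so substrings match too.
cleaned = s.str.strip().replace(["LOCAL", "FOREIGN", "HELLO"], "CORP", regex=True)
print(cleaned.tolist())
# ['CORP', 'CORP', 'CORP WORLD', 'OTHER']
```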




Answer 3:


Try:

dataUS.replace({"sec_type": { 'LOCAL' : "CORP", 'FOREIGN' : "CORP"}})
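
Note that without regex=True this form only replaces cells whose entire value equals a key. A quick sketch of the difference, using a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical data; the last row contains 'LOCAL' only as a substring.
dataUS = pd.DataFrame({"sec_type": ["LOCAL", "FOREIGN", "LOCAL BOND"]})
mapping = {"sec_type": {"LOCAL": "CORP", "FOREIGN": "CORP"}}

# Default behaviour: only whole-cell matches are replaced.
exact = dataUS.replace(mapping)
print(exact["sec_type"].tolist())
# ['CORP', 'CORP', 'LOCAL BOND']

# With regex=True the keys are treated as patterns, so substrings match as well.
partial = dataUS.replace(mapping, regex=True)
print(partial["sec_type"].tolist())
# ['CORP', 'CORP', 'CORP BOND']
```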



Answer 4:


Function to replace multiple values in pandas Series:

def replace_values(series, to_replace, value):
    for i in to_replace:
        series = series.str.replace(i, value)
    return series

Hope this helps someone




Answer 5:


The answer of @Rakesh is very neat but does not allow for substrings. With a small change however, it does.

  1. Use a replacement dictionary, because it makes the solution much more generic.
  2. Add the keyword argument regex=True to Series.replace() (not Series.str.replace). This actually does two things. First, it turns your replacement into a regex replacement, which is much more powerful, but you will have to escape special characters, so be careful with that. Second, it makes replace work on substrings instead of only on the entire string, which is really handy.

replacement = {
    "LOCAL": "CORP",
    "FOREIGN": "CORP",
    "HELLO": "CORP"
}

dataUS['sec_type'].replace(replacement, regex=True)

Full code example

import pandas as pd

dataUS = pd.DataFrame({'sec_type': ['LOCAL', 'Sample text LOCAL', 'Sample text LOCAL sample FOREIGN']})

replacement = {
    "LOCAL": "CORP",
    "FOREIGN": "CORP",
    "HELLO": "CORP"
}

dataUS['sec_type'].replace(replacement, regex=True)

Output

0                            CORP
1                Sample text CORP
2    Sample text CORP sample CORP
Name: sec_type, dtype: object
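
On the point about escaping special characters: one way to handle it is to run the dictionary keys through re.escape before passing them to Series.replace. This is only a sketch under assumed data; the keys "U.S." and "A+" and the sample series are hypothetical:

```python
import re
import pandas as pd

# Hypothetical data: 'UKSA' would match the unescaped pattern 'U.S.'.
s = pd.Series(["U.S. bond", "UKSA bond", "A+ rated"])

# Hypothetical literal -> replacement mapping with regex metacharacters in the keys.
raw = {"U.S.": "US", "A+": "A-plus"}

# Escape the keys so '.' and '+' are matched literally rather than as regex operators.
escaped = {re.escape(k): v for k, v in raw.items()}

result = s.replace(escaped, regex=True)
print(result.tolist())
# ['US bond', 'UKSA bond', 'A-plus rated']
```

With the raw keys, 'U.S.' would also match 'UKSA' because each '.' matches any character.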


Source: https://stackoverflow.com/questions/49413005/replace-multiple-substrings-in-a-pandas-series-with-a-value
