问题
All,
To replace one string in one particular column I have done this and it worked fine:
dataUS['sec_type'].str.strip().str.replace("LOCAL","CORP")
I would like now to replace multiple strings with one string say replace ["LOCAL", "FOREIGN", "HELLO"] with "CORP"
How can make it work? the code below didnt work
dataUS['sec_type'].str.strip().str.replace(["LOCAL", "FOREIGN", "HELLO"], "CORP")
回答1:
You can perform this task by forming a |-separated string. This works because pd.Series.str.replace accepts regex:
Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().
This avoids the need to create a dictionary.
import pandas as pd
df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'ANOTHER HELLO', 'NOTHING']})
pattern = '|'.join(['LOCAL', 'FOREIGN', 'HELLO'])
df['A'] = df['A'].str.replace(pattern, 'CORP')
# A
# 0 CORP TEST
# 1 TEST CORP
# 2 ANOTHER CORP
# 3 NOTHING
回答2:
replace
can accept dict
, os we just create a dict for those values need to be replaced
dataUS['sec_type'].str.strip().replace(dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3)),regex=True)
Info of the dict
dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3))
Out[585]: {'FOREIGN': 'CORP', 'HELLO': 'CORP', 'LOCAL': 'CORP'}
The reason why you receive the error ,
str.replace is different from replace
回答3:
Try:
dataUS.replace({"sec_type": { 'LOCAL' : "CORP", 'FOREIGN' : "CORP"}})
回答4:
Function to replace multiple values in pandas Series:
def replace_values(series, to_replace, value):
for i in to_replace:
series = series.str.replace(i, value)
return series
Hope this helps someone
回答5:
The answer of @Rakesh is very neat but does not allow for substrings. With a small change however, it does.
- Use a replacement dictionary because it makes it much more generic
- Add the keyword argument
regex=True
toSeries.replace()
(notSeries.str.replace
) This does two things actually: It changes your replacement to regex replacement, which is much more powerful but you will have to escape special characters. Beware for that. Secondly it will make the replace work on substrings instead of the entire string. Which is really cool!
replacement = {
"LOCAL": "CORP",
"FOREIGN": "CORP",
"HELLO": "CORP"
}
dataUS['sec_type'].replace(replacement, regex=True)
Full code example
dataUS = pd.DataFrame({'sec_type': ['LOCAL', 'Sample text LOCAL', 'Sample text LOCAL sample FOREIGN']})
replacement = {
"LOCAL": "CORP",
"FOREIGN": "CORP",
"HELLO": "CORP"
}
dataUS['sec_type'].replace(replacement, regex=True)
Output
0 CORP
1 CORP
2 Sample text CORP
3 Sample text CORP sample CORP
Name: sec_type, dtype: object
来源:https://stackoverflow.com/questions/49413005/replace-multiple-substrings-in-a-pandas-series-with-a-value