Removing a substring from a list of strings


长发绾君心

Several countries in my list have numbers and/or parentheses in their names. How do I remove these?

e.g.

'Bolivia (Plurinational State of)' should be 'Bolivi


4 Answers

孤独总比滥情好

2021-01-29 07:28

You can remove these substrings with re.sub.

Remove numbers:

    import re

    a = 'Switzerland17'
    pattern = '[0-9]'             # any single digit
    res = re.sub(pattern, '', a)  # replace every digit with nothing
    print(res)

Output:

    Switzerland

Remove parentheses:

    import re

    b = 'Bolivia (Plurinational State of)'
    # raw string: optional whitespace, then "(" ... ")" and everything between
    pattern2 = r'\s*\(.*\)'
    res2 = re.sub(pattern2, '', b)
    print(res2)

Output:

    Bolivia
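Since the question is about a list of strings, the two substitutions above can be applied to every item in turn. The following is a minimal sketch; the helper name clean_country and the sample list are assumptions made for illustration, not part of the original answer.

```python
import re

def clean_country(name):
    """Hypothetical helper: apply both substitutions to a single name."""
    name = re.sub(r"[0-9]", "", name)      # drop every digit
    name = re.sub(r"\s*\(.*\)", "", name)  # drop "(...)" and any spaces before it
    return name

# Sample data assumed for illustration.
countries = ["Bolivia (Plurinational State of)", "Switzerland17", "United Kingdom"]
cleaned = [clean_country(c) for c in countries]
print(cleaned)
```

A list comprehension builds a new list and leaves the original untouched, which is usually easier to reason about than replacing items in place.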


独厮守ぢ

2021-01-29 07:40

Using Regex and simple List Operation

Go through the list items, match the regex against each item, and replace the value in place. The regex "[a-zA-Z]{2,}" matches a run of two or more ASCII letters, and re.match anchors it at the start of the string, so only the leading letters are kept and everything from the first digit or parenthesis onward is dropped. Rather than describing what to remove, this approach matches what you want to keep, based on your input domain (country names in your case): a country name cannot contain a digit or a parenthesis. Note that the pattern also stops at the first space, so it suits single-word names only. You can use the following:

    import re

    list_of_country_strings = ["Switzerland17", "America290", "Korea(S)"]

    for index in range(len(list_of_country_strings)):
        # re.match anchors at the start, so x spans the leading letters only
        x = re.match("[a-zA-Z]{2,}", string=list_of_country_strings[index])
        if x:
            list_of_country_strings[index] = list_of_country_strings[index][x.start():x.end()]

    print(list_of_country_strings)

Output:

    ['Switzerland', 'America', 'Korea']
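Because re.match with "[a-zA-Z]{2,}" stops at the first non-letter, a name containing a space, such as 'United Kingdom', would be cut to 'United'. One possible variation, sketched below, allows further words separated by single spaces. The pattern, the helper name leading_name, and the sample inputs are assumptions for illustration; names with accents, apostrophes, or hyphens would still be truncated.

```python
import re

# One or more words of ASCII letters, separated by single spaces.
# Assumption: the names of interest contain only ASCII letters and spaces.
NAME = re.compile(r"[A-Za-z]+(?: [A-Za-z]+)*")

def leading_name(text):
    """Hypothetical helper: return the leading run of words, or text unchanged."""
    m = NAME.match(text)
    return m.group(0) if m else text

samples = ["Switzerland17", "Korea(S)", "United Kingdom",
           "Bolivia (Plurinational State of)"]
result = [leading_name(s) for s in samples]
print(result)
```

The space is only consumed when letters follow it, so the space before an opening parenthesis is left out of the match.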


半阙折子戏

2021-01-29 07:47

Use Series.str.replace with a regex. In the pattern, \s* matches any spaces before the opening parenthesis, \(.*\) matches the parentheses and everything between them, | is the regex OR, and \d+ matches a number of one or more digits:

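    import pandas as pd

    df = pd.DataFrame({'a': ['Bolivia (Plurinational State of)', 'Switzerland17']})
    # regex=True is needed on pandas 2.0+, where str.replace matches literally by default
    df['a'] = df['a'].str.replace(r'\s*\(.*\)|\d+', '', regex=True)
    print(df)

                 a
    0      Bolivia
    1  Switzerland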
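One detail worth knowing about \(.*\) is that .* is greedy: if a string contains more than one parenthesised group, it matches from the first opening parenthesis to the last closing one. The sketch below uses plain re and a made-up input string (an assumption, not from the original data) to compare it with the bounded form \([^)]*\).

```python
import re

# Hypothetical input with two parenthesised groups.
text = "Korea (North) and Korea (South)"

# Greedy: spans from the first "(" to the last ")".
greedy = re.sub(r"\s*\(.*\)", "", text)

# Bounded: each match stops at the nearest ")".
bounded = re.sub(r"\s*\([^)]*\)", "", text)

print(greedy)
print(bounded)
```

For inputs with at most one parenthesised group, as in the question, both patterns give the same result.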


半阙折子戏

2021-01-29 07:48

Just run:

    df['Country'] = df['Country'].replace(r'\d+|\s*\([^)]*\)', '', regex=True)

Assuming that the initial content of your DataFrame is:

                                Country
    0  Bolivia (Plurinational State of)
    1                     Switzerland17
    2                    United Kingdom

after the above replace you will have:

              Country
    0         Bolivia
    1     Switzerland
    2  United Kingdom

The above pattern contains:

- the first alternative: a non-empty sequence of digits (\d+),
- the second alternative (\s*\([^)]*\)):
    - an optional sequence of whitespace characters,
    - an opening parenthesis (escaped),
    - a sequence of characters other than ) (inside square brackets no escaping is needed),
    - a closing parenthesis (also escaped).
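The breakdown above can also be written into the pattern itself using re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is a sketch with plain re rather than pandas; the name PATTERN and the sample list are assumptions for illustration.

```python
import re

PATTERN = re.compile(
    r"""
    \d+      # first alternative: one or more digits
    |        # or
    \s*      # second alternative: optional whitespace,
    \(       #   an escaped opening parenthesis,
    [^)]*    #   any characters other than a closing parenthesis,
    \)       #   an escaped closing parenthesis
    """,
    re.VERBOSE,
)

# Sample data mirroring the DataFrame column above.
names = ["Bolivia (Plurinational State of)", "Switzerland17", "United Kingdom"]
cleaned = [PATTERN.sub("", n) for n in names]
print(cleaned)
```

The space inside 'United Kingdom' is preserved because \s* only appears as part of the parenthesis alternative.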
