How to remove parentheses and all data within using Pandas/Python?

Posted by 这一生的挚爱 on 2019-11-27 07:27:10
df['name'].str.replace(r"\(.*\)", "", regex=True)

You can't run re functions directly on pandas objects; you have to apply them to each element inside the object. So Series.str.replace(r"\(.*\)", "", regex=True) is essentially shorthand for Series.apply(lambda x: re.sub(r"\(.*\)", "", x)).
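A minimal sketch of that equivalence, assuming a small made-up Series (the sample values are hypothetical). regex=True is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import re
import pandas as pd

# Hypothetical sample data, for illustration only
s = pd.Series(["Acme (USA)", "Globex", "Initech (UK) Ltd"])

# Vectorized string method; regex=True makes the pattern a regular expression
via_str = s.str.replace(r"\(.*\)", "", regex=True)

# Explicit per-element call to re.sub
via_apply = s.apply(lambda x: re.sub(r"\(.*\)", "", x))

print(via_str.tolist())
print(via_str.tolist() == via_apply.tolist())
```

One practical difference: str.replace passes missing values (NaN) through untouched, while the apply version would raise a TypeError on them, so the two are only interchangeable when every element is a string.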

If you have multiple (...) substrings in the data, the greedy .* will remove everything from the first ( to the last ), so you should consider using either

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*?\)", "", regex=True)

or

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\([^()]*\)", "", regex=True)

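The difference is that the lazy .*? is typically slower and, because . does not match line breaks by default, will not match across them, whereas [^()]* matches any character except ( and ), including line breaks, and is quite efficient. The first pattern will match (...(...) as a single substring, but the second will only match the innermost (...).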
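The two differences can be checked with plain re, outside pandas. This is a small sketch; the input strings are made up for illustration.

```python
import re

lazy = r"\(.*?\)"
negated = r"\([^()]*\)"

# A line break inside the parentheses: "." does not match "\n" by default
multiline = "Acme (made\nin USA) Corp"
lazy_multiline = re.sub(lazy, "", multiline)        # left unchanged
negated_multiline = re.sub(negated, "", multiline)  # parenthesized part removed

# An extra opening parenthesis before a balanced pair
nested = "a (b (c) d"
lazy_nested = re.sub(lazy, "", nested)        # removes "(b (c)"
negated_nested = re.sub(negated, "", nested)  # removes only "(c)"

print(repr(lazy_multiline), repr(negated_multiline))
print(repr(lazy_nested), repr(negated_nested))
```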

If you also want to clean up the whitespace left behind after removing these substrings, you may consider

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()

The \s*\([^()]*\) regex will match zero or more whitespace characters followed by the parenthesized substring, and str.strip() will then get rid of any remaining leading or trailing whitespace.
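As a quick illustration, here is the same call on a hypothetical Series standing in for the 'Manufacturer Standard Name' column (the values are invented):

```python
import pandas as pd

# Hypothetical values, for illustration only
s = pd.Series(["Acme (USA) Corp", "Globex (EU)", "(old) Initech"])

# Remove each parenthesized part together with the whitespace before it,
# then trim whatever whitespace remains at either end
cleaned = s.str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()

print(cleaned.tolist())
```

The final str.strip() matters for the last value, where the parentheses come first and a leading space would otherwise remain.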
