I have a dataframe where I want to remove all parentheses and stuff inside it.
I checked out : How can I remove text within parentheses with a regex?
Where the answer to remove the data was
re.sub(r'\([^)]*\)', '', filename)
I tried this as well as
re.sub(r'\(.*?\)', '', filename)
However, I got an error: expected a string or buffer
When I tried using the column df['Column Name']
I got no item named 'Column Name'
I checked the dataframe using df.head()
and it showed up as a clean table with the column names as what I wanted them to be....however when I use the re
expression to remove the (stuff) it isn't recognizing the column name that I have.
I normally use
df['name'].str.replace(" ()","")
However, I want to remove the parentheses and what is inside....How can I do this using either regex or pandas?
Thanks!
Here is the solution I used...thanks for the help!
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*\)","")
df['name'].str.replace(r"\(.*\)","")
You can't run re functions directly on pandas objects. You have to loop them for each element inside the object. So Series.str.replace((r"\(.*\)", "")
is just syntactic sugar for Series.apply(lambda x: re.sub(r"\(.*\)", "", x))
.
If you have multiple (...)
substrings in the data you should consider using either
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*?\)","")
or
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\([^()]*\)","")
The difference is that .*?
is slower and does not match line breaks, and [^()]
matches any char but (
and )
and is quite efficient and matches line breaks. The first one will match (...(...)
but the second will only match (...)
.
If you want to normalize all whitespace after removing these substrings, you may consider
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\s*\([^()]*\)","").str.strip()
The \s*\([^()]*\)
regex will match 0+ whitespaces and then the string between parentheses and then str.stip()
will get rid of any potential trailing whitespace.
来源:https://stackoverflow.com/questions/20894525/how-to-remove-parentheses-and-all-data-within-using-pandas-python