问题
I have the following dataframe:
df = pd.DataFrame(["Air type:1, Space kind:2, water", "something, Space blu:3, somethingelse"], columns = ['A'])
and I want to create a new column that contains for each row all the elements that have a ":" in them. So for example in the first row I want to return "type:1, kind:2" and for the second row I want "blu:3". I managed by using a list comprehension in the following way:
df['new'] = [[y for y in x if ":" in y] for x in df['A'].str.split(",")]
But my issue is that the new column contains list elements.
A new
0 Air type:1, Space kind:2, water [Air type:1, Space kind:2]
1 something at the start:4, Space blu:3, somethingelse [something at the start:4, Space blu:3]
I have not used Python a lot so I am not 100% whether I am missing a more Pandas specific way to do this. If there is one, more than happy to learn about it and use it. If this is a correct approach how can I convert the elements back into strings in order to do regexes on them? I tried How to concatenate items in a list to a single string? but this is not working as I would like it to.
回答1:
You can use pd.Series.str.findall here.
df['new'] = df['A'].str.findall('\w+:\w+')
A new
0 type:1, kind:2, water [type:1, kind:2]
1 something, blu:3, somethingelse [blu:3]
EDIT:
When there are multiple words then try
df['new'] = df['A'].str.findall('[^\s,][^:,]+:[^:,]+').str.join(', ')
A new
0 Air type:1, Space kind:2, water Air type:1, Space kind:2
1 something, Space blu:3, somethingelse Space blu:3
回答2:
You can use findall
with join
:
import pandas as pd
df = pd.DataFrame(["type:1, kind:2, water", "something, blu:3, somethingelse"], columns = ['A'])
df['new'] = df['A'].str.findall(r'[^\s:,]+:[^\s,]+').str.join(', ')
df['new']
# => 0 type:1, kind:2
# => 1 blu:3
The regex matches
[^\s:,]+
- one or more chars other than whitespace,:
and,
:
- a colon[^\s,]+
- one or more chars other than whitespace and,
.
See the regex demo.
The .str.join(', ')
concats all the found matches with ,
+space.
来源:https://stackoverflow.com/questions/64958329/keep-elements-with-pattern-in-pandas-series-without-converting-them-to-list