Question
I have the following text in column A:
A
hellothere_3.43
hellothere_3.9
I would like to extract only the numbers into a new column B (next to A), e.g.:
B
3.43
3.9
I use: str.extract('(\d.\d\d)', expand=True)
but this only extracts 3.43, because the pattern requires exactly one digit before the decimal point and exactly two digits after it, so 3.9 is not matched. Is there a way to make it more generic?
Many thanks!
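Answer 1:
Use a regular expression with str.extract, with a pattern that allows any number of digits before an optional decimal point and one or more digits after it.
Example: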
import pandas as pd
df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})
# Raw string so that \d and \. reach the regex engine unchanged
df["B"] = df["A"].str.extract(r"(\d*\.?\d+)", expand=True)
print(df)
Output:
                 A     B
0  hellothere_3.43  3.43
1   hellothere_3.9   3.9
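Note that the extracted values are still strings. If numeric values are wanted, a minimal sketch along the following lines might help. It assumes a slightly different pattern than the one above (digits followed by an optional fractional part), and the third row is a hypothetical value added only to show what happens when nothing matches.

```python
import pandas as pd

# The third row is hypothetical, added to illustrate the no-match case
df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9", "no_number_here"]})

# expand=False returns a Series when the pattern has a single capture group
extracted = df["A"].str.extract(r"(\d+(?:\.\d+)?)", expand=False)

# Convert the extracted strings to floats; rows without a match stay NaN
df["B"] = pd.to_numeric(extracted)
print(df)
```

With this approach, rows that contain no number end up as NaN in column B instead of raising an error.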
Answer 2:
I think splitting the string on the underscore and converting the second piece to a float with apply and a lambda is quite clean.
import pandas as pd
df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})
df["B"] = df['A'].str.split('_').apply(lambda x: float(x[1]))
I haven't done any proper comparison, but it seems faster than the regex solution on small tests.
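One caveat with taking element 1 of the split is that it only works if the text contains exactly one underscore. A possible variant, sketched here under the assumption that the number always follows the last underscore, splits from the right and stays within pandas string methods. The third row is a hypothetical value added to show the multi-underscore case.

```python
import pandas as pd

# The third row is hypothetical, added to show text that contains an extra underscore
df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9", "hello_there_2.5"]})

# Split once from the right, keep the last piece, then convert to float
df["B"] = df["A"].str.rsplit("_", n=1).str[-1].astype(float)
print(df)
```

Unlike the regex approach, this raises a ValueError if the piece after the last underscore is not a valid number, which may or may not be the desired behaviour.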
Source: https://stackoverflow.com/questions/50830059/python-pandas-extracting-numbers-within-text-to-a-new-column