Question
I have a pandas DataFrame:
import pandas as pd
df = pd.DataFrame(['Donald Dump', 'Make America Great Again!', 'Donald Shrimp'],
                  columns=['text'])
I would like to add another column to the DataFrame that holds the length of each string in the 'text' column.
For the example above, the result would be:
                        text  text_length
0                Donald Dump           11
1  Make America Great Again!           25
2              Donald Shrimp           13
I know I can loop through the rows and compute each length, but is there a way to vectorize this operation? I have a few million rows.
Answer 1:
I think the easiest way is to use the apply method on the column (a pandas Series).
It calls a Python function on every element, so you can transform the data any way you want.
You could do something like:
df['text_length'] = df['text'].apply(len)
to create a new column with the data you want.
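One caveat worth sketching: apply(len) assumes every value in the column is a string. The snippet below uses hypothetical sample data with one missing entry (not part of the original question) to show that the plain call raises a TypeError, and that a small guard function keeps the apply approach usable.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the middle entry is missing
df = pd.DataFrame({'text': ['Donald Dump', None, 'Donald Shrimp']})

# len() is not defined for a missing value, so the plain call raises TypeError
try:
    df['text'].apply(len)
    apply_failed = False
except TypeError:
    apply_failed = True

# Guarded version: only measure real strings, keep NaN for everything else
df['text_length'] = df['text'].apply(
    lambda s: len(s) if isinstance(s, str) else np.nan
)
print(df)
```

Because the column now contains NaN, the lengths are stored as floats rather than integers.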
Edit: After seeing @jezrael's answer, I was curious and decided to time both approaches with %timeit. I created a DataFrame full of lorem ipsum sentences (101000 rows), and the difference is quite small. I got:
In [59]: %timeit df['text_length'] = (df.text.str.len())
10 loops, best of 3: 20.6 ms per loop
In [60]: %timeit df['text_length'] = df['text'].apply(len)
100 loops, best of 3: 17.6 ms per loop
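For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The data is an assumption (one sentence repeated 101000 times rather than lorem ipsum text), so the numbers it prints will not match the timings above and will vary by machine and pandas version.

```python
import timeit
import pandas as pd

# Assumed benchmark data: one short sentence repeated 101000 times
df = pd.DataFrame({'text': ['Make America Great Again!'] * 101000})

# Time each approach over the same number of repetitions
runs = 10
t_str = timeit.timeit(lambda: df['text'].str.len(), number=runs)
t_apply = timeit.timeit(lambda: df['text'].apply(len), number=runs)

print('str.len   : %.2f ms per call' % (t_str / runs * 1000))
print('apply(len): %.2f ms per call' % (t_apply / runs * 1000))

# Sanity check that both approaches agree on this data
same_result = bool((df['text'].str.len() == df['text'].apply(len)).all())
```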
Answer 2:
Use str.len:
print(df.text.str.len())
0    11
1    25
2    13
Name: text, dtype: int64
Sample:
import pandas as pd
df = pd.DataFrame(['Donald Dump', 'Make America Great Again!', 'Donald Shrimp'],
                  columns=['text'])
print(df)
                        text
0                Donald Dump
1  Make America Great Again!
2              Donald Shrimp
df['text_length'] = df.text.str.len()
print(df)
                        text  text_length
0                Donald Dump           11
1  Make America Great Again!           25
2              Donald Shrimp           13
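With a few million rows, some values may well be missing. As a small sketch on hypothetical data (the None entry is an assumption, not part of the question), str.len passes missing values through as NaN instead of raising, and the result can be cast to the nullable Int64 dtype if integer lengths are preferred.

```python
import pandas as pd

# Hypothetical sample data: the middle entry is missing
df = pd.DataFrame({'text': ['Donald Dump', None, 'Donald Shrimp']})

# str.len skips the missing entry and leaves NaN in its place
lengths = df['text'].str.len()

# Optional: nullable integer dtype keeps whole-number lengths alongside <NA>
df['text_length'] = lengths.astype('Int64')
print(df)
```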
Source: https://stackoverflow.com/questions/37687806/pandas-vectorized-operation-to-get-the-length-of-string