I'm trying to create a new column in a dataframe that contains the word count for the respective row. I'm looking for the total number of words, not frequencies of each di
Here is a way using .apply():
df['number_of_words'] = df.col.apply(lambda x: len(x.split()))
Example
Given this df:
>>> df
                    col
0  This is one sentence
1           and another
After applying .apply():
df['number_of_words'] = df.col.apply(lambda x: len(x.split()))
>>> df
                    col  number_of_words
0  This is one sentence                4
1           and another                2
Note: As pointed out in the comments, and in this answer, .apply is not necessarily the fastest method. If speed is important, go with one of @cᴏʟᴅsᴘᴇᴇᴅ's methods.
This is one way using pd.Series.str.split and pd.Series.map:
df['word_count'] = df['col'].str.split().map(len)
The above assumes that df['col'] is a series of strings.
Example:
df = pd.DataFrame({'col': ['This is an example', 'This is another', 'A third']})
df['word_count'] = df['col'].str.split().map(len)
print(df)
#                   col  word_count
# 0  This is an example           4
# 1     This is another           3
# 2             A third           2
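The assumption that every value is a string matters: a missing value (NaN) passes through .str.split() unchanged, and len then fails on it. Below is a minimal sketch of one way around this, assuming it is acceptable to count a missing value as zero words. The fillna('') step and the sample data are additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data with one missing value (invented for illustration).
df = pd.DataFrame({'col': ['This is an example', np.nan, 'A third']})

# Replace missing values with an empty string so every element is a str;
# ''.split() returns [], so those rows get a count of 0.
df['word_count'] = df['col'].fillna('').str.split().map(len)
print(df)
```

If a missing row should stay missing rather than become 0, the str.split + str.len variant is probably the better fit.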
With list and map (data from cold):
list(map(lambda x : len(x.split()),df.col))
Out[343]: [4, 3, 2]
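Note that the call above returns a plain Python list rather than adding a column. Here is a small sketch of assigning it back, assuming the same three-row df as in the earlier example; the column name word_count is chosen here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col': ['This is an example', 'This is another', 'A third']})

# map() returns a lazy iterator in Python 3, so materialize it with list()
# before assigning it as a new column.
df['word_count'] = list(map(lambda x: len(x.split()), df['col']))
print(df)
```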
str.split + str.len
str.len works nicely for any non-numeric column.
df['totalwords'] = df['col'].str.split().str.len()
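One practical difference from the len-based variants is that the .str accessor propagates missing values instead of raising. Here is a sketch, assuming a column that contains a NaN (the sample data is invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['This is one sentence', np.nan, 'and another']})

# .str.split() leaves the missing value alone, and .str.len() does the same,
# so the missing row stays missing in the result instead of raising an error.
df['totalwords'] = df['col'].str.split().str.len()
print(df)
```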
str.count
If your words are single-space separated, you may simply count the spaces plus 1.
df['totalwords'] = df['col'].str.count(' ') + 1
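The single-space condition is a real restriction. The sketch below, using made-up strings with a double space and with leading/trailing spaces, shows where counting spaces diverges from splitting. The by_regex column is an alternative not mentioned above: it counts runs of non-whitespace characters with the regex \S+.

```python
import pandas as pd

df = pd.DataFrame({'col': ['one two', 'one  two', ' one two ']})

# Counting single spaces over-counts when spacing is irregular.
df['by_spaces'] = df['col'].str.count(' ') + 1

# split() with no arguments collapses any run of whitespace.
df['by_split'] = df['col'].str.split().str.len()

# Alternative: count runs of non-whitespace characters (regex pattern).
df['by_regex'] = df['col'].str.count(r'\S+')
print(df)
```

Only the first row agrees across all three columns, so str.count(' ') + 1 is best reserved for data you know is cleanly formatted.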
List comprehension
This is faster than you think!
df['totalwords'] = [len(x.split()) for x in df['col'].tolist()]
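Speed claims like this are easy to check on your own data. Below is a rough benchmarking sketch using timeit from the standard library. The 10,000-row frame and the candidates dictionary are hypothetical scaffolding, and no particular ranking is claimed here, since the numbers depend on the machine, the pandas version, and the data.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: 10,000 copies of a short sentence.
df = pd.DataFrame({'col': ['This is one sentence'] * 10_000})

# Each candidate is a zero-argument callable wrapping one of the methods above.
candidates = {
    'apply': lambda: df['col'].apply(lambda x: len(x.split())),
    'str.split + str.len': lambda: df['col'].str.split().str.len(),
    'str.count': lambda: df['col'].str.count(' ') + 1,
    'list comprehension': lambda: [len(x.split()) for x in df['col'].tolist()],
}

# Total time for 10 runs of each candidate, printed fastest first.
timings = {name: timeit.timeit(fn, number=10) for name, fn in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f'{name:<22} {seconds:.4f} s')
```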