Count number of words per row

既然无缘 2020-11-29 08:32

I'm trying to create a new column in a dataframe that contains the word count for the respective row. I'm looking for the total number of words, not frequencies of each di

4 Answers
  • 2020-11-29 09:03

    Here is a way using .apply():

    df['number_of_words'] = df.col.apply(lambda x: len(x.split()))
    

    Example:

    Given this df:

    >>> df
                        col
    0  This is one sentence
    1           and another
    

    After applying .apply():

    df['number_of_words'] = df.col.apply(lambda x: len(x.split()))
    
    >>> df
                        col  number_of_words
    0  This is one sentence                4
    1           and another                2
    

    Note: As pointed out in the comments and in this answer, .apply is not necessarily the fastest method. If speed is important, it is better to go with one of @cᴏʟᴅsᴘᴇᴇᴅ's methods.

  • 2020-11-29 09:03

    This is one way using pd.Series.str.split and pd.Series.map:

    df['word_count'] = df['col'].str.split().map(len)
    

    The above assumes that df['col'] is a series of strings.

    Example:

    import pandas as pd

    df = pd.DataFrame({'col': ['This is an example', 'This is another', 'A third']})
    
    df['word_count'] = df['col'].str.split().map(len)
    
    print(df)
    
    #                   col  word_count
    # 0  This is an example           4
    # 1     This is another           3
    # 2             A third           2
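
    The assumption that df['col'] holds only strings matters in practice: if a row is missing (NaN), .map(len) raises a TypeError because NaN is a float. The following is a minimal sketch, using hypothetical data with one missing row, of two ways that case could be handled. Whether a missing row should count as 0 words is an assumption made here, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle row is missing (NaN), which is a float, not a string.
df = pd.DataFrame({'col': ['This is an example', np.nan, 'A third']})

# .str.split() passes NaN through and .str.len() keeps it as NaN,
# so this yields a column with a gap instead of raising an error.
counts_with_gap = df['col'].str.split().str.len()

# Treating missing text as an empty string gives a count of 0 for that row
# (assumption: a missing value should count as zero words).
df['word_count'] = df['col'].fillna('').str.split().map(len)

print(counts_with_gap)
print(df)
```

    The first variant keeps the gap visible, which may be preferable when a missing value should not be confused with an empty sentence.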
    
  • 2020-11-29 09:03

    Using list and map, with the data from cold:

    list(map(lambda x: len(x.split()), df.col))
    Out[343]: [4, 3, 2]
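
    The expression above returns a plain Python list rather than a new column. As a minimal sketch, assuming the three example sentences that produce [4, 3, 2] and a hypothetical column name word_count, the list could be assigned back to the dataframe like this:

```python
import pandas as pd

# Assumed example data: three sentences with 4, 3, and 2 words.
df = pd.DataFrame({'col': ['This is an example', 'This is another', 'A third']})

# list(map(...)) yields one count per row, in row order,
# so the result can be assigned directly as a new column.
df['word_count'] = list(map(lambda x: len(x.split()), df.col))

print(df)
```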
    
  • 2020-11-29 09:07

    str.split + str.len

    str.len works nicely for any non-numeric column.

    df['totalwords'] = df['col'].str.split().str.len()
    

    str.count

    If your words are single-space separated, you may simply count the spaces plus 1.

    df['totalwords'] = df['col'].str.count(' ') + 1
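
    The single-space condition is worth checking before relying on str.count. The sketch below uses hypothetical rows (a double space, leading and trailing spaces, and an empty string) to show where counting spaces and splitting on whitespace disagree.

```python
import pandas as pd

# Hypothetical rows chosen to stress the single-space assumption.
df = pd.DataFrame({'col': ['one two three', 'two  spaces', ' padded ', '']})

# Counts every space character, then adds 1.
by_spaces = df['col'].str.count(' ') + 1

# Splits on runs of whitespace and ignores leading/trailing whitespace.
by_split = df['col'].str.split().str.len()

print(pd.DataFrame({'col': df['col'], 'by_spaces': by_spaces, 'by_split': by_split}))
```

    Only the first row agrees (3 words either way). The other rows are overcounted by str.count, and the empty string is reported as 1 word instead of 0.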
    

    List Comprehension

    This can be faster than you might expect, because a plain Python loop skips the per-element overhead of the .str accessor.

    df['totalwords'] = [len(x.split()) for x in df['col'].tolist()]
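
    To check the speed claim on your own data, a rough timing comparison along these lines may help. This is only a sketch: the benchmark data (three example sentences repeated 5,000 times) and the number of runs are arbitrary assumptions, and the ranking can differ with other data sizes, pandas versions, and machines.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: three short sentences repeated many times.
df = pd.DataFrame({'col': ['This is an example', 'This is another', 'A third'] * 5_000})

# Each entry is a zero-argument callable wrapping one method from this thread.
methods = {
    'apply': lambda: df['col'].apply(lambda x: len(x.split())),
    'str.split + str.len': lambda: df['col'].str.split().str.len(),
    'str.count': lambda: df['col'].str.count(' ') + 1,
    'list comprehension': lambda: [len(x.split()) for x in df['col'].tolist()],
}

runs = 5
for name, func in methods.items():
    # timeit returns the total time for all runs, so divide for a per-run average.
    seconds = timeit.timeit(func, number=runs)
    print(f'{name:>20}: {seconds / runs:.4f} s per run')
```

    All four methods return the same counts here only because the benchmark sentences are single-space separated.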
    