How to use the split function on every row in a dataframe in Python?

一向 2021-02-04 05:32

I want to count the number of times a word is repeated in the review string.

I am reading the CSV file and storing it in a pandas DataFrame using the line below.

4 Answers
  • 2021-02-04 06:00

    You can use the .str accessor to apply string methods to a Series of strings:

    reviews["review"].str.split("disappointed")
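
    As a minimal sketch of what this returns, assume a small made-up DataFrame in place of the CSV from the question (the sample rows are hypothetical). Each element of the result is the list of pieces left after splitting that row's string:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question
reviews = pd.DataFrame({
    "review": [
        "disappointed with the strap, very disappointed overall",
        "works great",
        "a bit disappointed",
    ]
})

# .str.split splits each element and returns a Series of lists
pieces = reviews["review"].str.split("disappointed")
print(pieces.tolist())
```

    A row that does not contain the word comes back as a one-element list holding the whole string.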
    
  • 2021-02-04 06:08

    Well, the problem is with:

    reviews["review"]
    

    The above is a Series. In your first snippet, you are doing this:

    reviews["review"][1].split("disappointed")
    

    That is, you are indexing a single review, which is a plain string. You could loop over all rows of the column and perform the desired action on each one. For example:

    for index, row in reviews.iterrows():
        # Splitting yields one more piece than there are occurrences
        print(len(row['review'].split("disappointed")) - 1)
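
    The difference between the whole column and a single row can be checked directly. The sketch below assumes a made-up three-row DataFrame (the data and the variable names column, single and raised are illustrative only):

```python
import pandas as pd

# Hypothetical sample data; the real frame comes from the CSV in the question
reviews = pd.DataFrame(
    {"review": ["so disappointed", "fine", "disappointed, disappointed"]}
)

column = reviews["review"]     # the whole column: a pandas Series
single = reviews["review"][1]  # one element: a plain Python str

print(type(column).__name__)   # Series
print(type(single).__name__)   # str

# A str has .split, but a Series does not, so this raises AttributeError
try:
    column.split("disappointed")
    raised = False
except AttributeError:
    raised = True

print(raised)
```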
    
        
    
  • 2021-02-04 06:17

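    pandas (0.20.3 at the time of writing) has pandas.Series.str.split(), which acts on every string of the Series and does the split. So you can split and then count the pieces in each row; the number of occurrences is one less than the number of pieces. Note that .str.len() is needed here, because a plain len() on the Series would return the number of rows:

    reviews['review'].str.split('disappointed').str.len() - 1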
    

    pandas.Series.str.split
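
    To make the per-row arithmetic concrete, here is a minimal sketch on a made-up DataFrame (the sample rows and the names word, n_rows, per_row, per_row_count and total are assumptions for illustration). It also shows that len() applied to the Series itself gives the number of rows rather than a per-row count, and that .str.count is an alternative that skips the split:

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the reviews CSV
reviews = pd.DataFrame(
    {"review": ["disappointed, really disappointed", "fine", "disappointed"]}
)
word = "disappointed"

# len() of a Series is the number of rows, not a per-row count
n_rows = len(reviews["review"].str.split(word))

# .str.len() measures each list, so subtracting 1 gives occurrences per row
per_row = reviews["review"].str.split(word).str.len() - 1

# .str.count takes a regular expression, so escape the literal word
per_row_count = reviews["review"].str.count(re.escape(word))

# Total occurrences across all reviews
total = int(per_row.sum())
print(per_row.tolist(), total)
```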

  • 2021-02-04 06:21

    You're trying to split the entire review column of the data frame (which is the Series mentioned in the error message). What you want to do is apply a function to each row of the data frame, which you can do by calling apply on the data frame:

    f = lambda x: len(x["review"].split("disappointed")) - 1
    reviews["disappointed"] = reviews.apply(f, axis=1)
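
    A variation on the same idea, offered as a sketch rather than the only way to do it: apply a named function to the review column alone, and treat missing reviews as zero so that empty CSV cells do not raise an error. The helper count_word and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample data, including a missing review
reviews = pd.DataFrame(
    {"review": ["disappointed twice, disappointed", None, "happy"]}
)

def count_word(text, word="disappointed"):
    """Count non-overlapping occurrences of word; missing reviews count as 0."""
    if not isinstance(text, str):
        return 0
    return text.count(word)

# Series.apply calls the function once per element of the column
reviews["disappointed"] = reviews["review"].apply(count_word)
print(reviews["disappointed"].tolist())
```

    Applying to the column instead of using axis=1 on the whole frame avoids building a row object for every review.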
    