Word frequency count based on two words using python

南笙酒味 submitted on 2019-12-25 04:32:16

Question


There are many resources online that show how to count the frequency of single words, like this and this and this and others...
But I was not able to find a concrete example of counting the frequency of two-word pairs.

I have a CSV file that contains some strings.

FileList = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
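Since the strings live in a CSV file, one way to gather them into a single text is the standard-library `csv` module. This is a minimal sketch, assuming every cell of the file holds text to be counted; the helper name `load_text` and the file name `strings.csv` are hypothetical.

```python
import csv
import io

def load_text(lines):
    """Join every cell of a CSV source into one space-separated string.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    reader = csv.reader(lines)
    return " ".join(cell for row in reader for cell in row)

# Hypothetical usage with a real file:
# with open("strings.csv", newline="", encoding="utf-8") as f:
#     text = load_text(f)

# Self-contained demo: an in-memory CSV with one quoted cell
# (the quotes keep the comma inside the cell from splitting it).
sample = io.StringIO('"I love TV show makes me happy, I love also comedy show makes me feel like flying"\n')
text = load_text(sample)
print(text)
```

Joining all cells into one string means a pair can span the end of one cell and the start of the next; if that matters, count each cell separately and add the resulting `Counter` objects together.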

So I want the output to look like this:

wordscount =  {"I love": 2, "show makes": 2, "makes me" : 2 }

Of course, I will have to strip all the commas, question marks, and other punctuation: {!, , ", ', ?, ., (,), [, ], ^, %, #, @, &, *, -, _, ;, /, \, |, }
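A minimal sketch of the punctuation step, assuming Python's `string.punctuation` is an acceptable stand-in for the list above: it covers every listed character plus a few more (`$`, `+`, `:`, `<`, `=`, `>`, `` ` ``, `{`, `}`, `~`). Swap in an explicit string if those should be kept. The helper name `strip_punctuation` is hypothetical.

```python
import string

# Map every ASCII punctuation character to a space. Replacing (rather than
# deleting) keeps "happy,I" from collapsing into a single token "happyI".
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def strip_punctuation(text):
    """Replace punctuation with spaces and split the result into words."""
    return text.translate(PUNCT_TO_SPACE).split()

words = strip_punctuation("I love TV show makes me happy, I love also comedy show makes me feel like flying")
print(words)
```

Note that this also splits contractions such as "don't" into "don" and "t", which is the same behaviour as the `\w+` regex in the answer below.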

I will also remove some stop words, which I found here, to get more meaningful data from the text.
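The linked stop-word list is not reproduced here, so the sketch below uses a small hypothetical set purely for illustration. It forms the pairs first and then drops any pair containing a stop word; the helper name `bigram_counts` is also hypothetical.

```python
from collections import Counter

# Hypothetical stop-word set for illustration only; substitute the list you
# actually use (lower-case entries, compared case-insensitively below).
STOP_WORDS = {"i", "me", "also", "like"}

def bigram_counts(words, stop_words=STOP_WORDS):
    """Count adjacent word pairs, skipping any pair that contains a stop word."""
    pairs = zip(words, words[1:])
    return Counter(
        f"{a} {b}"
        for a, b in pairs
        if a.lower() not in stop_words and b.lower() not in stop_words
    )

words = "I love TV show makes me happy I love also comedy show makes me feel like flying".split()
counts = bigram_counts(words)
print(counts.most_common(3))
```

Filtering whole pairs, rather than deleting stop words before pairing, avoids inventing pairs that never appeared side by side (deleting "me" from "makes me happy" would produce "makes happy"). With this particular set, "I love" and "makes me" drop out because "I" and "me" are stop words.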

How can I achieve this result using Python?

Thanks!


Answer 1:


>>> from collections import Counter
>>> import re
>>> 
>>> sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
>>> words = re.findall(r'\w+', sentence)
>>> two_words = [' '.join(ws) for ws in zip(words, words[1:])]
>>> wordscount = {w:f for w, f in Counter(two_words).most_common() if f > 1}
>>> wordscount
{'show makes': 2, 'makes me': 2, 'I love': 2}
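The `zip(words, words[1:])` trick pairs each word with its successor. The same idea extends to runs of any length n by zipping n shifted slices. The helper `ngram_counts` below is a hypothetical generalization, not part of the original answer; its `min_count` parameter mirrors the answer's `if f > 1` filter.

```python
import re
from collections import Counter

def ngram_counts(text, n=2, min_count=2):
    """Count runs of n consecutive words; keep those seen at least min_count times."""
    words = re.findall(r"\w+", text)
    # zip stops at the shortest slice, so each tuple holds n consecutive words.
    grams = zip(*(words[i:] for i in range(n)))
    counts = Counter(" ".join(g) for g in grams)
    return {gram: f for gram, f in counts.most_common() if f >= min_count}

sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
print(ngram_counts(sentence, n=2))
print(ngram_counts(sentence, n=3))
```

On Python 3.7+ the printed dictionary may list the pairs in a different order than the session above, since dict ordering changed; the counts themselves are the same.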


Source: https://stackoverflow.com/questions/18952894/word-frequency-count-based-on-two-words-using-python
