Python: wordcloud, repetitive words

说谎 2020-12-28 20:18

My word cloud shows some words more than once, and I do not understand why the repeats are not counted together and shown as a single word.

from wordcloud import Wor         


        
2 answers
  • 2020-12-28 20:39

    That is a feature called 'collocations' in the word_cloud project. You can turn it off by setting collocations=False, like this:

        wordcloud = WordCloud(collocations=False).generate(word_string)
    

    This removes the two-word phrases, i.e. words that are frequently grouped together in your text. It gets rid of some you probably don't like, for instance "oh oh", but also some you may like, for instance "black culture".

  • 2020-12-28 20:42

    If you look at wordcloud.words_ you will see the frequency table includes some two-word phrases like 'oh oh', 'hook start', 'lets go', 'lets hook'.

    You would need to dig into the code behind .process_text() to see exactly why it does this.

    As a work-around you could split word_string yourself to build a word-frequency table, then use .generate_from_frequencies() to create the image.
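
    A minimal sketch of that work-around, assuming lowercasing plus a simple regular expression is good enough for tokenizing. The helper names count_words and render_cloud, the sample text, and the output path are made up for illustration; only generate_from_frequencies() and to_file() come from the wordcloud package.

```python
import re
from collections import Counter


def count_words(text, stopwords=frozenset()):
    """Count single words only, case-insensitively, skipping stopwords."""
    # Lowercase first so "Oh" and "oh" land in the same bucket.
    # The pattern keeps an inner apostrophe, so "don't" stays one word.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in stopwords)


def render_cloud(frequencies, path="cloud.png"):
    """Draw the cloud from a ready-made table (needs the wordcloud package)."""
    # Imported here so the counting step works without wordcloud installed.
    from wordcloud import WordCloud

    cloud = WordCloud().generate_from_frequencies(frequencies)
    cloud.to_file(path)  # hypothetical output file name
    return cloud


# Stand-in text; in the question this would be word_string.
sample = "Oh oh oh, lets go. Lets hook, hook start, oh!"
frequencies = count_words(sample, stopwords={"lets"})
print(frequencies.most_common(3))
```

    Calling render_cloud(frequencies) then draws the image from single-word counts only. Because the table is built by hand, any stopword filtering has to happen in the counting step, as it does here.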
