Creating a word cloud using Python

Submitted by 孤人 on 2019-12-02 10:12:51

Question


I am trying to create a word cloud in Python after cleaning a text file.

I get the required results, i.e. the words used most often in the text file, but I am unable to plot them.

My code:

import collections
from wordcloud import WordCloud
import matplotlib.pyplot as plt

file = open('example.txt', encoding = 'utf8' )
stopwords = set(line.strip() for line in open('stopwords'))
wordcount = {}

for word in file.read().split():
    word = word.lower()
    word = word.replace(".","")
    word = word.replace(",","")
    word = word.replace("\"","")
    word = word.replace("“","")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

d = collections.Counter(wordcount)
for word, count in d.most_common(10):
    print(word , ":", count)

#wordcloud = WordCloud().generate(text)
#fig = plt.figure()
#fig.set_figwidth(14)
#fig.set_figheight(18)

#plt.imshow(wordcloud.recolor(color_func=grey_color, random_state=3))
#plt.title(title, color=fontcolor, size=30, y=1.01)
#plt.annotate(footer, xy=(0, -.025), xycoords='axes fraction', fontsize=infosize, color=fontcolor)
#plt.axis('off')
#plt.show()

Edit: I tried to plot the word cloud with the following code:

wordcloud = WordCloud(background_color='white',
                          width=1200,
                          height=1000
                         ).generate((d.most_common(10)))


plt.imshow(wordcloud)
plt.axis('off')
plt.show()

But this raises TypeError: expected string or buffer

When I tried the above code with .generate(str(d.most_common(10))) instead, the word cloud is drawn, but it shows an apostrophe (') after several words.

Environment: Jupyter Notebook | Python 3 | IPython


Answer 1:


First, download the Symbola.ttf font file into the same folder as the script below.

Folder layout:

file.txt Symbola.ttf my_word_cloud.py

file.txt:

foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz
foo foo foo foo foo foo foo foo foo foo bizz bizz bizz bizz foo foo

my_word_cloud.py:

import io
from collections import Counter
from os import path

import matplotlib.pyplot as plt
from wordcloud import WordCloud

d = path.dirname(__file__)

# Pass the encoding explicitly so the file is read as UTF-8 on every platform
text = io.open(path.join(d, 'file.txt'), encoding='utf-8').read()

words = text.split()
print(Counter(words))

# Generate a word cloud image
# The Symbola font includes most emoji
font_path = path.join(d, 'Symbola.ttf')
word_cloud = WordCloud(font_path=font_path).generate(text)

# Display the generated image:
plt.imshow(word_cloud)
plt.axis("off")
plt.show()

Result:

Counter({'foo': 17, 'bizz': 9, 'buzz': 5})

This is a simple example I put together for you. Many more examples are available here:

https://github.com/amueller/word_cloud/tree/master/examples
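As a side note on the apostrophes reported in the question: calling str() on the result of most_common() produces the printed form of a list of tuples, so every word arrives wrapped in quote characters. The most likely explanation is that the word cloud tokenizer treats a trailing quote as part of the word. The short sketch below uses made-up sample text and only the standard library to show what that string looks like, next to a plain word-to-count mapping.

```python
from collections import Counter

# Made-up sample text, just for illustration.
text = "foo buzz bizz foo foo bizz"
counts = Counter(text.split())

# str() of most_common() is the repr of a list of tuples, quotes included.
as_string = str(counts.most_common(2))
print(as_string)  # [('foo', 3), ('bizz', 2)]

# A plain word -> count mapping keeps words and counts separate.
frequencies = dict(counts.most_common(2))
print(frequencies)  # {'foo': 3, 'bizz': 2}
```

The mapping form is what a frequency-based API would expect, whereas the string form has to be tokenized again and carries the quotes along.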




Answer 2:


most_common(x) is not a method of WordCloud. However, you can pass the parameter

max_words = 

and this should do what you're attempting.
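A minimal sketch of how this could look, combining max_words with WordCloud.generate_from_frequencies, which accepts word counts directly instead of raw text. The helper names count_words and render, the sample sentence, and the value max_words=10 are assumptions made for illustration; the cleaning step strips punctuation from the ends of each word, which is slightly different from the replace() calls in the question.

```python
import string
from collections import Counter


def count_words(text, stopwords=frozenset()):
    """Lowercase each word, strip surrounding punctuation, drop stopwords, count."""
    strip_chars = string.punctuation + "“”"
    cleaned = (word.lower().strip(strip_chars) for word in text.split())
    return Counter(w for w in cleaned if w and w not in stopwords)


def render(frequencies, max_words=10):
    """Draw a word cloud from a word -> count mapping (hypothetical helper)."""
    # Imported here so count_words() can be used without the plotting libraries.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(background_color="white", width=1200, height=1000,
                      max_words=max_words)
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()


# Made-up sample sentence; replace with the contents of your own file.
counts = count_words('Foo, foo. "bar" the baz', stopwords={"the"})
print(counts.most_common(2))
# render(counts)  # uncomment when wordcloud and matplotlib are installed
```

Because the counts are passed as a mapping, there is no need to convert them to a string first, and max_words limits how many of the most frequent words are drawn.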



Source: https://stackoverflow.com/questions/44750574/creating-wordcloud-using-python
