Question
I am trying to create a word cloud in Python after cleaning a text file.
I get the expected results, i.e. the words used most often in the text file, but I am unable to plot them.
My code:
import collections
from wordcloud import WordCloud
import matplotlib.pyplot as plt
file = open('example.txt', encoding = 'utf8' )
stopwords = set(line.strip() for line in open('stopwords'))
wordcount = {}
for word in file.read().split():
    word = word.lower()
    word = word.replace(".","")
    word = word.replace(",","")
    word = word.replace("\"","")
    word = word.replace("“","")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
d = collections.Counter(wordcount)
for word, count in d.most_common(10):
    print(word , ":", count)
#wordcloud = WordCloud().generate(text)
#fig = plt.figure()
#fig.set_figwidth(14)
#fig.set_figheight(18)
#plt.imshow(wordcloud.recolor(color_func=grey_color, random_state=3))
#plt.title(title, color=fontcolor, size=30, y=1.01)
#plt.annotate(footer, xy=(0, -.025), xycoords='axes fraction', fontsize=infosize, color=fontcolor)
#plt.axis('off')
#plt.show()
Edit: I tried to plot the word cloud with the following code:
wordcloud = WordCloud(background_color='white',
                      width=1200,
                      height=1000
                      ).generate((d.most_common(10)))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
But this raises TypeError: expected string or buffer.
When I tried the above code with .generate(str(d.most_common(10))) instead,
the word cloud that is produced shows an apostrophe (') after several words.
Environment: Jupyter Notebook, Python 3, IPython.
Answer 1:
First, download the Symbola.ttf font file into the same folder as the script below.
File layout:
file.txt Symbola.ttf my_word_cloud.py
file.txt:
foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz
foo foo foo foo foo foo foo foo foo foo bizz bizz bizz bizz foo foo
my_word_cloud.py:
import io
from collections import Counter
from os import path
import matplotlib.pyplot as plt
from wordcloud import WordCloud
d = path.dirname(__file__)
# Pass the encoding explicitly so the file is loaded as UTF-8 on every platform
text = io.open(path.join(d, 'file.txt'), encoding='utf-8').read()
words = text.split()
print(Counter(words))
# Generate a word cloud image
# The Symbola font includes most emoji
font_path = path.join(d, 'Symbola.ttf')
word_cloud = WordCloud(font_path=font_path).generate(text)
# Display the generated image:
plt.imshow(word_cloud)
plt.axis("off")
plt.show()
Result:
Counter({'foo': 17, 'bizz': 9, 'buzz': 5})
This is a simple example I put together for you; many more examples are available here:
https://github.com/amueller/word_cloud/tree/master/examples
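The script above passes plain text to generate(). By contrast, the question's str(d.most_common(10)) passes the printed form of a list of tuples, quotes included, which is a likely source of the stray apostrophes. The sketch below uses only the standard library; the regular expression is an assumption standing in for the word cloud tokenizer (which, as far as I know, treats apostrophes as word characters), so read it as an illustration rather than the library's exact behaviour.

```python
import re
from collections import Counter

counts = Counter("foo bar foo foo".split())

# str() of the most_common() list is the list's repr, quotes and all.
as_text = str(counts.most_common(2))
print(as_text)

# Assumed stand-in for the tokenizer: a word character followed by
# one or more word characters or apostrophes.
tokens = re.findall(r"\w[\w']+", as_text)
print(tokens)
```

Under that assumption, the closing quote of each word is captured as part of the token, while the single-digit counts are dropped.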
Answer 2:
most_common(x) is not a method of WordCloud. However, you can pass the max_words= parameter, and this should do what you're attempting.
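If the counts are already computed, as in the question, a sketch along these lines may help. It assumes WordCloud.generate_from_frequencies, which takes a word-to-frequency mapping rather than a string; count_words and plot_top_words are hypothetical helper names, and the sample text and stopword set are made up for illustration. The plotting imports are kept inside the function so the counting part can be run on its own.

```python
import string
from collections import Counter


def count_words(text, stopwords=frozenset()):
    """Count lowercase words after stripping surrounding punctuation."""
    strip_chars = string.punctuation + "“”"
    words = (w.strip(strip_chars).lower() for w in text.split())
    return Counter(w for w in words if w and w not in stopwords)


def plot_top_words(counts, n=10):
    """Draw a word cloud from the n most common words (hypothetical helper)."""
    # Imported here so count_words works without the plotting libraries.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    # A plain dict of word -> count, limited to the top n entries.
    frequencies = dict(counts.most_common(n))
    cloud = WordCloud(background_color="white", width=1200, height=1000)
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()


counts = count_words('The cat, the "hat". The cat!', stopwords={"the"})
print(counts.most_common(10))
```

In a notebook, calling plot_top_words(counts) should then render the image. When generating from raw text instead, passing max_words=10 to WordCloud is the alternative this answer describes.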
Source: https://stackoverflow.com/questions/44750574/creating-wordcloud-using-python