word-cloud

Change the color of specific words in a wordcloud

Submitted by 谁都会走 on 2019-12-03 05:05:46
I would like to build a word cloud with R (I have done so with the wordcloud package) and then color specific words a certain color. Currently the function colors words according to frequency (which can be useful), but word size already encodes frequency, so I'd like to use color to carry additional meaning. Any idea how to color specific words in wordcloud? (If there's another word-cloud function in R I'm unaware of, I'm more than willing to go that route.) A mock example and my attempt (I tried to treat the color argument in the same manner I would a regular plot from the plot

How do I print an LDA topic model and the word cloud for each of the topics?

Submitted by 荒凉一梦 on 2019-12-03 00:33:15
    from nltk.tokenize import RegexpTokenizer
    from stop_words import get_stop_words
    from gensim import corpora, models
    import gensim
    import os
    from os import path
    from time import sleep
    import matplotlib.pyplot as plt
    import random
    from wordcloud import WordCloud, STOPWORDS

    tokenizer = RegexpTokenizer(r'\w+')
    en_stop = set(get_stop_words('en'))
    with open(os.path.join('c:\users\kaila\jobdescription.txt')) as f:
        Reader = f.read()
    Reader = Reader.replace("will", " ")
    Reader = Reader.replace("please", " ")
    texts = unicode(Reader, errors='replace')
    tdm = []
    raw = texts.lower()
    tokens = tokenizer
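
A minimal sketch of one way such a pipeline can finish (the toy documents, topic count, and other parameters below are illustrative assumptions, not taken from the question): build a gensim dictionary and corpus, train the LDA model, print its topics, and draw one word cloud per topic.

    from gensim import corpora, models
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    # Toy documents purely for illustration; in the question these would come from the text file.
    docs = [
        "data scientist python machine learning statistics".split(),
        "software engineer java spring cloud microservices".split(),
        "analyst sql excel reporting dashboards statistics".split(),
    ]

    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]

    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)

    # Print the topic model: each topic as a weighted list of words.
    for topic_id, topic in lda.print_topics(num_words=5):
        print(topic_id, topic)

    # One word cloud per topic, built from that topic's top words.
    for topic_id in range(lda.num_topics):
        freqs = dict(lda.show_topic(topic_id, topn=10))
        plt.figure()
        plt.imshow(WordCloud(background_color='white').generate_from_frequencies(freqs))
        plt.axis('off')
    plt.show()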

Text mining with the tm package in R: remove words starting with "http" or any other specific word

Submitted by 微笑、不失礼 on 2019-12-02 04:55:14
I am new to R and text mining. I made a word cloud out of a Twitter feed related to some term. The problem I'm facing is that the word cloud shows http:... or htt... How do I deal with this issue? I tried using the metacharacter * but I still doubt whether I'm applying it right:

    tw.text = removeWords(tw.text, c(stopwords("en"), "rt", "http\\*"))

Could somebody into text mining please help me with this?

If you are looking to remove URLs from your string, you may use:

    gsub("(f|ht)tp(s?)://(.*)[.][a-z]+", "", x)

Where x would be:

    x <- c("some text http://idontwantthis.com", "same problem again http:/

Creating a wordcloud using Python

Submitted by 狂风中的少年 on 2019-12-02 03:13:45
I am trying to create a wordcloud in Python after cleaning a text file. I got the required results, i.e. the words that are used most in the text file, but I am unable to plot. My code:

    import collections
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    file = open('example.txt', encoding='utf8')
    stopwords = set(line.strip() for line in open('stopwords'))
    wordcount = {}
    for word in file.read().split():
        word = word.lower()
        word = word.replace(".", "")
        word = word.replace(",", "")
        word = word.replace("\"", "")
        word = word.replace("“", "")
        if word not in stopwords:
            if word not in wordcount:
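
A minimal sketch of the plotting step that appears to be missing (the counts below are placeholder data; in the question they would be the wordcount dictionary built above): WordCloud can consume the counts directly through generate_from_frequencies, and matplotlib displays the result.

    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    # Placeholder counts for illustration; substitute the `wordcount` dict from the question.
    wordcount = {"python": 12, "cloud": 9, "word": 7, "plot": 4}

    wc = WordCloud(width=800, height=400, background_color="white")
    wc.generate_from_frequencies(wordcount)

    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()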

Python: wordcloud, repetitive words

Submitted by 三世轮回 on 2019-11-30 13:21:39
Question: In the word cloud I have repeated words, and I do not understand why they are not counted together and shown as one word.

    from wordcloud import WordCloud
    word_string = 'oh oh oh oh oh oh verse wrote book stand title book would life superman thats make feel count privilege love ideal honored know feel see everyday things things say rock baby truth rock love rock rock everything need rock baby rock wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya
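
A likely explanation, though the snippet is cut off so this is a guess: by default WordCloud also builds two-word collocations alongside single words, so the same word can show up inside several entries. A minimal sketch with collocations turned off:

    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    word_string = 'rock baby rock love rock rock everything need rock baby rock'  # shortened placeholder text

    # collocations=False keeps only single words, so each word appears exactly once.
    wc = WordCloud(collocations=False).generate(word_string)

    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.show()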

How to generate word clouds from LDA models in Python?

Submitted by 北慕城南 on 2019-11-30 10:39:16
I am doing some topic modeling on newspaper articles and have implemented LDA using gensim in Python 3. Now I want to create a word cloud for each topic, using the top 20 words for each topic. I know I can print the words and save the LDA model, but is there any way to just save the top words for each topic, which I can then use for generating word clouds? I tried to Google it but could not find anything relevant. Any help is appreciated.

Kenneth Orton: You can get the topn words from an LDA model using gensim's built-in method show_topic.

    lda = models.LdaModel.load('lda.model')
    for i in
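
A sketch of how the truncated answer might continue (my completion, not the original answer's exact code): loop over the topics, take the top 20 words with show_topic, and feed them to WordCloud as frequencies.

    import matplotlib.pyplot as plt
    from gensim import models
    from wordcloud import WordCloud

    lda = models.LdaModel.load('lda.model')   # path from the answer; adjust to your saved model

    for i in range(lda.num_topics):
        # (word, weight) pairs for topic i, limited to the top 20 words
        top_words = dict(lda.show_topic(i, topn=20))
        wc = WordCloud(background_color='white').generate_from_frequencies(top_words)

        plt.figure()
        plt.imshow(wc, interpolation='bilinear')
        plt.axis('off')
        plt.title('Topic {}'.format(i))

    plt.show()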

Increase resolution with word-cloud and remove empty border

Submitted by ≯℡__Kan透↙ on 2019-11-29 23:12:32
I am using word cloud with some txt files. How do I change this example if I want to 1) increase the resolution and 2) remove the empty border?

    #!/usr/bin/env python2
    """
    Minimal Example
    ===============
    Generating a square wordcloud from the US constitution using default arguments.
    """
    from os import path
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    d = path.dirname(__file__)

    # Read the whole text.
    text = open(path.join(d, 'constitution.txt')).read()
    wordcloud = WordCloud().generate(text)

    # Open a plot of the generated image.
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

You
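
One common way to address both points (a sketch, not the only option): give WordCloud a larger canvas via width/height (or scale) so the rendered image has more pixels, and trim the figure padding when displaying or saving.

    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    text = open('constitution.txt').read()     # same input file as the example above

    # A bigger canvas (and/or scale=) raises the resolution of the rendered image.
    wordcloud = WordCloud(width=1600, height=800).generate(text)

    fig = plt.figure(figsize=(16, 8))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.tight_layout(pad=0)                    # shrink the whitespace around the axes

    # Saving with a tight bounding box removes the remaining empty border.
    fig.savefig('wordcloud.png', dpi=200, bbox_inches='tight', pad_inches=0)
    plt.show()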

Creating a “word” cloud of phrases, not individual words, in R

Submitted by 不想你离开。 on 2019-11-29 13:33:03
I am trying to make a word cloud from a list of phrases, many of which are repeated, instead of from individual words. My data looks something like this, with one column of my data frame being a list of phrases:

    df$names <- c("John", "John", "Joseph A", "Mary A", "Mary A", "Paul H C", "Paul H C")

I would like to make a word cloud where all of these names are treated as individual phrases whose frequency is displayed, not the words that make them up. The code I have been using looks like:

    df.corpus <- Corpus(DataframeSource(data.frame(df$names)))
    df.corpus <- tm_map(client.corpus, function(x)