text-mining

Make all words uppercase in Wordcloud in R

老子叫甜甜 submitted on 2019-12-23 21:22:11
Question: When creating wordclouds, it is most common to make all the words lowercase. However, I want the wordclouds to display the words in uppercase. After forcing the words to uppercase, the wordcloud still displays lowercase words. Any ideas why? Reproducible code:

    library(tm)
    library(wordcloud)
    data <- data.frame(text = c("Creativity is the art of being ‘productive’ by using the available resources in a skillful manner. Scientifically speaking, creativity is part of our consciousness and we can be

Is there a more efficient way to append lines from a large file to a numpy array? - MemoryError

天大地大妈咪最大 submitted on 2019-12-23 17:45:11
Question: I'm trying to use this lda package to process a term-document matrix (TDM) CSV file with 39568 rows and 27519 columns containing only counts (natural numbers).
Problem: I'm getting a MemoryError with my approach of reading the file and storing it in a NumPy array.
Goal: Get the numbers from the TDM CSV file and convert them to a NumPy array so I can use that array as input for the lda.

    with open("Results/TDM - Matrix Only.csv", 'r') as matrix_file:
        matrix = np.array([[int(value) for value in line.strip
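A minimal sketch of one way to lower the memory cost described in this question, assuming the file holds only comma-separated non-negative integers with no header row and that most cells are zero (typical for a term-document matrix, but not stated in the excerpt). Instead of building a list of Python lists, it streams the file and keeps only the non-zero cells as coordinate triplets. The function name `read_sparse_counts` is hypothetical.

```python
import csv
from array import array


def read_sparse_counts(path):
    """Stream a CSV of integer counts and keep only the non-zero cells.

    Returns (rows, cols, values, shape) in coordinate (COO) form.
    Assumes every cell parses as an integer; blank lines are skipped.
    """
    rows = array("l")    # row index of each non-zero cell
    cols = array("l")    # column index of each non-zero cell
    values = array("l")  # the count stored in that cell
    n_rows = 0
    n_cols = 0
    with open(path, newline="") as handle:
        for record in csv.reader(handle):
            if not record:
                continue  # csv.reader yields [] for a blank line
            n_cols = max(n_cols, len(record))
            for j, cell in enumerate(record):
                count = int(cell)
                if count:
                    rows.append(n_rows)
                    cols.append(j)
                    values.append(count)
            n_rows += 1
    return rows, cols, values, (n_rows, n_cols)
```

If SciPy is available, the triplets can be handed to `scipy.sparse.coo_matrix((values, (rows, cols)), shape=shape)`. Whether the lda package in use accepts a sparse matrix as input is an assumption worth checking against its documentation.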

R - slowly working lapply with sort on ordered factor

痴心易碎 submitted on 2019-12-23 15:56:48
Question: Based on the question "More efficient means of creating a corpus and DTM", I've prepared my own method for building a Term Document Matrix from a large corpus which (I hope) does not require Terms x Documents memory.

    sparseTDM <- function(vc){
      id = unlist(lapply(vc, function(x){x$meta$id}))
      content = unlist(lapply(vc, function(x){x$content}))
      out = strsplit(content, "\\s", perl = T)
      names(out) = id
      lev.terms = sort(unique(unlist(out)))
      lev.docs = id
      v1 = lapply( out, function(x, lev) { sort(as

Error: could not find function “classify_emotion”

China☆狼群 submitted on 2019-12-23 04:27:12
Question: I have been trying to do sentiment analysis on a random file. However, it throws this error: could not find function "classify_emotion". The package "sentiment" wasn't available (for R version 3.1.2), so I installed it through install_github('sentiment140', 'okugami79'). The error is still there: could not find function "classify_emotion". The code:

    library(plyr)
    library(ggplot2)
    library(wordcloud)
    library(RColorBrewer)
    library(tm)
    library(SnowballC)
    library(sentiment)
    library

How to divide text (string) by a certain character using r

丶灬走出姿态 submitted on 2019-12-23 03:56:28
Question: How can I classify strings using R? My text file has the following structure:

    >cell_c2< 8/30/2017 This location has been closed for a few months. Recently I passed by and attracted by their street sign Teriyaki Grill Open. I gave a try. The cashier was friendly and recommended me to try their most popular Teriyaki chicken box. It came with mixed vege and steamed rice. They have an open kitchen with SS equipment. I could see the chef make grill after my order was placed. I love the teriyaki chicken with

Print first line of one element of Corpus in R using tm package

我的梦境 submitted on 2019-12-23 02:52:44
Question: How do you print a small sample, or the first line, of a corpus in R using the tm package? I have a very large corpus (> 1 GB) and am doing some text cleaning. I would like to test as I apply cleaning procedures. Printing just the first line, or the first few lines, of a corpus would be ideal.

    # Load Libraries
    library(tm)
    # Read in Corpus
    corp <- SimpleCorpus( DirSource( "C:/TextDocument"))
    # Remove punctuation
    corp <- removePunctuation(corp, preserve_intra_word_contractions = TRUE, preserve_intra

Wordcloud showing colour based on continuous metadata in R

折月煮酒 submitted on 2019-12-22 21:56:59
Question: I'm creating a wordcloud in which the size of the words is based on frequency, but I want the colour of the words to be mapped to a third variable (stress, a continuous numerical variable giving the amount of stress associated with each word). I tried the following, which gave me only two different colours (yellow and purple), while I want a smoother gradient. I would like a colour range, for example a palette that goes from green to red.

    df = data.frame(word = c("calling",
