qdap

Matching a list of phrases to a corpus of documents and returning phrase frequency

我与影子孤独终老i submitted on 2020-02-27 12:04:24
Question: I have a list of phrases and a corpus of documents. There are 100k+ phrases and 60k+ documents in the corpus. The phrases may or may not be present in the corpus. I want to find the term frequency of each phrase that is present in the corpus. An example dataset:

    Phrases <- c("just starting", "several kilometers", "brief stroll", "gradually boost", "5 miles", "dark night", "cold morning")
    Doc1 <- "If you're just starting with workout, begin slow."
    Doc2 <- "Don't jump in brain initial and
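
The phrase-counting task described above can be sketched in base R. This is only a minimal illustration under stated assumptions: the data below is a hypothetical toy sample rather than the asker's corpus, `count_phrase` is a made-up helper name, and matching is literal and case-sensitive. A plain loop like this is unlikely to scale well to 100k+ phrases and 60k+ documents.

```r
# Hypothetical toy data (not the asker's full dataset).
phrases <- c("just starting", "brief stroll", "5 miles")
docs <- c(
  "If you're just starting with workout, begin slow.",
  "A brief stroll, then another brief stroll."
)

# Count literal occurrences of one phrase across all documents.
count_phrase <- function(phrase, docs) {
  hits <- gregexpr(phrase, docs, fixed = TRUE)
  # gregexpr() returns -1 for "no match", so count only positive positions.
  sum(vapply(hits, function(h) sum(h > 0), integer(1)))
}

# Named integer vector: one total per phrase.
phrase_freq <- vapply(phrases, count_phrase, integer(1), docs = docs)
print(phrase_freq)
```

Phrases that never occur simply get a count of 0, so they can be filtered out afterwards with `phrase_freq[phrase_freq > 0]`.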

Using mgsub function with word boundaries for replacement values

吃可爱长大的小学妹 submitted on 2020-01-04 06:09:28
Question: I am trying to replace substrings of the string elements in a vector with blank spaces. These are the vectors under consideration:

    test <- c("PALMA DE MALLORCA", "THE RICH AND THE POOR", "A CAMEL IN THE DESERT", "SANTANDER SL", "LA")
    lista <- c("EL", "LA", "ES", "DE", "Y", "DEL", "LOS", "S.L.", "S.A.", "S.C.", "LAS", "DEL", "THE", "OF", "AND", "BY", "S", "L", "A", "C", "SA", "SC", "SL")

If we apply the mgsub function as it is, we get the following output:

    library(qdap)
    mgsub(lista, "",
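
One possible way to restrict the replacement to whole words, sketched here with base R's `gsub` instead of `qdap::mgsub`. The assumptions: only purely alphabetic entries from `lista` are used (entries such as "S.L." would need their dots escaped and a different boundary rule), and the removed words are deleted and the leftover spacing collapsed.

```r
test <- c("PALMA DE MALLORCA", "THE RICH AND THE POOR",
          "A CAMEL IN THE DESERT", "SANTANDER SL", "LA")
# Subset of the original list: alphabetic entries only (an assumption of this sketch).
lista <- c("EL", "LA", "DE", "THE", "AND", "A", "SL")

# \\b anchors each alternative at word boundaries, so "LA" no longer
# matches inside longer words.
pattern <- paste0("\\b(", paste(lista, collapse = "|"), ")\\b")

cleaned <- gsub(pattern, "", test, perl = TRUE)
# Collapse the whitespace left behind and trim the ends.
cleaned <- trimws(gsub("\\s+", " ", cleaned))
print(cleaned)
```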

replace string in R giving a vector of patterns and vector of replacements

ε祈祈猫儿з submitted on 2019-12-24 06:14:05
Question: Given a string with different placeholders that I want to replace, does R have a function that replaces all of them, given a vector of patterns and a vector of replacements? I have managed to accomplish that with a list and a loop:

    > library(stringr)
    > tt_ori <- 'I have [%VAR1%] and [%VAR2%]'
    > tt_out <- tt_ori
    > ttlist <- list('\\[%VAR1%\\]'="val-1", '\\[%VAR2%\\]'="val-2")
    > ttlist
    $`\\[%VAR1%\\]`
    [1] "val-1"

    $`\\[%VAR2%\\]`
    [1] "val-2"

    > for(var in names(ttlist)) {
    +   print(paste0(var," -> ",ttlist
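
A minimal base R sketch of the vector-of-patterns idea asked about above. `replace_all` is a hypothetical helper, not a function from base R or any package. It assumes the placeholders should be matched literally, which is why `fixed = TRUE` is used and the brackets need no escaping.

```r
tt_ori <- "I have [%VAR1%] and [%VAR2%]"
patterns <- c("[%VAR1%]", "[%VAR2%]")
replacements <- c("val-1", "val-2")

# Apply each pattern/replacement pair in turn.
# fixed = TRUE treats every pattern as a literal string, not a regex.
replace_all <- function(text, patterns, replacements) {
  stopifnot(length(patterns) == length(replacements))
  for (i in seq_along(patterns)) {
    text <- gsub(patterns[i], replacements[i], text, fixed = TRUE)
  }
  text
}

tt_out <- replace_all(tt_ori, patterns, replacements)
print(tt_out)
```

Because the pairs are applied sequentially, a later pattern can match text introduced by an earlier replacement; ordering matters if the replacement values can contain placeholders themselves.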

Count number of times a word-wildcard appears in text (in R)

邮差的信 submitted on 2019-12-11 11:55:07
Question: I have a vector of either regular words ("activated") or wildcard words ("activat*"). I want to:
1) Count the number of times each word appears in a given text (i.e., if "activated" appears in the text, the frequency of "activated" would be 1).
2) Count the number of times each wildcard word appears in a text (i.e., if "activated" and "activation" appear in the text, the frequency of "activat*" would be 2).
I'm able to achieve (1), but not (2). Can anyone please help? Thanks.

    library(tm)
    library(qdap)
    text <-
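
Both counts can be sketched with base R regular expressions rather than tm or qdap. The assumptions here: the sample text is hypothetical, terms are lowercase and contain no regex metacharacters other than a trailing `*`, and `count_term` is a made-up helper name.

```r
# Hypothetical sample text (the original question's text is not shown).
text <- "The enzyme was activated, and activation continued."
terms <- c("activated", "activat*", "enzyme*")

count_term <- function(term, text) {
  # Turn a trailing "*" into "any further word characters",
  # then require word boundaries on both sides.
  pattern <- paste0("\\b", sub("\\*$", "\\\\w*", term), "\\b")
  hits <- gregexpr(pattern, tolower(text), perl = TRUE)[[1]]
  # gregexpr() returns -1 when nothing matches.
  sum(hits > 0)
}

counts <- vapply(terms, count_term, integer(1), text = text)
print(counts)
```

Here "activat*" counts both "activated" and "activation", while the plain term "activated" counts only exact whole-word matches.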

Replace the string value with value in the find list in R

我是研究僧i submitted on 2019-12-11 07:27:30
Question: I have a dataset that has a column like this:

    string <- c('lib1_Rstudio_case1', 'lib2_Rstudio_case1and2', 'lib5_python_notthe correct_language', 'lib3_Jupyter_really_good', 'lib1_spyder_nice', 'lib1_R_the_core')
    replacement <- c('Rstudio', 'Jupyter', 'spyder', 'R')

I want to replace the string values if they match a value in replacement. I am using the following code right now:

    gsub(paste(replacement, collapse = "|"), replacement = replacement, x = string)

This is another piece of code which I am using to find

Estimating document polarity using R's qdap package without sentSplit

社会主义新天地 submitted on 2019-12-05 12:33:40
I'd like to apply qdap's polarity function to a vector of documents, each of which could contain multiple sentences, and obtain the corresponding polarity for each document. For example:

    library(qdap)
    polarity(DATA$state)$all$polarity
    # Results:
     [1] -0.8165 -0.4082  0.0000 -0.8944  0.0000  0.0000  0.0000 -0.5774  0.0000
    [10]  0.4082  0.0000
    Warning message:
    In polarity(DATA$state) :
      Some rows contain double punctuation. Suggested use of `sentSplit` function.

This warning can't be ignored, as it seems to add the polarity scores of each sentence in the document. This can result in document-level

R qdap::mgsub, how to pass a pattern with a regular expression?

孤街醉人 submitted on 2019-12-03 18:16:31
Question: In a previous question (replace string in R giving a vector of patterns and vector of replacements) I found that mgsub takes as its pattern a string that does not need to be escaped. That is good when you want to replace text like '[%.+%]' as a literal string, but it is a problem if you need to pass a real regular expression, like:

    library('stringr')
    library('qdap')
    tt_ori <- 'I have VAR1 and VAR2'
    ttl <- list(ttregex='VAR([12])', val="val-\\1")
    ttl
    # OK
    stringr::str_replace_all( tt_ori,
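
For comparison, a minimal sketch of regex-based multi-replacement with backreferences using only base R's `gsub`, not `qdap::mgsub`. `regex_replace_all` is a hypothetical helper, and the second pattern/replacement pair is added purely for illustration.

```r
tt_ori <- "I have VAR1 and VAR2"
# Real regular expressions; "\\1" refers to the first capture group.
patterns <- c("VAR([12])", "\\bhave\\b")
replacements <- c("val-\\1", "own")

# Apply each regex/replacement pair in turn (fixed = FALSE is gsub's default,
# so patterns are interpreted as regular expressions).
regex_replace_all <- function(text, patterns, replacements) {
  stopifnot(length(patterns) == length(replacements))
  for (i in seq_along(patterns)) {
    text <- gsub(patterns[i], replacements[i], text, perl = TRUE)
  }
  text
}

tt_regex_out <- regex_replace_all(tt_ori, patterns, replacements)
print(tt_regex_out)
```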