tidytext

Mapping the topic of the review in R

╄→尐↘猪︶ㄣ submitted on 2020-07-18 07:59:24
Question: I have two data sets, Review Data and Topic Data. Dput of my Review Data: structure(list(Review = structure(2:1, .Label = c("Canteen Food could be improved", "Sports and physical exercise need to be given importance"), class = "factor")), class = "data.frame", row.names = c(NA, -2L)) Dput of my Topic Data: structure(list(word = structure(2:1, .Label = c("canteen food", "sports and physical"), class = "factor"), Topic = structure(2:1, .Label = c("Canteen", "Sports "), class = "factor")),
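The excerpt is cut off before the desired output is shown, so the following is only a minimal sketch. It assumes the goal is to tag each review with the Topic whose word phrase occurs in it, ignoring case. The helper name match_topic is hypothetical, and the trailing space in "Sports " is trimmed as an assumption.

```r
# Data rebuilt from the dput output in the question (as plain character columns).
reviews <- data.frame(
  Review = c("Sports and physical exercise need to be given importance",
             "Canteen Food could be improved"),
  stringsAsFactors = FALSE
)
topics <- data.frame(
  word  = c("sports and physical", "canteen food"),
  Topic = c("Sports ", "Canteen"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first topic whose phrase appears in the review.
match_topic <- function(review, topics) {
  hit <- vapply(
    topics$word,
    function(w) grepl(tolower(w), tolower(review), fixed = TRUE),
    logical(1)
  )
  if (any(hit)) trimws(topics$Topic[which(hit)[1]]) else NA_character_
}

reviews$Topic <- vapply(reviews$Review, match_topic, character(1),
                        topics = topics, USE.NAMES = FALSE)
print(reviews)
```

Reviews that match no phrase get NA. If a review could belong to several topics, the helper would need to return all hits rather than the first.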

replace range of numbers with single numbers in a character string

陌路散爱 submitted on 2020-05-29 05:52:55
Question: Is there any way to replace a range of numbers with the individual numbers in a character string? Ranges have the form n-n, most probably within 1-15; 4-10 is also possible. The range could be indicated a) with a hyphen: a <- "I would like to buy 1-3 cats" or b) with a word, for example to, bis, jusqu'à: b <- "I would like to buy 1 jusqu'à 3 cats" The result should look like "I would like to buy 1,2,3 cats" I found this: Replace range of numbers with certain number, but could not really use it in R. Answer 1:
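The answer text is cut off in this listing, so what follows is not the original answer. It is a minimal base R sketch of one possible approach, assuming ranges are two integers joined by one of a known set of separators. The function name expand_ranges and its seps argument are hypothetical.

```r
# Hypothetical helper: expand "1-3", "1 to 3", "1 bis 3", ... into "1,2,3".
# The separators are pasted into a regex unescaped, so they must be regex-safe.
expand_ranges <- function(x, seps = c("-", "to", "bis", "jusqu'\u00e0")) {
  pattern <- paste0("\\d+\\s*(?:", paste(seps, collapse = "|"), ")\\s*\\d+")
  m <- gregexpr(pattern, x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    vapply(hits, function(h) {
      # Pull the two endpoints out of the matched range and enumerate them.
      nums <- as.integer(regmatches(h, gregexpr("\\d+", h))[[1]])
      paste(seq(nums[1], nums[2]), collapse = ",")
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

a <- "I would like to buy 1-3 cats"
expand_ranges(a)
# "I would like to buy 1,2,3 cats"
```

Anything that looks like digits-separator-digits is expanded, including dates such as 2020-05, so the pattern would need tightening for text that contains them.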

R: Error in UseMethod(“tbl_vars”)

别来无恙 submitted on 2020-05-13 20:01:38
Question: I'm running the code below in RStudio and getting this error: Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "character" I don't know how to fix it because there is no tbl_vars function! Can someone help? for (i in 1:ceiling(nrow(reviews)/batch)) { row_start <- i*batch-batch+1 row_end <- ifelse(i*batch < nrow(reviews), i*batch, nrow(reviews)) print(paste("Processing row", row_start, "to row", row_end)) reviews[row_start:row_end, ] %>%
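The pipeline is cut off, so the cause cannot be confirmed from this excerpt. One plausible explanation, assuming reviews is a data frame with a single column, is that row subsetting with [ drops it to a plain character vector, which dplyr verbs then reject. A small base R sketch of that behavior, using made-up review text:

```r
# Toy single-column data frame (hypothetical content, for illustration only).
reviews <- data.frame(text = c("good", "bad", "fine"), stringsAsFactors = FALSE)

# Row subsetting a one-column data frame drops to a vector by default.
chunk_vec <- reviews[1:2, ]
class(chunk_vec)
# "character"

# drop = FALSE keeps the data frame, which is what a dplyr pipeline expects.
chunk_df <- reviews[1:2, , drop = FALSE]
class(chunk_df)
# "data.frame"
```

If that is the situation here, writing reviews[row_start:row_end, , drop = FALSE] in the loop, or converting reviews to a tibble first, should keep the chunk a data frame.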

Tokenizing issue

十年热恋 submitted on 2020-01-25 10:47:05
Question: I am trying to tokenize a sentence as follows. Section <- c("If an infusion reaction occurs, interrupt the infusion.") df <- data.frame(Section) When I tokenize using tidytext and the code below, AA <- df %>% mutate(tokens = str_extract_all(df$Section, "([^\\s]+)"), locations = str_locate_all(df$Section, "([^\\s]+)"), locations = map(locations, as.data.frame)) %>% select(-Section) %>% unnest(tokens, locations) it gives me a result set as below (see image). How do I get the comma and the
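The question is truncated, so this is a sketch under the assumption that the goal is to get punctuation marks such as the comma and the period as separate tokens with their own positions. It swaps the whitespace-based pattern for one that matches either a run of alphanumerics or a single punctuation character, and uses base R rather than stringr.

```r
section <- "If an infusion reaction occurs, interrupt the infusion."

# Match a word (run of letters/digits) OR a single punctuation character.
m <- gregexpr("[[:alnum:]]+|[[:punct:]]", section)

tokens <- regmatches(section, m)[[1]]
starts <- as.integer(m[[1]])
ends   <- starts + attr(m[[1]], "match.length") - 1L

result <- data.frame(token = tokens, start = starts, end = ends,
                     stringsAsFactors = FALSE)
print(result)
```

The same alternation pattern can be passed to str_extract_all() and str_locate_all() if the stringr pipeline from the question is preferred.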

tidytext R in spanish - any alternative?

家住魔仙堡 submitted on 2020-01-11 09:32:06
Question: I'm doing sentiment analysis on Twitter data, but my tweets are in Spanish so I can't use tidytext to classify the words. Does anyone know if there is a similar package for Spanish? Answer 1: There are not a lot of good open source options for sentiment lexicons in non-English languages right now, unfortunately. You can request the NRC lexicon in other languages from the authors; it is translated by Google Translate (which of course adds uncertainty but has been shown to be mostly OK overall) and the
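Whatever lexicon is used, scoring comes down to joining tokens against a word/sentiment table. The sketch below illustrates that join in base R. The three-word lexicon and the two example texts are made up for illustration and are not taken from NRC or any real resource.

```r
# Toy data: hypothetical texts and a hypothetical Spanish lexicon.
tweets <- data.frame(
  id   = 1:2,
  text = c("El hotel es bueno y limpio", "El servicio fue malo"),
  stringsAsFactors = FALSE
)
lexicon <- data.frame(
  word      = c("bueno", "limpio", "malo"),
  sentiment = c("positive", "positive", "negative"),
  stringsAsFactors = FALSE
)

# One row per word: split on anything that is not a letter.
words  <- strsplit(tolower(tweets$text), "[^[:alpha:]]+")
tokens <- data.frame(
  id   = rep(tweets$id, lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)

# Inner join against the lexicon, then count sentiments per tweet.
scored <- merge(tokens, lexicon, by = "word")
counts <- table(scored$id, scored$sentiment)
print(counts)
```

With a real Spanish lexicon loaded into a data frame of the same shape, the same join works with tidytext's unnest_tokens() and dplyr's inner_join().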

creating corpus from multiple txt files

你说的曾经没有我的故事 submitted on 2020-01-05 08:35:34
Question: I have multiple txt files and I want to turn them into tidy data. To do that, I first create a corpus (I am not sure this is the right way to do it). I wrote the following code to build the corpus. folder<-"C:\\Users\\user\\Desktop\\text analysis\\doc" list.files(path=folder) filelist<- list.files(path=folder, pattern="*.txt") paste(folder, "\\", filelist) filelist<-paste(folder, "\\", filelist, sep="") typeof(filelist) a<- lapply(filelist,FUN=readLines) corpus <- lapply(a ,FUN=paste, collapse=" ") When I
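The question breaks off here, so the following is only a sketch of a more compact way to get from a folder of txt files to one row per document. It assumes one document per file, and it writes two small demo files to a temporary folder so it can run anywhere; the folder and file names are hypothetical.

```r
# Demo folder with two small files (stand-ins for the real documents).
folder <- file.path(tempdir(), "doc_demo")
dir.create(folder, showWarnings = FALSE)
writeLines(c("first line", "second line"), file.path(folder, "a.txt"))
writeLines("only line", file.path(folder, "b.txt"))

# full.names = TRUE returns complete paths, so no manual paste() is needed;
# the pattern is a regular expression, hence "\\.txt$" rather than "*.txt".
files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)

# One row per document, with all lines collapsed into a single string.
corpus <- data.frame(
  doc  = basename(files),
  text = vapply(files,
                function(f) paste(readLines(f, warn = FALSE), collapse = " "),
                character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(corpus)
```

A data frame in this shape can be passed to tidytext::unnest_tokens() to get one row per word.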

determine the temporality of a sentence with POS tagging

风流意气都作罢 submitted on 2019-12-25 00:53:03
Question: I want to find out, from a series of sentences, whether an action has been carried out or will be carried out. For example: "I will prescribe this medication" versus "I prescribed this medication", or "He had already taken the stuff" versus "he may take the stuff later". I was trying a tidytext approach and decided to simply look for past participle versus future participle verbs. However, when I POS tag, the only types of verbs I get are "Verb intransitive", "Verb (usu participle)" and
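The excerpt ends before the tagging output is complete. As a rough baseline that does not use POS tagging at all, a keyword heuristic can flag sentences containing a future or modal marker. This is a deliberately simple substitute technique, the marker list is an assumption, and it will misclassify many real sentences.

```r
sentences <- c(
  "I will prescribe this medication",
  "I prescribed this medication",
  "He had already taken the stuff",
  "he may take the stuff later"
)

# Assumed list of markers for actions that have not happened yet.
future_markers <- c("will", "shall", "may", "might", "going to")
pattern <- paste0("\\b(", paste(future_markers, collapse = "|"), ")\\b")

is_pending <- grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE)
data.frame(sentence = sentences, pending = is_pending)
```

A tagger that emits Penn Treebank tags (MD for modals, VBD and VBN for past forms) would give a more principled signal than this keyword list.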

Splitting and grouping plain text (grouping text by chapter in dataframe)?

强颜欢笑 submitted on 2019-12-24 20:45:28
Question: I have a data frame/tibble into which I've imported a plain-text (txt) file. The text is very consistent and is grouped by chapter. Sometimes the chapter text is only one row, sometimes it spans multiple rows. The data is in one column like this:
# A tibble: 10,708 x 1
  x
  <chr>
1 "Chapter 1 "
2 "Chapter text. "
3 "Chapter 3 " is row 5 below; rows are listed in order:
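The question is cut off, so this sketch assumes the target is one row per chapter with its text collapsed into a single string. It uses the common idiom of a running count of heading rows (cumsum over a logical vector) to assign each line to its chapter, shown in base R on the eight rows visible above.

```r
df <- data.frame(
  x = c("Chapter 1 ", "Chapter text. ", "Chapter 2 ", "Chapter text. ",
        "Chapter 3 ", "Chapter text. ", "Chapter text. ", "Chapter 4 "),
  stringsAsFactors = FALSE
)

# A heading row is "Chapter <number>" with optional trailing whitespace.
is_heading <- grepl("^Chapter [0-9]+\\s*$", df$x)

# cumsum() increases by one at each heading, so it indexes the current chapter.
# Assumes the first row is a heading; otherwise the index would start at 0.
df$chapter <- trimws(df$x[is_heading])[cumsum(is_heading)]

# Drop the heading rows and collapse the remaining text per chapter.
body   <- df[!is_heading, ]
result <- aggregate(x ~ chapter, data = body,
                    FUN = function(s) paste(trimws(s), collapse = " "))
print(result)
```

Chapter 4 has no text rows in the visible sample, so it does not appear in the result. With dplyr the same idea reads as mutate(chapter = cumsum(is_heading)) followed by group_by() and summarise().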