tidytext | 易学教程

Mapping the topic of the review in R

Question: I have two data sets, Review Data and Topic Data.

dput output of my Review Data:
structure(list(Review = structure(2:1, .Label = c("Canteen Food could be improved", "Sports and physical exercise need to be given importance"), class = "factor")), class = "data.frame", row.names = c(NA, -2L))

dput output of my Topic Data:
structure(list(word = structure(2:1, .Label = c("canteen food", "sports and physical"), class = "factor"), Topic = structure(2:1, .Label = c("Canteen", "Sports "), class = "factor")),
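A minimal base-R sketch of one way to do the mapping, assuming a topic applies whenever its phrase in `word` occurs verbatim (case-insensitively) in the review. The data frames rebuild the two `dput` samples with character columns instead of factors; `match_topics` is a hypothetical helper name, not something from the question.

```r
# Rebuild the dput samples as character columns (dput showed factors)
reviews <- data.frame(
  Review = c("Sports and physical exercise need to be given importance",
             "Canteen Food could be improved"),
  stringsAsFactors = FALSE
)
topics <- data.frame(
  word  = c("sports and physical", "canteen food"),
  Topic = c("Sports ", "Canteen"),  # the original label "Sports " has a trailing space
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each review, collect every topic whose phrase occurs in it
match_topics <- function(review_text, phrases, labels) {
  text <- tolower(review_text)
  vapply(text, function(r) {
    # fixed = TRUE treats each phrase as literal text, not a regex
    hit <- vapply(phrases, grepl, logical(1), x = r, fixed = TRUE)
    if (!any(hit)) NA_character_ else paste(trimws(labels[hit]), collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

# as.character() also covers the factor columns that dput produced
reviews$Topic <- match_topics(reviews$Review,
                              tolower(as.character(topics$word)),
                              as.character(topics$Topic))
reviews
```

A review that matches several phrases gets a comma-separated list of topics, and one that matches none gets `NA`.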

replace range of numbers with single numbers in a character string

Question: Is there any way to replace a range of numbers with single numbers in a character string? The range can be any n-m; most likely something like 1-15, but 4-10 is also possible. The range can be indicated either
a) with a hyphen: a <- "I would like to buy 1-3 cats"
or
b) with a word, for example "to", "bis" or "jusqu'à": b <- "I would like to buy 1 jusqu'à 3 cats"
The result should look like "I would like to buy 1,2,3 cats". I found this: Replace range of numbers with certain number, but could not really use it in R.
Answer 1:
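A base-R sketch of one approach: find every "number, connector, number" match with `gregexpr()` and replace each match through `regmatches<-` with the comma-joined `seq()`. `expand_ranges` is a hypothetical helper, and the connector list is an assumption taken from the examples in the question; `"\u00e0"` is used for "à" so the script does not depend on the file's encoding.

```r
# Hypothetical helper: expand "a-b" or "a <word> b" into "a,a+1,...,b"
expand_ranges <- function(x, connectors = c("-", "to", "bis", "jusqu'\u00e0")) {
  # Builds (\d+)\s*(?:-|to|bis|jusqu'à)\s*(\d+)
  pattern <- paste0("(\\d+)\\s*(?:", paste(connectors, collapse = "|"), ")\\s*(\\d+)")
  m <- gregexpr(pattern, x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    vapply(hits, function(h) {
      # The two numbers at either end of the matched range
      nums <- as.integer(regmatches(h, gregexpr("\\d+", h))[[1]])
      paste(seq(nums[1], nums[2]), collapse = ",")
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

expand_ranges(c("I would like to buy 1-3 cats",
                "I would like to buy 1 jusqu'\u00e0 3 cats"))
```

Strings without a range are returned unchanged. Anything shaped like "digits-digits" is expanded, so inputs such as dates or phone numbers would need a stricter pattern.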

R: Error in UseMethod(“tbl_vars”)

Question: I'm running the code below in RStudio and getting this error:

Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "character"

I don't know how to fix it, because there is no tbl_vars function in my code! Can someone help?

for (i in 1:ceiling(nrow(reviews)/batch)) {
  row_start <- i*batch-batch+1
  row_end <- ifelse(i*batch < nrow(reviews), i*batch, nrow(reviews))
  print(paste("Processing row", row_start, "to row", row_end))
  reviews[row_start:row_end, ] %>%
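The loop body is cut off, so the cause cannot be confirmed, but one likely culprit is worth checking: if `reviews` is a plain `data.frame` with a single column, `reviews[row_start:row_end, ]` silently drops to a character vector, and the dplyr/tidytext verb that follows then fails with a "no applicable method ... class "character"" error. A minimal sketch with made-up review text, assuming the next step is `tidytext::unnest_tokens()`:

```r
library(tidytext)

# Made-up one-column data.frame standing in for the question's `reviews`
reviews <- data.frame(
  text = c("Great food", "Slow service but friendly staff", "Clean rooms"),
  stringsAsFactors = FALSE
)
batch <- 2

class(reviews[1:2, ])                # "character": the only column was dropped to a vector
class(reviews[1:2, , drop = FALSE])  # "data.frame": what dplyr/tidytext verbs expect

results <- list()
for (i in seq_len(ceiling(nrow(reviews) / batch))) {
  row_start <- (i - 1) * batch + 1
  row_end   <- min(i * batch, nrow(reviews))
  message("Processing row ", row_start, " to row ", row_end)
  # drop = FALSE keeps each slice a data frame even with one column
  chunk <- reviews[row_start:row_end, , drop = FALSE]
  results[[i]] <- unnest_tokens(chunk, word, text)
}
tokens <- do.call(rbind, results)
```

`tbl_vars()` itself is a dplyr generic that other verbs call internally, which is why it shows up in the error even though the code never calls it directly.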

Tokenizing issue

Question: I am trying to tokenize a sentence as follows.

Section <- c("If an infusion reaction occurs, interrupt the infusion.")
df <- data.frame(Section)

When I tokenize using tidytext and the code below,

AA <- df %>%
  mutate(tokens = str_extract_all(df$Section, "([^\\s]+)"),
         locations = str_locate_all(df$Section, "([^\\s]+)"),
         locations = map(locations, as.data.frame)) %>%
  select(-Section) %>%
  unnest(tokens, locations)

it gives me the result set below (see image). How do I get the comma and the
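The pattern `[^\\s]+` splits only on whitespace, so "occurs," stays one token. A base-R sketch of one alternative, assuming a "word" is a run of letters or digits and every punctuation mark should become its own token; `gregexpr()` returns the start positions and match lengths, so the locations come along for free.

```r
Section <- "If an infusion reaction occurs, interrupt the infusion."

# Alternation: a run of letters/digits, or a single punctuation character
m_list <- gregexpr("[[:alnum:]]+|[[:punct:]]", Section)
m <- m_list[[1]]

tokens <- data.frame(
  token = regmatches(Section, m_list)[[1]],
  start = as.integer(m),                                # 1-based start of each match
  end   = as.integer(m) + attr(m, "match.length") - 1L, # inclusive end position
  stringsAsFactors = FALSE
)
tokens
```

With this pattern a contraction like "don't" would split into "don", "'" and "t", so the word part may need adjusting for such text. If positions are not needed, passing `strip_punct = FALSE` through `unnest_tokens()` to the tokenizers word tokenizer should also keep punctuation as separate tokens.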

tidytext R in Spanish - any alternative?

Question: I'm doing sentiment analysis on Twitter data, but my tweets are in Spanish, so I can't use tidytext to classify the words. Does anyone know if there is a similar package for Spanish?
Answer 1: There are not a lot of good open source options for sentiment lexicons in non-English languages right now, unfortunately. You can request the NRC lexicon in other languages from its authors; it is translated with Google Translate (which of course adds uncertainty, but has been shown to be mostly OK overall) and the
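Whatever Spanish lexicon is used, the tidytext workflow itself does not depend on the language: tokenize with `unnest_tokens()`, join on a word/score table, and aggregate. A minimal sketch; `lexicon_es` is a hypothetical three-word lexicon made up for illustration, not a published resource.

```r
library(dplyr)
library(tidytext)

# Hypothetical mini-lexicon for illustration only; swap in a real Spanish lexicon
lexicon_es <- tibble(
  word  = c("bueno", "excelente", "mala"),
  score = c(1, 2, -1)
)

tweets <- tibble(
  id   = 1:2,
  text = c("El servicio es excelente y bueno", "La comida es mala")
)

tweet_scores <- tweets %>%
  unnest_tokens(word, text) %>%            # one lowercase word per row
  inner_join(lexicon_es, by = "word") %>%  # keep only words found in the lexicon
  group_by(id) %>%
  summarise(score = sum(score))            # net sentiment per tweet
tweet_scores
```

Lexicon entries must match the inflected forms that appear in the tweets ("malo" will not match "mala"), and tweets with no lexicon words drop out of an `inner_join()`.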

creating corpus from multiple txt files

Question: I have multiple txt files and I want to turn them into tidy data. To do that, I first create a corpus (I am not sure whether this is the right way to do it). I wrote the following code to build the corpus:

folder<-"C:\\Users\\user\\Desktop\\text analysis\\doc"
list.files(path=folder)
filelist<- list.files(path=folder, pattern="*.txt")
paste(folder, "\\", filelist)
filelist<-paste(folder, "\\", filelist, sep="")
typeof(filelist)
a<- lapply(filelist,FUN=readLines)
corpus <- lapply(a ,FUN=paste, collapse=" ")

When I
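A base-R sketch that builds a one-row-per-file data frame, which can go straight into `tidytext::unnest_tokens()`. It differs from the code above in two ways. First, `list.files()` takes a regular expression, not a glob, so `"\\.txt$"` is the precise pattern for names ending in ".txt". Second, `full.names = TRUE` builds the paths without hand-pasted separators. The temporary folder and file contents are made up so the example runs anywhere.

```r
# Made-up demo folder with two small .txt files so the sketch is self-contained
folder <- file.path(tempdir(), "doc_demo")
dir.create(folder, showWarnings = FALSE)
writeLines(c("First line of doc A.", "Second line of doc A."), file.path(folder, "a.txt"))
writeLines("Only line of doc B.", file.path(folder, "b.txt"))

# pattern is a regular expression: "\\.txt$" means "ends in .txt"
filelist <- list.files(path = folder, pattern = "\\.txt$", full.names = TRUE)

# One row per document: collapse each file's lines into a single string
corpus_df <- data.frame(
  doc  = basename(filelist),
  text = vapply(filelist, function(f) paste(readLines(f), collapse = " "),
                character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
corpus_df
```

For the real data, `folder` would point at the question's directory; forward slashes (`"C:/Users/..."`) also work on Windows. `unnest_tokens(corpus_df, word, text)` then gives one row per word per document.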

determine the temporality of a sentence with POS tagging

Question: I want to find out, from a series of sentences, whether an action has been carried out or will be carried out. For example: "I will prescribe this medication" versus "I prescribed this medication", or "He had already taken the stuff" versus "he may take the stuff later". I was trying a tidytext approach and decided to simply look for past participle versus future participle verbs. However, when I POS tag using the only types of verbs I get are "Verb intransitive", "Verb (usu participle)" and
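If the tagging was done with tidytext's `parts_of_speech` data, that table is a word-level lookup list, so it cannot tell tense from context. As a swapped-in alternative (a keyword heuristic, not POS tagging), here is a crude base-R sketch that looks for modal auxiliaries to flag future actions and for past auxiliaries or "-ed" words to flag completed ones. `classify_temporality` and both keyword lists are hypothetical choices for illustration.

```r
# Hypothetical helper: crude keyword heuristic, not a real POS tagger
classify_temporality <- function(sentences) {
  # Modal/auxiliary markers of actions that have not happened yet
  future_pat <- "\\b(will|shall|may|might|going to)\\b"
  # Past auxiliaries, or any word ending in "ed"
  past_pat   <- "\\b(had|was|were|did)\\b|\\b[a-z]+ed\\b"
  is_future <- grepl(future_pat, sentences, ignore.case = TRUE, perl = TRUE)
  is_past   <- grepl(past_pat,   sentences, ignore.case = TRUE, perl = TRUE)
  # Future markers take precedence, e.g. "will have finished"
  ifelse(is_future, "future", ifelse(is_past, "past", "unknown"))
}

sentences <- c("I will prescribe this medication",
               "I prescribed this medication",
               "He had already taken the stuff",
               "he may take the stuff later")
classify_temporality(sentences)
```

The rules are easy to fool: "need" ends in "ed" and "May" can be a month. A context-aware tagger such as udpipe or spacyr, which returns Penn Treebank-style tags (VBD for past tense, MD for modals), would be the more robust route.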

Splitting and grouping plain text (grouping text by chapter in dataframe)?

Question: I have a data frame/tibble into which I've imported a plain text (txt) file. The text is very consistent and is grouped by chapter. Sometimes a chapter's text is only one row, sometimes it spans multiple rows. The data is in one column, like this:

# A tibble: 10,708 x 1
   x
   <chr>
 1 "Chapter 1 "
 2 "Chapter text. "
 3 "Chapter 2 "
 4 "Chapter text. "
 5 "Chapter 3 "
 6 "Chapter text. "
 7 "Chapter text. "
 8 "Chapter 4 "

I'm trying to clean the data to have a new column for Chapter and the text from each chapter in
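A base-R sketch of the usual "flag headings, then cumulative sum" approach, using only the eight rows shown above. It assumes a heading row is exactly "Chapter" followed by a number, because the body rows also start with "Chapter" ("Chapter text."), and that the file begins with a heading.

```r
# The eight rows shown in the question (the real file has 10,708)
df <- data.frame(
  x = c("Chapter 1 ", "Chapter text. ", "Chapter 2 ", "Chapter text. ",
        "Chapter 3 ", "Chapter text. ", "Chapter text. ", "Chapter 4 "),
  stringsAsFactors = FALSE
)

# Heading = "Chapter" + number and nothing else, so "Chapter text." does not match
is_heading <- grepl("^Chapter [0-9]+\\s*$", df$x, perl = TRUE)

# cumsum() gives every row the index of the most recent heading
chapter_idx <- cumsum(is_heading)
chapter_no  <- as.integer(sub("^Chapter ([0-9]+).*$", "\\1", df$x[is_heading]))

# Paste the body rows of each chapter together
body <- !is_heading
text_by_chapter <- tapply(trimws(df$x[body]), chapter_idx[body], paste, collapse = " ")

chapters <- data.frame(
  chapter = chapter_no[as.integer(names(text_by_chapter))],
  text    = as.vector(text_by_chapter),
  stringsAsFactors = FALSE
)
chapters
```

Chapter 4 has no body rows in this excerpt, so it does not appear in the result. The same idea in dplyr would be `mutate(chapter = cumsum(is_heading))` followed by `group_by(chapter)` and `summarise()`.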