Question
I am a new R user and would really appreciate help solving a tokenization problem.
My task in brief: I am trying to import a text file into R. One of the text columns is Headline. The dataset is a collection of news articles related to a disease.
Issue: I have tried many times to tokenize it using the unnest_tokens function.
It is showing me the following error messages:
Error in UseMethod("unnest_tokens_") : no applicable method for 'unnest_tokens_' applied to an object of class "character"
Error in unnest_tokens(word, Headline) : object 'word' not found
library(dplyr)
library(tidytext)
DengueNews %>%
  unnest_tokens(word, Headline)
Note: Link to the dataset: https://drive.google.com/file/d/18VWg-2sO11GpwxMGF1UbziodoWK9B9Ru/view?usp=sharing
I am following the instructions from https://www.tidytextmining.com/tidytext.html
Answer 1:
It is not clear how the data was read. As mentioned in the comments, if the 'Headline' column is of class character, it should work. Here, we use read_excel from the readxl package to read the dataset. By default, it returns text columns with class character.
library(readxl)
library(dplyr)     # provides the %>% pipe used below
library(tidytext)
DengueNews <- read_excel("DengueNews.xlsx")
class(DengueNews$Headline)
#[1] "character"
DengueNews %>%
  unnest_tokens(word, Headline)
# A tibble: 217 x 4
Serial Date Newscontent word
<dbl> <chr> <chr> <chr>
1 216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… dghs
2 216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… 491
3 216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… more
4 216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… hospitali…
5 216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… for
6 216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… dengue
7 216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… in
8 216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… 24hrs
9 215 43725 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA fifth-grader schoolgirl has died of dengue fever at Dhaka Medical College a… 1
10 215 43725 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA fifth-grader schoolgirl has died of dengue fever at Dhaka Medical College a… more
# … with 207 more rows
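The first error in the question reports that unnest_tokens was applied to an object of class "character", which suggests that DengueNews was a bare character vector rather than a data frame at that point. The following is a minimal sketch of that case, not the asker's actual import step: the `headlines` vector and the `news` data frame are made-up stand-ins, and the idea is simply to wrap the vector in a data frame with a character column before tokenizing.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the real data: a bare character vector
headlines <- c("Dengue cases rise in Dhaka", "Hospitals admit more patients")

# unnest_tokens() expects a data frame as its first argument,
# so put the vector into a column and keep it as character
news <- data.frame(
  Serial = seq_along(headlines),
  Headline = headlines,
  stringsAsFactors = FALSE
)

is.data.frame(news)    # should be TRUE before tokenizing
class(news$Headline)   # should be "character"

# One row per word; words are lowercased by default
tokens <- news %>%
  unnest_tokens(word, Headline)
```

Checking `is.data.frame()` and `class()` on the real DengueNews object in the same way should show which of the two conditions is not met.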
If we change the column to another class, such as factor, it fails:
library(dplyr)
DengueNews %>%
  mutate(Headline = factor(Headline)) %>%
  unnest_tokens(word, Headline)
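If the column does turn out to be a factor (for example, `read.csv` and `data.frame` converted strings to factors by default in R versions before 4.0.0), converting it back with `as.character` before tokenizing should avoid the failure. A minimal sketch on made-up toy data (the `news` data frame below is hypothetical, not the asker's dataset):

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data with Headline stored as a factor
news <- data.frame(
  Serial = 1:2,
  Headline = factor(c("Dengue cases rise", "More patients admitted"))
)

class(news$Headline)   # "factor"

# Convert the factor back to character, then tokenize
fixed <- news %>%
  mutate(Headline = as.character(Headline)) %>%
  unnest_tokens(word, Headline)
```

Alternatively, passing `stringsAsFactors = FALSE` when reading the file keeps the column as character from the start.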
Source: https://stackoverflow.com/questions/58051557/how-can-i-tokenize-a-text-column-in-r-unnest-function-not-working