convert corpus into data.frame in R

Submitted by 女生的网名这么多〃 on 2019-12-09 13:11:38

Question


I'm using the tm package to apply stemming, and I need to convert the resulting data into a data frame. A solution is given in the question "R tm package vcorpus: Error in converting corpus to data frame", but in my case the content of the corpus looks like this:

[[2195]]
i was very impress

instead of

[[2195]]
"i was very impress"

Because of this, when I apply

data.frame(text=unlist(sapply(mycorpus, `[`, "content")), stringsAsFactors=FALSE)

the result is

<NA>

Any help is much appreciated!

Here is a reproducible example:

library(tm)  # provides Corpus(), VectorSource(), tm_map() and stemDocument()

sentence <- c("a small thread was loose on the sandals, otherwise it looked good")
mycorpus <- Corpus(VectorSource(sentence))
mycorpus <- tm_map(mycorpus, stemDocument, language = "english")

inspect(mycorpus)

[[1]]
a small thread was loo on the sandals, otherwi it look good

data.frame(text=unlist(sapply(mycorpus, `[`, "content")), stringsAsFactors=FALSE)

 text
1 <NA>
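One possible reading of the <NA>, offered only as an assumption because the question does not say which tm version was used: if each document is stored as a bare character vector rather than as a list with a content element, then `[` with the name "content" looks up a name the vector does not have, and R returns NA. The base R sketch below imitates that with a hypothetical list of strings called docs (not a real tm corpus) and shows that converting the elements directly keeps the text.

```r
# Hypothetical stand-in for a corpus whose documents are bare character
# vectors (an assumption, not tm's documented structure).
docs <- list("i was very impress",
             "a small thread was loo on the sandals, otherwi it look good")

# Indexing an unnamed character vector by the name "content" matches nothing,
# so every extraction comes back as NA.
by_name <- sapply(docs, `[`, "content")
print(by_name)

# Converting each element directly keeps the text; vapply() also checks
# that every document yields exactly one string.
as_text <- vapply(docs, as.character, character(1))
df <- data.frame(text = as_text, stringsAsFactors = FALSE)
print(df)
```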

Answer 1:


Applying

gsub("http\\w+", "", mycorpus)

returns an object of class character, so it works in my case.
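Answer 1 most likely works because gsub() coerces a non-character input with as.character() before substituting, so its result is always a character vector. A minimal base R sketch, again assuming hypothetical documents stored as bare strings rather than a real tm corpus; the text contains no http tokens, so the pattern changes nothing and only the coercion matters:

```r
# Hypothetical documents stored as bare strings (an assumption).
docs <- list("i was very impress",
             "a small thread was loo on the sandals, otherwi it look good")

# gsub() coerces the list to character before substituting; with no
# "http..." tokens present, the text itself is unchanged.
cleaned <- gsub("http\\w+", "", docs)
print(class(cleaned))  # "character"

# The same coercion without a regular expression.
plain <- as.character(docs)

df <- data.frame(text = plain, stringsAsFactors = FALSE)
print(df)
```

If removing URL-like tokens is not actually wanted, calling as.character() directly expresses the intent more plainly than a substitution that matches nothing.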




Answer 2:


I'm unable to reproduce the problem using tm_0.6 in R 3.1.0 on a Mac:

> data.frame(text=unlist(sapply(mycorpus, `[`, "content")), stringsAsFactors=FALSE)
                                                                 text
content a small thread was loos on the sandals, otherwis it look good

If I had gotten those undesired results, I would immediately have tried:

 data.frame(text=unlist(sapply(mycorpus, `[[`, "content")), stringsAsFactors=FALSE)

... reasoning that, since 'content' is a list-element name, [['content']] should be able to do the serial extraction. It also looked to me as though unlist might not be needed with that approach:

> data.frame(text=sapply(mycorpus, `[[`, "content"), stringsAsFactors=FALSE)
                                                           text
1 a small thread was loos on the sandals, otherwis it look good
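The difference between `[` and `[[` that this answer relies on can be seen without tm. The sketch below models each document as a list with content and meta elements, roughly how tm 0.6 lays out a PlainTextDocument; docs is again a hypothetical stand-in, not a real corpus. `[` returns a one-element sub-list, so sapply() yields a list that still needs unlist(), while `[[` returns the string itself, so sapply() simplifies straight to a character vector. vapply() is shown last as a stricter variant that stops with an error if any document's content is not a single string.

```r
# Hypothetical documents modeled as lists with "content" and "meta" elements.
docs <- list(
  list(content = "i was very impress", meta = list(id = "1")),
  list(content = "a small thread was loo on the sandals", meta = list(id = "2"))
)

# `[` keeps the list wrapper: each call returns list(content = "...").
wrapped <- sapply(docs, `[`, "content")
print(class(wrapped))  # "list"

# `[[` extracts the string itself, so sapply() returns a character vector.
extracted <- sapply(docs, `[[`, "content")
print(class(extracted))  # "character"

# vapply() additionally checks that every document yields exactly one string;
# "content" is passed on to `[[` through vapply()'s ... argument.
texts <- vapply(docs, `[[`, character(1), "content")
df <- data.frame(text = texts, stringsAsFactors = FALSE)
print(df)
```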


Source: https://stackoverflow.com/questions/25490088/convert-corpus-into-data-frame-in-r
