R-Project no applicable method for 'meta' applied to an object of class “character”

Submitted by 可紊 on 2019-12-27 11:47:01

Question


I am trying to run this code (Ubuntu 12.04, R 3.1.1):

# Load requisite packages
library(tm)
library(ggplot2)
library(lsa)

# Place Enron email snippets into a single vector.
text <- c(
  "To Mr. Ken Lay, I’m writing to urge you to donate the millions of dollars you made from selling Enron stock before the company declared bankruptcy.",
  "while you netted well over a $100 million, many of Enron's employees were financially devastated when the company declared bankruptcy and their retirement plans were wiped out",
  "you sold $101 million worth of Enron stock while aggressively urging the company’s employees to keep buying it",
  "This is a reminder of Enron’s Email retention policy. The Email retention policy provides as follows . . .",
  "Furthermore, it is against policy to store Email outside of your Outlook Mailbox and/or your Public Folders. Please do not copy Email onto floppy disks, zip disks, CDs or the network.",
  "Based on our receipt of various subpoenas, we will be preserving your past and future email. Please be prudent in the circulation of email relating to your work and activities.",
  "We have recognized over $550 million of fair value gains on stocks via our swaps with Raptor.",
  "The Raptor accounting treatment looks questionable. a. Enron booked a $500 million gain from equity derivatives from a related party.",
  "In the third quarter we have a $250 million problem with Raptor 3 if we don’t “enhance” the capital structure of Raptor 3 to commit more ENE shares.")
view <- factor(rep(c("view 1", "view 2", "view 3"), each = 3))
df <- data.frame(text, view, stringsAsFactors = FALSE)

# Prepare mini-Enron corpus
corpus <- Corpus(VectorSource(df$text))
corpus <- tm_map(corpus, tolower)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, function(x) removeWords(x, stopwords("english")))
corpus <- tm_map(corpus, stemDocument, language = "english")
corpus # check corpus

# Mini-Enron corpus with 9 text documents

# Compute a term-document matrix that contains the occurrence of terms in each email
# Compute distance between pairs of documents and scale the multidimensional semantic space (MDS) onto two dimensions
td.mat <- as.matrix(TermDocumentMatrix(corpus))
dist.mat <- dist(t(as.matrix(td.mat)))
dist.mat  # check distance matrix

# Compute distance between pairs of documents and scale the multidimensional semantic space onto two dimensions
fit <- cmdscale(dist.mat, eig = TRUE, k = 2)
points <- data.frame(x = fit$points[, 1], y = fit$points[, 2])
ggplot(points, aes(x = x, y = y)) +
  geom_point(data = points, aes(x = x, y = y, color = df$view)) +
  geom_text(data = points, aes(x = x, y = y - 0.2, label = row.names(df)))

However, when I run it I get this error (in the td.mat <- as.matrix(TermDocumentMatrix(corpus)) line):

Error in UseMethod("meta", x) : 
  no applicable method for 'meta' applied to an object of class "character"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code

I am not sure what to look at - all modules loaded.


Answer 1:


The latest version of tm (0.6) changed tm_map so that functions operating on simple character values no longer work with it. The problem is your tolower step, since tolower isn't a "canonical" transformation (see getTransformations()). Just replace it with

corpus <- tm_map(corpus, content_transformer(tolower))

The content_transformer function wrapper will convert everything to the correct data type within the corpus. You can use content_transformer with any function that is intended to manipulate character vectors so that it will work in a tm_map pipeline.
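As a minimal sketch of that last point, assuming tm 0.6 or later, the snippet below wraps both tolower and a small custom function with content_transformer. The two documents and the strip_pattern name are illustrative and do not come from the question.

```r
library(tm)

# Illustrative documents (not from the question).
docs <- c("We have recognized over $550 MILLION of gains.",
          "Please do NOT copy Email onto floppy disks.")
corpus <- VCorpus(VectorSource(docs))

# Wrap the plain character function so each document stays a TextDocument.
corpus <- tm_map(corpus, content_transformer(tolower))

# strip_pattern is a hypothetical helper: any function written for character
# vectors can be wrapped the same way, and extra arguments pass through tm_map.
strip_pattern <- content_transformer(function(x, pattern) gsub(pattern, "", x))
corpus <- tm_map(corpus, strip_pattern, "[[:digit:]]+")

# The documents are still TextDocument objects, so building the matrix works.
tdm <- TermDocumentMatrix(corpus)
inherits(corpus[[1]], "TextDocument")
```

Checking inherits(corpus[[1]], "TextDocument") after each tm_map step is a quick way to find which transformation turned the documents into plain character vectors.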




Answer 2:


This is a little old, but for the benefit of later Google searches, there is an alternative solution. After corpus <- tm_map(corpus, tolower) you can run corpus <- tm_map(corpus, PlainTextDocument), which converts the documents right back into the correct data type.
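A small sketch of this workaround, assuming a VCorpus built from two made-up documents: it records the class of the first document after the bare tolower step and again after the PlainTextDocument step.

```r
library(tm)

# Illustrative documents (not from the question).
docs <- c("Some Example TEXT", "Another Example")
corpus <- VCorpus(VectorSource(docs))

# A bare tolower leaves plain character vectors inside the corpus.
lowered <- tm_map(corpus, tolower)
class_before <- class(lowered[[1]])

# Converting each element back restores a document class.
restored <- tm_map(lowered, PlainTextDocument)
class_after <- class(restored[[1]])

class_before
class_after
```

As far as I can tell, the rebuilt documents carry freshly created default metadata rather than whatever was attached before the tolower step, so this is worth checking if the metadata matters downstream.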




Answer 3:


I had the same issue and finally found a solution:

It seems that the meta information within the corpus object gets corrupted after transformations are applied to it.

What I did was simply re-create the corpus at the very end of the process, once it was completely ready. To work around other issues, I also wrote a loop to copy the text back into my data frame:

a <- list()
for (i in seq_along(corpus)) {
    a[i] <- gettext(corpus[[i]][[1]])  # Do not use $content here!
}

df$text <- unlist(a)
corpus <- Corpus(VectorSource(df$text))  # This action restores the corpus.



Answer 4:


The order of operations on text matters. You should remove stop words before removing punctuation.
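To illustrate why the order matters, here is a minimal sketch that applies tm's removeWords and removePunctuation directly to a character string in both orders. The sentence is made up; the point is that stripping the apostrophe first turns "don't" into "dont", which no longer matches the "don't" entry in the English stop word list.

```r
library(tm)

x <- "we don't agree"   # illustrative sentence

# Punctuation first: "don't" becomes "dont" and survives stop word removal.
punct_first <- removeWords(removePunctuation(x), stopwords("english"))

# Stop words first: "don't" still matches its list entry and is removed.
words_first <- removePunctuation(removeWords(x, stopwords("english")))

grepl("dont", punct_first)   # does the mangled word remain?
grepl("don", words_first)    # is any trace of it left?
```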

I use the following to prepare text. My text is contained in cleanData$LikeMost.

Sometimes, depending on the source, you need the following first:

cleanData$LikeMost <- iconv(cleanData$LikeMost, to = "utf-8")

Some stop words are important, so you can create a revised set.

# Create a revised stop word list that keeps negations and similar words
newWords <- stopwords("english")
keep <- c("no", "more", "not", "can't", "cannot", "isn't", "aren't", "wasn't",
          "weren't", "hasn't", "haven't", "hadn't", "doesn't", "don't", "didn't", "won't")

newWords <- newWords[!newWords %in% keep]

Then, you can run your tm functions:

like <- Corpus(VectorSource(cleanData$LikeMost))
like <- tm_map(like, PlainTextDocument)
like <- tm_map(like, removeWords, newWords)
like <- tm_map(like, removePunctuation)
like <- tm_map(like, removeNumbers)
like <- tm_map(like, stripWhitespace)


Source: https://stackoverflow.com/questions/24771165/r-project-no-applicable-method-for-meta-applied-to-an-object-of-class-charact
