I'm trying to use the tm package in R to perform some text analysis. I tried the following:
require(tm)
dataSet <- Corpus(DirSource('tmp/'))
dataSet <
The fix suggested in the official FAQ does not seem to work in my situation:
tm_map(yourCorpus, function(x) iconv(x, to='UTF-8-MAC', sub='byte'))
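For reference, here is a small sketch of what that iconv call does to a plain character vector, outside of tm. The sample string is made up, and "UTF-8" is used as the target because "UTF-8-MAC" may not be available outside macOS; both are assumptions for illustration, not part of the FAQ.

```r
# Made-up sample: "\xE7" is a Latin-1 byte that is not valid UTF-8 on its own
x <- "fa\xE7ile"

# validUTF8() (base R >= 3.3.0) reports whether each element is valid UTF-8
validUTF8(x)

# sub = "byte" replaces each non-convertible byte with its hex code
# in angle brackets instead of returning NA for the whole string
cleaned <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "byte")

# The cleaned string can then be passed to tolower()
tolower(cleaned)
```

If the substituted hex codes are unwanted, sub = "" drops the offending bytes instead.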
Finally, I got it working with a for loop and the Encoding function:
# Mark every document in dataSet as UTF-8 before calling tolower
for (i in seq_along(dataSet))
{
  Encoding(dataSet[[i]]) <- "UTF-8"
}
corpus <- tm_map(dataSet, tolower)
If it's alright to ignore invalid inputs, you could use R's error handling, e.g.:
dataSet <- Corpus(DirSource('tmp/'))
dataSet <- tm_map(dataSet, function(data) {
  # ERROR HANDLING: catch the error from tolower instead of stopping
  possibleError <- tryCatch(
    tolower(data),
    error = function(e) e
  )
  # if (!inherits(possibleError, "error")) {
  #   REAL WORK. Could do more work on your data here,
  #   because you know the input is valid.
  #   useful(data); fun(data); good(data);
  # }
  # Return the lowercased text, or the error object for invalid input
  possibleError
})
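The same idea can be sketched in base R without tm. The helper name try_tolower and the sample inputs are hypothetical, and returning NA for bad input is just one possible choice; whether tolower() actually fails on the second string depends on the locale.

```r
# Hypothetical helper: lowercase a string, returning NA if tolower() fails
try_tolower <- function(x) {
  tryCatch(
    tolower(x),
    error = function(e) NA_character_
  )
}

# Made-up inputs: the second one contains a byte that is invalid in UTF-8
inputs <- c("Hello World", "fa\xE7ile")

# Apply the helper element by element so one bad string cannot stop the rest
result <- vapply(inputs, try_tolower, character(1), USE.NAMES = FALSE)

# Entries that failed can be dropped afterwards
valid <- result[!is.na(result)]
```

Returning NA rather than the error object keeps the result a plain character vector, which makes it easy to filter out the inputs that failed.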
There is an additional example here: http://gastonsanchez.wordpress.com/2012/05/29/catching-errors-when-using-tolower/