I'm trying to write a function in R that drops columns from a data frame and returns the new data with a name specified as an argument of the function:
drop <
One reason to need this is when you work heavily in the RStudio console on text mining. For example, if you have a large corpus and want to break it into sub-corpora by theme, wrapping the processing in a function that returns a cleaned corpus can be much faster. An example is below:
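library(tm)  # provides Corpus, VectorSource, tm_map and the cleaning transformations

processText <- function(inputText, corpName){
  # Build a corpus from the character vector
  outputName <- Corpus(VectorSource(inputText))
  # Clean the text step by step
  outputName <- tm_map(outputName, PlainTextDocument)
  outputName <- tm_map(outputName, removeWords, stopwords("english"))
  outputName <- tm_map(outputName, removePunctuation)
  outputName <- tm_map(outputName, removeNumbers)
  outputName <- tm_map(outputName, stripWhitespace)
  # Store the cleaned corpus in the global environment under the name in corpName
  assign(corpName, outputName, envir = .GlobalEnv)
  # Return the name (a string), not the corpus itself
  return(corpName)
}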
In the case above, I pass the column from the data frame as inputText and the desired name of the output corpus as corpName. A single call like the following then processes a bunch of text data:
processText(retail$Essay,"retailCorp")
Then the new corpus "retailCorp" shows up in the global environment for further work such as plotting word clouds. I can also send lists through the function and get many corpora back.
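The same assign() pattern can be applied to the original question about dropping columns. What follows is only a minimal sketch in base R: the function name dropCols and its arguments cols, newName and envir are hypothetical names chosen for illustration, not anything taken from the question's own (truncated) code.

```r
# Hypothetical helper: drop the columns named in `cols` from `df` and store
# the result under the name given in `newName`.
# `envir` defaults to the global environment, mirroring processText above.
dropCols <- function(df, cols, newName, envir = .GlobalEnv) {
  # Keep every column whose name is not listed in `cols`;
  # drop = FALSE keeps the result a data frame even if one column remains
  kept <- df[, !(names(df) %in% cols), drop = FALSE]
  # Bind the trimmed data frame to the requested name
  assign(newName, kept, envir = envir)
  # Also return the data frame (invisibly) so the call can be chained
  invisible(kept)
}

# Example with made-up data: creates an object called `scores` without `id`
survey <- data.frame(id = 1:3, q1 = c(4, 5, 3), q2 = c(2, 2, 5))
dropCols(survey, "id", "scores")
```

Returning the data frame as well as assigning it means the caller can ignore the global side effect and write something like result <- dropCols(...) instead, which is usually easier to follow than objects appearing in the global environment.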