Question
In the following code, anything matching the pattern "/|@| \\|" is replaced with a space.
> library(tm)
> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
> docs <- tm_map(docs, toSpace, "/|@| \\|")
What code would transform all non-letters to a space? (That is, what goes where the xxxxx's are below?)
It is very difficult to put all non-letters in a string (the list is very long, some characters are non-printable, and many need escaping), so I'm trying the opposite of the above: listing the letters instead.
> toSpace_2 <- content_transformer(function xxxxxxxxxxxxxxxxxxxxxxx))
> docs <- tm_map(docs, toSpace_2, "a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z")
This needs to be done by a content_transformer() function to maintain the integrity of docs.
Thanks
Answer 1:
Why don't you use the pattern [^a-zA-Z]? It should match all non-letters.
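As a minimal sketch of how that pattern could be used: the toSpace transformer from the question already takes the pattern as an argument, so a second transformer may not be needed at all. The two-document corpus below is made up purely for illustration. Note that [^a-zA-Z] only treats ASCII letters as letters; if accented letters should survive, the POSIX class [^[:alpha:]] is a possible alternative.

```r
library(tm)

# Transformer from the question: replace every match of `pattern` with a space.
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

# Made-up sample corpus, for illustration only (not from the question).
docs <- VCorpus(VectorSource(c("Hello, world! 123", "a/b@c |d")))

# "[^a-zA-Z]" matches any single character that is not an ASCII letter,
# so each such character is replaced by one space.
docs <- tm_map(docs, toSpace, "[^a-zA-Z]")

# Pull the transformed text back out of the corpus to inspect it.
cleaned <- c(content(docs[[1]]), content(docs[[2]]))
print(cleaned)
```

Each non-letter becomes its own space, so runs of punctuation or digits leave runs of spaces; a follow-up tm_map(docs, stripWhitespace) would collapse them if that matters.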
Source: https://stackoverflow.com/questions/29833571/r-need-content-transformer-called-by-tm-map-to-change-non-letters-to-space