R: Need content_transformer() called by tm_map() to change non-letters to spaces

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-13 10:20:23

Question


In the following code, any characters matching "/|@| \|" will be changed to a space.

> library(tm)
> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
> docs <- tm_map(docs, toSpace, "/|@| \\|")

What code would transform all non-letters to a space? (What goes where the xxxxx's are below?)

It is very difficult to put all non-letters in a string: the list is very long, some of the characters are non-printable, and many of them need escaping. So I'm doing the opposite of the above: listing the letters and replacing everything that is not one of them.

> toSpace_2 <- content_transformer(function xxxxxxxxxxxxxxxxxxxxxxx))
> docs <- tm_map(docs, toSpace_2, "a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z")

This needs to be done by a content_transformer() function to maintain the integrity of docs.

Thanks


Answer 1:


Why don't you use the pattern [^a-zA-Z]? It should match all non-letters.
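
A minimal sketch of how that pattern could be plugged into the toSpace transformer from the question. The two-document corpus built with VCorpus(VectorSource(...)) and its sample strings are assumptions made up for illustration; the question does not show how docs was created.

```r
library(tm)

# Hypothetical sample corpus (not from the question), for illustration only.
docs <- VCorpus(VectorSource(c("foo/bar@baz 42", "R2-D2 & C-3PO")))

# Same transformer as in the question: replace every match of `pattern` with a space.
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

# [^a-zA-Z] is a negated character class: it matches any single character
# that is not an ASCII letter, so digits, punctuation and whitespace all become spaces.
docs <- tm_map(docs, toSpace, "[^a-zA-Z]")

cleaned <- content(docs[[1]])
print(cleaned)
```

Each non-letter is replaced one for one, so runs of several spaces can appear; tm_map(docs, stripWhitespace) should collapse them if that matters. If accented letters need to be kept as well, the POSIX class [^[:alpha:]] may be a better fit than [^a-zA-Z], though what it treats as a letter depends on the locale.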



Source: https://stackoverflow.com/questions/29833571/r-need-content-transformer-called-by-tm-map-to-change-non-letters-to-space
