How to remove non UTF-8 characters from text

逝去的感伤 2021-01-28 10:53

I need help removing non-UTF-8 characters from my word cloud. So far this is my code. I've tried gsub and removeWords, but the characters are still there in my word cloud and I do not know

1 answer
  •  别那么骄傲
    2021-01-28 11:37

    The signature of gsub is:

    gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

    Not sure what you wanted to do with

    gsub("’","‘","",txt)

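    but that line is probably not doing what you want. Arguments are matched by position, so "’" is the pattern, "‘" is the replacement, "" is x (the text to search), and txt lands in ignore.case. The substitution therefore runs on the empty string, not on txt.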
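
    If the intent of that call was to delete both curly quotes from txt, a minimal sketch could look like the following. It assumes a character class was meant, reuses the sample string from the iconv example in this answer, and writes the quotes as \u escapes so the snippet does not depend on the encoding of the source file. The name cleaned is only illustrative.

    ```r
    # Sample text: right and left single curly quotes around "xxx"
    txt <- "\u2019xxx\u2018"

    # pattern: a character class matching either curly quote
    # replacement: the empty string, i.e. delete every match
    # x: the text to search, passed by name to avoid positional mix-ups
    cleaned <- gsub(pattern = "[\u2018\u2019]", replacement = "", x = txt)

    print(cleaned)
    ```

    Naming the arguments makes it harder to pass the text in the wrong slot by accident.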

    A previous SO question covers gsub and non-ASCII symbols.

    Edit:

    Suggested solution using iconv:

    Removing all non-ASCII characters:

    txt <- "’xxx‘"
    
    iconv(txt, "latin1", "ASCII", sub="")
    

    Returns:

    [1] "xxx"    
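
    The call above declares the input as latin1. If the text is known to be UTF-8, a variant along these lines may be closer to what the question asks. This is a sketch under assumptions: the variable names are hypothetical, the bad string is made-up sample data, and the result of the UTF-8 to UTF-8 conversion depends on the platform's iconv implementation, so it is worth checking on your own system.

    ```r
    # Case 1: valid UTF-8 input, drop everything that is not ASCII
    txt <- "\u2019xxx\u2018"
    ascii_only <- iconv(txt, from = "UTF-8", to = "ASCII", sub = "")

    # Case 2: input containing a byte (0xFF) that can never occur in valid UTF-8
    bad <- rawToChar(as.raw(c(0x61, 0xff, 0x62)))   # "a", invalid byte, "b"
    is_valid <- validUTF8(bad)                       # checks each element for valid UTF-8

    # Converting UTF-8 to UTF-8 with sub = "" replaces non-convertible bytes
    # with nothing and leaves well-formed characters in place
    utf8_only <- iconv(bad, from = "UTF-8", to = "UTF-8", sub = "")

    print(ascii_only)
    print(is_valid)
    print(utf8_only)
    ```

    Case 1 removes legitimate non-ASCII characters as well, so accented letters would be lost. Case 2 is meant to strip only bytes that are not valid UTF-8.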
    
