How to remove unicode from string?

囚心锁ツ 2020-11-27 23:27

I have a string like:

q <- "<U+00A6>  1000-66329"

I want to remove <U+00A6> and get only 1000 66329

4 answers
  • 2020-11-27 23:57

    If <U+00A6> is always the first character, you can try:

    substring("\U00A6  1000-66329", 2)
    

    If R prints the string as <U+00A6>  1000-66329 instead of ¦  1000-66329, the string may contain the literal text "<U+00A6>" (eight characters) instead of the Unicode character. Then you can do:

    substring("<U+00A6>  1000-66329",9)
    

    Both ways the result is:

    [1] "  1000-66329"
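
    Which of the two calls applies depends on what the string actually contains. The following is a minimal sketch that checks both cases, assuming R 3.3.0 or later for startsWith(); the helper name strip_leading_a6 is made up for illustration and is not part of the answer above:

    ```r
    # Hypothetical helper: drop a leading U+00A6, whether it is stored as the
    # literal text "<U+00A6>" or as the real Unicode character.
    strip_leading_a6 <- function(x) {
      literal <- "<U+00A6>"  # eight ordinary ASCII characters
      ifelse(
        startsWith(x, literal),
        substring(x, nchar(literal) + 1L),                    # skip the 8-character text
        ifelse(startsWith(x, "\u00A6"), substring(x, 2L), x)  # skip the single character
      )
    }

    strip_leading_a6(c("<U+00A6>  1000-66329", "\u00A6  1000-66329"))
    ```

    Both elements should come back as "  1000-66329"; strings that start with neither form are returned unchanged.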
    
  • 2020-11-28 00:14

    Instead of removing it, you should convert it to the appropriate format. You have to set your locale to UTF-8, like so:

    Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
    

    Maybe you will see the following message:

    Warning message:
    In Sys.setlocale("LC_CTYPE", "en_US.UTF-8") :
      OS reports request to set locale to "en_US.UTF-8" cannot be honored
    

    In this case you should use stringi::stri_trans_general(x, "zh")

    Here "zh" stands for Chinese. You need to know which language you are converting to. That's it.
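
    If the string holds the literal text <U+00A6> rather than the character itself, another way to convert instead of remove is to decode the escape in base R. This is only a sketch and not part of the answer above: the function name decode_u_escapes is hypothetical, and it assumes every escape has the form <U+ followed by 4 to 6 hexadecimal digits and >:

    ```r
    # Hypothetical helper: replace each literal "<U+XXXX>" with the character
    # that the hexadecimal code point denotes.
    decode_u_escapes <- function(x) {
      m <- gregexpr("<U\\+[0-9A-Fa-f]{4,6}>", x)
      regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
        hex <- gsub("^<U\\+|>$", "", tokens)        # keep only the hex digits
        codes <- strtoi(hex, base = 16L)             # "00A6" becomes 166
        vapply(codes, intToUtf8, character(1))       # code point to character
      })
      x
    }

    decoded <- decode_u_escapes("<U+00A6>  1000-66329")
    nchar(decoded)  # 13: one character for U+00A6, two spaces, ten more characters
    ```

    Whether the decoded character then prints as ¦ or again as <U+00A6> still depends on the locale of the R session.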

  • 2020-11-28 00:16

    I just want to remove unicode <U+00A6> which is at the beginning of string.

    Then you do not need a gsub, you can use a sub with "^\\s*<U\\+\\w+>\\s*" pattern:

    q <-"<U+00A6>  1000-66329"
    sub("^\\s*<U\\+\\w+>\\s*", "", q)
    

    Pattern details:

    • ^ - start of string
    • \\s* - zero or more whitespaces
    • <U\\+ - a literal char sequence <U+
    • \\w+ - 1 or more letters, digits or underscores
    • > - a literal >
    • \\s* - zero or more whitespaces.
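
    To see how the pieces of the pattern behave, here is a short sketch that applies it to a few variations of the input. The vector inputs is made up for illustration; only the first element comes from the question:

    ```r
    pattern <- "^\\s*<U\\+\\w+>\\s*"

    inputs <- c(
      "<U+00A6>  1000-66329",   # the string from the question
      "  <U+00A6>1000-66329",   # whitespace before the escape, none after it
      "1000-66329"              # no escape at all
    )

    # sub() replaces only the first match; the ^ anchor ties it to the start.
    sub(pattern, "", inputs)
    ```

    All three elements should come back as "1000-66329": the leading and trailing \\s* absorb whitespace on either side of the escape, and a string without the escape is left unchanged.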

    If you also need to replace the - with a space, add a |- alternative and use gsub (since we now expect several replacements, and the replacement must be a space, same as in akrun's answer):

    trimws(gsub("^\\s*<U\\+\\w+>|-", " ", q))
    


  • 2020-11-28 00:16

    We can also do

    trimws(gsub("\\S+\\s+|-", " ", q))
    #[1] "1000 66329"
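
    Note that \\S+\\s+ is not anchored, so it matches any run of non-whitespace followed by whitespace, not only the leading escape. The sketch below compares it with an anchored variant; the string q2 and the anchored pattern are assumptions added for illustration, not part of the answer:

    ```r
    q  <- "<U+00A6>  1000-66329"
    q2 <- "<U+00A6>  1000 66329-1"   # hypothetical input with an inner space

    unanchored <- function(x) trimws(gsub("\\S+\\s+|-", " ", x))
    anchored   <- function(x) trimws(gsub("^\\S+\\s+|-", " ", x))

    unanchored(q)    # "1000 66329"
    anchored(q)      # "1000 66329"
    unanchored(q2)   # "66329 1"       (the token "1000 " is removed as well)
    anchored(q2)     # "1000 66329 1"  (only the leading token is removed)
    ```

    For the string in the question both versions agree; the anchored one is the safer choice if the rest of the string can contain spaces.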
    