trimws bug? leading whitespace not removed

清酒与你 2020-12-07 01:49

Edit: Thanks to R Yoda, I was finally able to create a reproducible example of the issue I am facing:

x = rawToCha         


        
2 Answers
  • 2020-12-07 02:28

    0xa0 encodes a different kind of space, the non-breaking space (in Latin-1), while 0x20 is the ordinary space character.
    By default, trimws strips only spaces, tabs, line feeds, and carriage returns (the pattern [ \t\r\n]+); it does not strip non-breaking spaces, which is why it leaves the string unchanged.
    You can use sub (to remove either leading or trailing spaces) or gsub (to remove both in one call) with \\s to strip any kind of leading or trailing space, including the one represented by 0xa0. Whether \\s matches 0xa0 can depend on the locale and the regex engine in use:

    sub("^\\s+", "", x)
    [1] "11.132592"
    

    To remove both leading and trailing spaces:

    gsub("(^\\s+)|(\\s+$)", "", x)
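
If \\s does not match the non-breaking space in your locale, a more explicit variant may help. The following is only a sketch under two assumptions that are not stated in the thread: that the 0xa0 byte is Latin-1, and that R is version 3.6.0 or later, where trimws accepts a whitespace argument. It first converts the string to UTF-8 so the result does not depend on the session locale, then checks the first code point, then trims with the wider class [\h\v].

```r
# Same bytes as the example data in this thread; 0xa0 is ASSUMED to be Latin-1.
x1 <- as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32))

# Declare the source encoding and convert to UTF-8 so that regex
# matching does not depend on the session locale.
x <- iconv(rawToChar(x1), from = "latin1", to = "UTF-8")

# Inspect the first code point: 160 (U+00A0) is the non-breaking space,
# whereas an ordinary space would be 32.
first_cp <- utf8ToInt(x)[1]

# The default trimws() class [ \t\r\n] leaves the string unchanged ...
unchanged <- trimws(x)

# ... while the wider class [\h\v] (needs R >= 3.6.0) also removes U+00A0.
trimmed <- trimws(x, whitespace = "[\\h\\v]")
```

Checking the code point first is useful because a non-breaking space prints exactly like an ordinary space, so the problem is not visible in the console output.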
    
  • 2020-12-07 02:32

    A possible solution is to replace the wrongly encoded spaces with regular ones:

    trimws(rawToChar(replace(x1, x1 == as.raw(0xa0), as.raw(0x20))))
    

    which gives:

    [1] "11.132592"
    

    To convert the result to numeric, wrap the above code in as.numeric.


    Used data:

    x1 <- as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32))
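
The byte replacement and the numeric conversion described above can be combined into one small function. This is a rough sketch, and the name raw_to_number is hypothetical, not something from the thread or from base R. It assumes a single-byte encoding such as Latin-1; in UTF-8, 0xa0 can occur as a continuation byte inside a multi-byte character, so a byte-level swap would not be safe there.

```r
# Hypothetical helper (not from the thread): parse a number from raw bytes
# that may contain Latin-1 non-breaking spaces (0xa0).
raw_to_number <- function(bytes) {
  # Swap every 0xa0 byte for an ordinary space (0x20).
  bytes[bytes == as.raw(0xa0)] <- as.raw(0x20)
  # The default trimws() whitespace class now covers the leading space.
  as.numeric(trimws(rawToChar(bytes)))
}

# Same data as above.
x1 <- as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32))
result <- raw_to_number(x1)
```

Here result is a numeric value rather than a character string, so it can be used directly in arithmetic.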
    