as.numeric with comma decimal separators?

野性不改 2020-11-27 04:32

I have a large vector of strings of the form:

Input = c("1,223", "12,232", "23,0")

etc. That's to say, decimals separated by commas,

7 answers
  • 2020-11-27 05:03

    Building on @adibender's solution:

    input = '23,67'
    as.numeric(gsub(
                    # ONLY for strings containing numerics, comma, numerics
                    "^([0-9]+),([0-9]+)$", 
                    # Substitute by the first part, dot, second part
                    "\\1.\\2", 
                    input
                    ))
    

    I guess that is a safer match...
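
To illustrate why the anchored pattern is the safer match, here is a small sketch on a mixed vector. The sample values and the name `mixed` are made up for illustration. Strings that do not match the pattern are left untouched by gsub, so as.numeric turns them into NA rather than converting them incorrectly.

```r
# Hypothetical sample data: two well-formed values and two that should be rejected
mixed <- c("23,67", "1,223", "abc", "1,2,3")

# Only strings of the form digits,digits are rewritten to digits.digits
converted <- gsub("^([0-9]+),([0-9]+)$", "\\1.\\2", mixed)

# Non-matching strings pass through unchanged and become NA on conversion;
# suppressWarnings() hides the "NAs introduced by coercion" warning
result <- suppressWarnings(as.numeric(converted))
print(result)
```

The first two elements are converted, while "abc" and "1,2,3" come back as NA.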

  • 2020-11-27 05:12

    The answer by adibender does not work when there are multiple commas.

    In that case the suggestion from use554546 and answer from Deena can be used.

    Input = c("1,223,765", "122,325,000", "23,054")
    as.numeric(gsub("," ,"", Input))
    

    Output:

    [1] 1223765 122325000 23054
    

    The function gsub replaces all occurrences. The function sub replaces only the first.
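
A minimal sketch of that difference, using one of the strings from this answer. The variable names are illustrative only, and fixed = TRUE is an optional addition that treats the comma as a literal character instead of a regular expression. Note that this approach treats commas as grouping marks, not as decimal marks.

```r
x <- "1,223,765"

# sub() stops after the first match, so one comma is left behind
first_only <- sub(",", "", x, fixed = TRUE)

# gsub() removes every comma
all_commas <- gsub(",", "", x, fixed = TRUE)

print(c(first_only, all_commas))

# Only the fully cleaned string converts to a number
print(as.numeric(all_commas))
```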

  • 2020-11-27 05:20

    As noted in another answer, it's much easier to do this while importing the file. The recently released readr package has a very useful feature, locale, that lets the user import numbers with a comma decimal mark by passing locale = locale(decimal_mark = ",") as an argument.

  • 2020-11-27 05:22

    The readr package has a function to parse numbers from strings. You can set many options via the locale argument.

    For comma as decimal separator you can write:

    readr::parse_number(Input, locale = readr::locale(decimal_mark = ","))
    
  • 2020-11-27 05:23
    scan(text=Input, dec=",")
    ## [1]  1.223 12.232 23.000
    

    But it depends on how long your vector is. I used rep(Input, 1e6) to make a long vector and my machine just hung. With 1e4 it is fine, though. @adibender's solution is much faster; on the 1e4 case:

    Unit: milliseconds
             expr        min         lq     median         uq        max neval
      adibender()   6.777888   6.998243   7.119136   7.198374   8.149826   100
     sebastianc() 504.987879 507.464611 508.757161 510.732661 517.422254   100
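
The definitions of the two benchmarked functions are not shown in this thread, so the following is only a rough sketch of how such a comparison could be set up in base R. `replace_version` and `scan_version` are assumed stand-ins, not the original `adibender()` and `sebastianc()`, and system.time gives a single coarse measurement instead of the 100 repetitions in the table above.

```r
Input <- c("1,223", "12,232", "23,0")
long_input <- rep(Input, 1e4)

# Assumed stand-in: replace the decimal comma with a dot, then convert
replace_version <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))

# Assumed stand-in: let scan() parse the comma as decimal mark
scan_version <- function(x) scan(text = x, dec = ",", quiet = TRUE)

t_replace <- system.time(res_replace <- replace_version(long_input))
t_scan <- system.time(res_scan <- scan_version(long_input))

# Elapsed wall-clock seconds for each approach (values vary by machine)
print(c(replace = t_replace[["elapsed"]], scan = t_scan[["elapsed"]]))
```

Both approaches should return the same numbers; only the running time differs.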
    
  • 2020-11-27 05:24

    Also, if you are reading in the raw data, read.table and all the associated functions have a dec argument, e.g.:

    read.table("file.txt", dec=",")
    

    When all else fails, gsub and sub are your friends.
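
A self-contained sketch of the dec argument, reading from an in-memory string via the text argument so that no file is needed. The contents of `raw`, the semicolon separator and the column names are invented for illustration.

```r
# Hypothetical file contents: semicolon-separated, comma as decimal mark
raw <- "id;value\n1;1,223\n2;12,232\n3;23,0"

# dec = "," makes read.table parse the value column as numeric
df <- read.table(text = raw, header = TRUE, sep = ";", dec = ",")
str(df$value)

# read.csv2() uses sep = ";" and dec = "," by default
df2 <- read.csv2(text = raw)
str(df2$value)
```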
