I have a large vector of strings of the form:
Input = c("1,223", "12,232", "23,0")
etc. That's to say, the decimal mark is a comma.
Building on @adibender solution:
input = '23,67'
as.numeric(gsub(
# ONLY for strings containing numerics, comma, numerics
"^([0-9]+),([0-9]+)$",
# Substitute by the first part, dot, second part
"\\1.\\2",
input
))
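I guess that is a safer match: only strings made of digits, a single comma and digits are changed; anything else passes through gsub unchanged.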
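Applied to a whole vector, the anchored pattern converts only the elements that look like digits, comma, digits. Other strings come back unchanged from gsub and then become NA in as.numeric (which warns; the warning is silenced here). A minimal sketch, where the extra element "1,223,765" is an illustrative value, not part of the question:

```r
# Vector from the question plus one hypothetical value with two commas
Input <- c("1,223", "12,232", "23,0", "1,223,765")

# Replace the comma with a dot only when the whole string is
# digits, one comma, digits; other strings are returned unchanged
swapped <- gsub("^([0-9]+),([0-9]+)$", "\\1.\\2", Input)

# "1,223,765" was left as-is, cannot be parsed, and becomes NA
out <- suppressWarnings(as.numeric(swapped))
print(out)
```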
The answer by adibender does not work when there are multiple commas.
In that case the suggestion from user554546 and the answer from Deena can be used.
Input = c("1,223,765", "122,325,000", "23,054")
as.numeric(gsub("," ,"", Input))
Output:
[1] 1223765 122325000 23054
The function gsub replaces all occurrences; the function sub replaces only the first.
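A quick check of the difference on a string with two thousands separators:

```r
x <- "1,223,765"

# sub() removes only the first comma, leaving a string as.numeric() rejects
first_only <- sub(",", "", x)   # "1223,765"

# gsub() removes every comma
all_commas <- gsub(",", "", x)  # "1223765"

print(c(first_only, all_commas))
print(as.numeric(all_commas))
```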
As stated by , it's way easier to do this while importing a file.
The recently released readr package has a very useful feature, locale, well explained here, that lets you import numbers with a comma decimal mark by passing locale = locale(decimal_mark = ",") as an argument.
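A minimal sketch of such an import, assuming readr is installed. The semicolon-delimited sample file written to a temporary path is made up for illustration, and the column types are given explicitly rather than guessed:

```r
library(readr)

# Hypothetical sample data: semicolon-delimited, comma as decimal mark
path <- tempfile(fileext = ".csv")
writeLines(c("id;value", "a;1,223", "b;12,232", "c;23,0"), path)

# decimal_mark = "," makes col_double() read "1,223" as 1.223
df <- read_delim(
  path,
  delim = ";",
  locale = locale(decimal_mark = ","),
  col_types = cols(id = col_character(), value = col_double())
)
print(df)
```

For this particular layout readr also provides read_csv2(), which assumes ";" as the delimiter and "," as the decimal mark.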
The readr package has a function, parse_number, to parse numbers from strings. You can set many options via the locale argument.
For comma as decimal separator you can write:
readr::parse_number(Input, locale = readr::locale(decimal_mark = ","))
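Unlike a plain as.numeric() call, parse_number() also drops grouping marks and any non-numeric text around the number. A sketch with made-up input strings; note that setting only decimal_mark = "," makes readr use "." as the grouping mark:

```r
library(readr)

# With decimal_mark = ",", readr's locale() defaults grouping_mark to "."
eu <- locale(decimal_mark = ",")

# Hypothetical inputs: plain decimal, grouped thousands, leading text
x <- c("1,223", "1.234,5", "EUR 3,75")
parsed <- parse_number(x, locale = eu)
print(parsed)
```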
scan(text=Input, dec=",")
## [1] 1.223 12.232 23.000
But it depends on how long your vector is. I used rep(Input, 1e6) to make a long vector and my machine just hung; 1e4 is fine, though. @adibender's solution is much faster: on rep(Input, 1e4) its median time is about 7 ms versus about 509 ms:
Unit: milliseconds
          expr        min         lq     median         uq        max neval
   adibender()   6.777888   6.998243   7.119136   7.198374   8.149826   100
  sebastianc() 504.987879 507.464611 508.757161 510.732661 517.422254   100
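The definitions of adibender() and sebastianc() are not shown, so here is a rough base-R way to repeat the comparison. The function names via_gsub and via_scan are hypothetical, and the assumption that the gsub approach simply replaces each comma with a point is mine, not necessarily adibender's exact code:

```r
# Hypothetical re-creation of the two approaches; bodies are assumptions
Input <- c("1,223", "12,232", "23,0")
x <- rep(Input, 1e4)

# Replace each comma with a point, then convert
via_gsub <- function(v) as.numeric(gsub(",", ".", v, fixed = TRUE))

# Let scan() parse the strings with "," as the decimal mark
via_scan <- function(v) scan(text = v, dec = ",", quiet = TRUE)

# Both should give the same numbers before we compare speed
stopifnot(isTRUE(all.equal(via_gsub(x), via_scan(x))))

# Time ten repetitions of each; absolute numbers depend on the machine
print(system.time(for (i in 1:10) via_gsub(x)))
print(system.time(for (i in 1:10) via_scan(x)))
```

The microbenchmark package produced the table above; system.time is used here only to avoid an extra dependency.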
Also, if you are reading in the raw data, read.table and all the associated functions have a dec argument, e.g.:
read.table("file.txt", dec=",")
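A self-contained variant that uses the text argument instead of a file; the sample table is made up. read.csv2() is the associated function whose defaults are already sep = ";" and dec = ",":

```r
# Whitespace-separated sample with comma decimals
df1 <- read.table(text = "a b\n1,5 2,25\n3,0 4,75", header = TRUE, dec = ",")

# Semicolon-separated sample; read.csv2 defaults to sep = ";", dec = ","
df2 <- read.csv2(text = "a;b\n1,5;2,25\n3,0;4,75")

print(df1)
print(df2)
```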
When all else fails, gsub and sub are your friends.