as.numeric with comma decimal separators?


I have a large vector of strings of the form:

```
Input = c("1,223", "12,232", "23,0")
```

etc. That's to say, decimals separated by commas.

7 Answers

离开以前

2020-11-27 05:03

Building on @adibender's solution:

```
input = '23,67'
as.numeric(gsub(
  # ONLY for strings containing numerics, comma, numerics
  "^([0-9]+),([0-9]+)$",
  # Substitute by the first part, dot, second part
  "\\1.\\2",
  input
))
```

I guess that is a safer match...
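
To illustrate why the anchored pattern is the more cautious choice, here is a minimal sketch applied to a whole vector. The helper name `comma_to_numeric` and the two extra test strings are assumptions made for illustration, not part of the original answer. Strings that do not look like digits, one comma, digits are left untouched by `sub()`, so `as.numeric()` turns them into `NA` instead of silently producing a wrong number.

```r
# comma_to_numeric is a hypothetical helper name, not from the thread.
comma_to_numeric <- function(x) {
  # Only strings of the form digits, one comma, digits are rewritten;
  # anything else reaches as.numeric() unchanged.
  converted <- sub("^([0-9]+),([0-9]+)$", "\\1.\\2", x)
  # as.numeric() warns when it produces NA; silenced here on purpose.
  suppressWarnings(as.numeric(converted))
}

# The last two elements are made-up inputs that should NOT be converted.
Input <- c("1,223", "12,232", "23,0", "1,223,765", "abc")
result <- comma_to_numeric(Input)
print(result)
```

Note that this pattern also rejects negative numbers and strings with surrounding whitespace, so it may need loosening depending on the data.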


感情败类

2020-11-27 05:12
The answer by adibender does not work when there are multiple commas.

In that case the suggestion from use554546 and answer from Deena can be used.
```
Input = c("1,223,765", "122,325,000", "23,054")
as.numeric(gsub("," ,"", Input))
```
Output:
```
[1] 1223765 122325000 23054
```
The function gsub replaces all occurrences. The function sub replaces only the first.
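
A small sketch of the difference, using a made-up string for illustration. It also shows that removing commas treats them as thousands separators, which gives a different number than reading the comma as a decimal mark, so the right call depends on what the commas mean in the data.

```r
x <- "1,223,765"

# sub() stops after the first match, leaving the second comma in place.
first_only <- sub(",", "", x, fixed = TRUE)

# gsub() removes every comma.
all_commas <- gsub(",", "", x, fixed = TRUE)

# Same string, two interpretations of the comma:
as_thousands <- as.numeric(gsub(",", "", "23,054", fixed = TRUE))  # comma dropped
as_decimal   <- as.numeric(sub(",", ".", "23,054", fixed = TRUE))  # comma -> dot

print(c(first_only, all_commas))
print(c(as_thousands, as_decimal))
```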
醉话见心

2020-11-27 05:20

As already noted, it's much easier to do this while importing a file. The recently released readr package has a very useful feature, locale, that lets the user import numbers with a comma decimal mark by passing locale = locale(decimal_mark = ",") as an argument.

-上瘾入骨i

2020-11-27 05:22
The readr package has a function to parse numbers from strings. You can set many options via the locale argument.

For comma as decimal separator you can write:
```
readr::parse_number(Input, locale = readr::locale(decimal_mark = ","))
```

忘掉有多难

2020-11-27 05:23

```
scan(text=Input, dec=",")
## [1]  1.223 12.232 23.000
```

But it depends on how long your vector is. I used rep(Input, 1e6) to make a long vector and my machine just hung. 1e4 is fine, though. @adibender's solution is much faster; on a vector built with 1e4 repetitions the difference is already large:

```
Unit: milliseconds
         expr        min         lq     median         uq        max neval
  adibender()   6.777888   6.998243   7.119136   7.198374   8.149826   100
 sebastianc() 504.987879 507.464611 508.757161 510.732661 517.422254   100
```


滥情空心

2020-11-27 05:24
Also, if you are reading in the raw data, read.table and all its associated functions have a dec argument, e.g.:
```
read.table("file.txt", dec=",")
```
When all else fails, gsub and sub are your friends.
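
A self-contained sketch of the dec argument, assuming a small whitespace-separated file. The file contents and column names are made up for illustration, and a temporary file stands in for "file.txt".

```r
# Write a tiny example file; the data here is invented for the demo.
tmp <- tempfile(fileext = ".txt")
writeLines(c("x y", "1,5 2,25", "3,0 4,75"), tmp)

# dec = "," tells read.table to treat the comma as the decimal mark,
# so both columns come back as numeric rather than character.
dat <- read.table(tmp, header = TRUE, dec = ",")
str(dat)

unlink(tmp)
```

For semicolon-separated files, read.csv2 uses sep = ";" and dec = "," by default.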