Is there a clean/automatic way to convert CSV values formatted as percents (with a trailing %
symbol) in R?
Here is some example data:
There is no "percentage" type in R. So you need to do some post-processing:
DF <- read.table(text="actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%
7.9406,0.2152,97%
4.9637,3.5237,29%", sep=",", header=TRUE)
DF[,3] <- as.numeric(gsub("%", "",DF[,3]))/100
# actual simulated percent.error
#1 2.1496 8.6066 -3.00
#2 0.9170 8.0266 -7.75
#3 7.9406 0.2152 0.97
#4 4.9637 3.5237 0.29
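If you want this to happen automatically for every percent-formatted column rather than by column index, something like the following might work. This is only a sketch: `percent_cols_to_numeric()` is a hypothetical helper name, not a base R function, and it assumes a column counts as "percent" only when all of its non-missing values look like a number followed by `%`.

```r
# Hypothetical helper (not part of base R): convert every column whose
# non-missing values all look like "<number>%" into proportions.
percent_cols_to_numeric <- function(df) {
  is_percent <- function(x) {
    if (is.factor(x)) x <- as.character(x)
    if (!is.character(x)) return(FALSE)
    vals <- trimws(x[!is.na(x)])
    length(vals) > 0 && all(grepl("^[-+]?[0-9]*\\.?[0-9]+%$", vals))
  }
  for (col in names(df)) {
    if (is_percent(df[[col]])) {
      cleaned <- sub("%", "", trimws(as.character(df[[col]])), fixed = TRUE)
      df[[col]] <- as.numeric(cleaned) / 100
    }
  }
  df
}

# read.csv() turns the header "percent error" into percent.error
DF <- read.csv(text = "actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%
7.9406,0.2152,97%
4.9637,3.5237,29%", stringsAsFactors = FALSE)

DF <- percent_cols_to_numeric(DF)
str(DF)
```

Columns that are already numeric, or that mix percent strings with other text, are left untouched.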
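Tidyverse offers several ways to handle this. With readr you can declare the column as col_number(), the column-specification counterpart of parse_number(), which strips symbols and other non-numeric text from around the number. Note that the values stay on the percent scale (-300, not -3.00), so divide by 100 afterwards if you want proportions:
library(readr)
sample_data <- "actual,simulated,percent error\n2.1496,8.6066,-300%\n0.9170,8.0266,-775%\n7.9406,0.2152,97%\n4.9637,3.5237,29%"
DF <- read_csv(sample_data, col_types = cols(`percent error` = col_number()))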
# A tibble: 4 x 3
#   actual simulated `percent error`
#    <dbl>     <dbl>           <dbl>
# 1  2.15      8.61           -300
# 2  0.917     8.03           -775
# 3  7.94      0.215            97.0
# 4  4.96      3.52             29.0
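Since parse_number() is mentioned above, here is a small sketch of calling it directly on a character vector and then rescaling to proportions. The vector `pct` and the result name `prop` are made up for illustration.

```r
library(readr)

# Illustrative input: percent strings as they would appear in the CSV
pct <- c("-300%", "-775%", "97%", "29%")

# parse_number() drops the non-numeric characters around each number,
# so the result is still on the percent scale: -300, -775, 97, 29
raw <- parse_number(pct)

# Divide by 100 to get proportions: -3.00, -7.75, 0.97, 0.29
prop <- raw / 100
prop
```

The same division can be applied to a column read with col_number() if proportions are what you need.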
With data.table, you can do it like this:
library(data.table)
a <- fread("file.csv")[, `percent error` := as.numeric(sub('%', '', `percent error`))/100]
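For a version that runs without a file on disk, the same idea can be sketched with fread()'s `text` argument. The names `csv_text` and `dt` are illustrative, and the data is the sample from the first answer.

```r
library(data.table)

# Inline copy of the sample data; in practice this would be fread("file.csv")
csv_text <- "actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%
7.9406,0.2152,97%
4.9637,3.5237,29%"

dt <- fread(text = csv_text)

# Update the column by reference: drop the "%" and rescale to a proportion
dt[, `percent error` := as.numeric(sub("%", "", `percent error`, fixed = TRUE)) / 100]
dt
```

Using `fixed = TRUE` treats "%" as a literal string rather than a regular expression, which makes no difference here but avoids surprises with other symbols.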
This is the same as Roland's solution, except that it uses the stringr
package. I'd recommend stringr when working with strings, as its interface is more intuitive.
library(stringr)
# junk is the data frame read from the CSV; str_replace() removes the first "%" in each value
d <- str_replace(junk$percent.error, pattern="%", "")
junk$percent.error <- as.numeric(d)/100