I'm trying to read into R a csv file that contains information on political contributions. From what I understand, the columns by default are imported as factors, but I need the the amount column ('CTRIB_AMT' in the dataset) to be imported as a numeric column so I can run a variety of functions that wouldn't work for factors. The column is formatted as a currency with a "$" as prefix.
I used a simple read command to import the file initially:
contribs <- read.csv('path/to/file')
And then tried to convert the CTRIB_AMT from currency to numeric:
as.numeric(as.character(sub("$","",contribs$CTRIB_AMT, fixed=TRUE)))
But that didn't work. The functions I'm trying to use for the CTRIB_AMT columns are:
vals<-sort(unique(dfr$CTRIB_AMT))
sums<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
counts<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)
See related question here.
Any thoughts on how to import file initially so column is numeric or how to convert it after importing?
I'm not sure how to read it in directly, but you can modify it once it's in:
> A <- read.csv("~/Desktop/data.csv")
> A
id desc price
1 0 apple $1.00
2 1 banana $2.25
3 2 grapes $1.97
> A$price <- as.numeric(sub("\\$","", A$price))
> A
id desc price
1 0 apple 1.00
2 1 banana 2.25
3 2 grapes 1.97
> str(A)
'data.frame': 3 obs. of 3 variables:
$ id : int 0 1 2
$ desc : Factor w/ 3 levels "apple","banana",..: 1 2 3
$ price: num 1 2.25 1.97
I think it might just have been a missing escape in your sub. $ indicates the end of a line in regular expressions. \$ is a dollar sign. But then you have to escape the escape...
Another way could be setting conversion using setAs
.
It was used in two (similar) question:
- Processing negative number in "accounting" formatR
- How to read a csv file where some numbers contain commas?
For your needs:
setClass("Currency")
setAs("character", "Currency",
function(from) as.numeric(sub("$","",from, fixed=TRUE)))
contribs <- read.csv("path/to/file", colClasses=c(CTRIB_AMT="Currency"))
Yet another solution for a problem solved long time ago:
convertCurrency <- function(currency) {
currency1 <- sub('$','',as.character(currency),fixed=TRUE)
currency2 <- as.numeric(gsub('\\,','',as.character(currency1)))
currency2
}
contribs$CTRIB_AMT_NUM <- convertCurrency(contribs$CTRIB_AMT)
Or use something like as.numeric(substr(as.character(contribs$CTRIB_AMT),2,20))
we know that there certainly won't be more than 20 characters.
Another thing to note is that you can remove the need to convert from a factor alltogether if you set stringsAsFactors=F
in your call to read.csv()
Taking advantage of the powerful parsers the readr
package offers out of the box:
my_parser <- function(col) {
# Try first with parse_number that handles currencies automatically quite well
res <- suppressWarnings(readr::parse_number(col))
if (is.null(attr(res, "problems", exact = TRUE))) {
res
} else {
# If parse_number fails, fall back on parse_guess
readr::parse_guess(col)
# Alternatively, we could simply return col without further parsing attempt
}
}
library(dplyr)
name <- c('john','carl', 'hank')
salary <- c('$23,456.33','$45,677.43','$76,234.88')
emp_data <- data.frame(name,salary)
emp_data %>%
mutate(foo = "USD13.4",
bar = "£37") %>%
mutate_all(my_parser)
# name salary foo bar
# 1 john 23456.33 13.4 37
# 2 carl 45677.43 13.4 37
# 3 hank 76234.88 13.4 37
来源:https://stackoverflow.com/questions/7337824/read-csv-file-in-r-with-currency-column-as-numeric