Question
I'm trying to read into R a CSV file that contains information on political contributions. From what I understand, the columns are imported as factors by default, but I need the amount column ('CTRIB_AMT' in the dataset) to be imported as numeric so I can run a variety of functions that wouldn't work on factors. The column is formatted as currency with a "$" prefix.
I used a simple read command to import the file initially:
contribs <- read.csv('path/to/file')
And then tried to convert the CTRIB_AMT from currency to numeric:
as.numeric(as.character(sub("$","",contribs$CTRIB_AMT, fixed=TRUE)))
But that didn't work. The functions I'm trying to use for the CTRIB_AMT columns are:
vals<-sort(unique(dfr$CTRIB_AMT))
sums<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
counts<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)
See related question here.
Any thoughts on how to import the file so the column is numeric from the start, or how to convert it after importing?
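The as.character() step in the attempt above matters because as.numeric() applied directly to a factor returns its internal level codes, not the values its labels spell out. A small illustration with a made-up factor (the last idiom is the one suggested in the R FAQ on converting factors to numeric):

```r
# Made-up factor whose labels are numbers stored as text
f <- factor(c("10.5", "2", "10.5"))

# Levels are sorted as text ("10.5" before "2"), so this yields the codes 1 2 1
codes <- as.numeric(f)

# Converting the labels first recovers the actual values 10.5 2 10.5
values <- as.numeric(as.character(f))

# Same result, but each distinct level is converted only once
values_fast <- as.numeric(levels(f))[f]

print(codes)
print(values)
print(values_fast)
```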
Answer 1:
I'm not sure how to read it in directly, but you can modify it once it's in:
> A <- read.csv("~/Desktop/data.csv")
> A
id desc price
1 0 apple $1.00
2 1 banana $2.25
3 2 grapes $1.97
> A$price <- as.numeric(sub("\\$","", A$price))
> A
id desc price
1 0 apple 1.00
2 1 banana 2.25
3 2 grapes 1.97
> str(A)
'data.frame': 3 obs. of 3 variables:
$ id : int 0 1 2
$ desc : Factor w/ 3 levels "apple","banana",..: 1 2 3
$ price: num 1 2.25 1.97
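I think it might just have been a missing escape in your sub. In regular expressions, $ matches the end of the string, while \$ is a literal dollar sign; inside an R string the backslash must itself be escaped, hence "\\$". (If fixed=TRUE is passed, as in the question, $ is already taken literally, so a remaining failure would more likely come from thousands separators such as "$1,000.00", which convert to NA, or from not assigning the result back to contribs$CTRIB_AMT.)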
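A quick way to see the difference between the three patterns, using a made-up vector of prices (the comma in the last value is included on purpose, as an assumption about what real contribution amounts might look like):

```r
x <- c("$1.00", "$2.25", "$1,000.50")

# Unescaped "$" is a regex anchor: it matches the empty string at the end,
# so nothing is removed and the values keep their dollar signs
unescaped <- sub("$", "", x)

# "\\$" in an R string is the regex \$, i.e. a literal dollar sign
escaped <- sub("\\$", "", x)

# fixed = TRUE skips regex interpretation, so "$" is literal here too
literal <- sub("$", "", x, fixed = TRUE)

# The comma still blocks conversion: the third value becomes NA
with_na <- suppressWarnings(as.numeric(escaped))

# Stripping commas as well gives 1.00 2.25 1000.50
amounts <- as.numeric(gsub(",", "", escaped, fixed = TRUE))
print(amounts)
```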
Answer 2:
Another way could be to define the conversion with setAs.
It was used in two (similar) questions:
- Processing negative number in "accounting" format
- How to read a csv file where some numbers contain commas?
For your needs:
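# setClass() with only a name creates a virtual class; read.csv() calls as() for
# colClasses it doesn't recognise, which picks up the setAs() method below
setClass("Currency")
setAs("character", "Currency",
      function(from) as.numeric(sub("$", "", from, fixed = TRUE)))
contribs <- read.csv("path/to/file", colClasses = c(CTRIB_AMT = "Currency"))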
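A self-contained sketch of the same setAs approach, assuming amounts may also contain thousands separators. The class name CurrencyAmount and the inline CSV are made up for illustration, and the conversion uses gsub("[$,]", "", ...) rather than removing only the "$", so a value like "$1,250.00" doesn't become NA:

```r
library(methods)  # setClass()/setAs(); attached by default in most sessions

# Hypothetical class name; it only needs to match the colClasses entry
setClass("CurrencyAmount")
setAs("character", "CurrencyAmount",
      function(from) as.numeric(gsub("[$,]", "", from)))  # drop "$" and ","

# Made-up data standing in for the contributions file
csv_text <- 'NAME,CTRIB_AMT
Alice,"$1,250.00"
Bob,$40.50'

contribs <- read.csv(text = csv_text,
                     colClasses = c(CTRIB_AMT = "CurrencyAmount"))
str(contribs)
```

Only the named column gets the custom class; the remaining columns are still type-guessed as usual.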
Answer 3:
Yet another solution for a problem solved a long time ago:
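convertCurrency <- function(currency) {
  # Remove the leading "$" (fixed = TRUE treats it literally, not as a regex anchor)
  currency1 <- sub('$', '', as.character(currency), fixed = TRUE)
  # Remove thousands separators such as the comma in "$1,000.00", then convert
  currency2 <- as.numeric(gsub(',', '', currency1, fixed = TRUE))
  currency2
}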
contribs$CTRIB_AMT_NUM <- convertCurrency(contribs$CTRIB_AMT)
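Once the column is numeric, the summaries from the question behave as intended. A minimal sketch on hypothetical data (stringsAsFactors = TRUE mimics the old factor default); to_number() is a made-up compact equivalent of convertCurrency() that assumes "$" and "," are the only non-numeric characters:

```r
# Strip "$" and "," in one pass, then convert
to_number <- function(x) as.numeric(gsub("[$,]", "", as.character(x)))

# Hypothetical sample standing in for the contributions data
dfr <- data.frame(CTRIB_AMT = c("$250.00", "$1,000.00", "$250.00", "$50.00"),
                  stringsAsFactors = TRUE)
dfr$CTRIB_AMT <- to_number(dfr$CTRIB_AMT)

# The question's summaries, now grouping and summing real amounts
vals   <- sort(unique(dfr$CTRIB_AMT))
sums   <- tapply(dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
counts <- tapply(dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)
print(sums)
print(counts)
```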
Answer 4:
Taking advantage of the powerful parsers the readr package offers out of the box:
my_parser <- function(col) {
# Try first with parse_number that handles currencies automatically quite well
res <- suppressWarnings(readr::parse_number(col))
if (is.null(attr(res, "problems", exact = TRUE))) {
res
} else {
# If parse_number fails, fall back on parse_guess
readr::parse_guess(col)
# Alternatively, we could simply return col without further parsing attempt
}
}
library(dplyr)
name <- c('john','carl', 'hank')
salary <- c('$23,456.33','$45,677.43','$76,234.88')
emp_data <- data.frame(name, salary, stringsAsFactors = FALSE)  # keep text as character (the default since R 4.0.0)
emp_data %>%
mutate(foo = "USD13.4",
bar = "£37") %>%
mutate_all(my_parser)
# name salary foo bar
# 1 john 23456.33 13.4 37
# 2 carl 45677.43 13.4 37
# 3 hank 76234.88 13.4 37
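If readr isn't available, a rough base-R stand-in can cover simple currency strings. This is a swapped-in technique, not readr's algorithm: rough_parse_number() is a hypothetical helper that keeps only digits, "." and "-", so it will misread text containing other dots or hyphens.

```r
# Hypothetical helper: drop every character except digits, "." and "-"
rough_parse_number <- function(x) {
  as.numeric(gsub("[^0-9.-]", "", as.character(x)))
}

salary <- c("$23,456.33", "$45,677.43", "$76,234.88")
parsed_salary <- rough_parse_number(salary)
parsed_other  <- rough_parse_number(c("USD13.4", "GBP37"))
print(parsed_salary)
print(parsed_other)
```

Also note that with dplyr 1.0.0 or later, mutate_all() is superseded; the equivalent call is mutate(across(everything(), my_parser)).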
Answer 5:
Or use something like as.numeric(substr(as.character(contribs$CTRIB_AMT), 2, 20)), which drops the first character (the "$"). This assumes every value starts with "$", contains no commas, and is no longer than 20 characters.
Another thing to note is that you can avoid converting from a factor altogether if you set stringsAsFactors = FALSE in your call to read.csv() (this is the default since R 4.0.0).
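Both suggestions together, as a sketch on the sample data from Answer 1 supplied inline. Using substring() instead of substr() is a swap: its default last argument (1000000L) effectively means "to the end of the string", so no hand-picked upper bound is needed. As before, this assumes each price starts with "$" and has no commas:

```r
csv_text <- 'id,desc,price
0,apple,$1.00
1,banana,$2.25
2,grapes,$1.97'

# Text columns stay character, so no factor-to-character step is needed
A <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Drop the first character of each price and convert
A$price <- as.numeric(substring(A$price, 2))
str(A)
```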
Source: https://stackoverflow.com/questions/7337824/read-csv-file-in-r-with-currency-column-as-numeric