Question
I have a Date and am interested in representing it as an integer of the form yyyymm. Currently, I do:
get_year_month <- function(d) { return(as.integer(format(d, "%Y%m")))}
mydate = seq.Date(from=as.Date("2012-01-01"), to=as.Date("5012-01-01"), by=1)
system.time(ym <- get_year_month(mydate))
# user system elapsed
# 5.972 0.974 6.951
This is very slow for large datasets. Is there a faster way? Please provide timings for your answers so they can be easily compared. Use the above example.
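For reference, here is what the yyyymm representation looks like for a few dates. This is a small sketch reusing the question's get_year_month definition (without the explicit return()); the input dates are arbitrary examples.

```r
# The question's approach: format each Date as "YYYYMM", then convert to integer
get_year_month <- function(d) {
  as.integer(format(d, "%Y%m"))
}

d <- as.Date(c("2012-01-01", "2012-03-15", "2013-12-31"))
ym <- get_year_month(d)
print(ym)
# [1] 201201 201203 201312
```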
Answer 1:
Using the year() and month() functions from the lubridate package can be almost twice as fast as your function:
mydate = as.Date(rep("2012-01-01",1000))
library(lubridate)
library(microbenchmark)
microbenchmark(get_year_month(mydate),
year(mydate)*100+month(mydate))
gives:
R> Unit: milliseconds
expr min lq median uq
get_year_month(mydate) 2.150296 2.188370 2.218176 2.285973
year(mydate) * 100 + month(mydate) 1.220016 1.228129 1.239704 1.284568
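The arithmetic works because multiplying the year by 100 shifts it two decimal places to the left, leaving room for a two-digit month (2012 * 100 + 3 = 201203). Below is a small sketch wrapping it as a function, assuming lubridate is installed. The name ym_lubridate is hypothetical, and the as.integer() call is an addition so that the result has the same integer type as the question's output (100 is a double literal in R, so the sum itself is a double).

```r
library(lubridate)

# Hypothetical helper: yyyymm via integer arithmetic on the year and month fields
ym_lubridate <- function(d) {
  # year(d) * 100 moves the year left by two digits; month(d) fills them (1-12)
  as.integer(year(d) * 100 + month(d))
}

d <- as.Date(c("2012-01-01", "2012-03-15", "2013-12-31"))
ym_lubridate(d)
```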
Answer 2:
It would be best to keep your Dates in POSIXlt format if you want to manipulate them like that:
> system.time(ym <- get_year_month(mydate))
user system elapsed
4.039 0.025 4.079
> system.time(mydatep <- as.POSIXlt(mydate))
user system elapsed
3.576 0.016 3.603
> system.time(ym <- (1900 + mydatep$year)*100 + (mydatep$mon + 1))
user system elapsed
0.010 0.005 0.015
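It's still a little faster overall (about 3.6 s to convert plus 0.015 s for the arithmetic, versus 4.0 s for format()), and once the dates are stored as POSIXlt, similar field extractions afterwards cost almost nothing.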
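A base R sketch of the same idea, with the conversion paid once and the fields reused afterwards; the variable names and the extra day-of-month and quarter fields are illustrative additions. POSIXlt counts years from 1900 and months from 0, hence the two offsets.

```r
d <- as.Date(c("2012-01-01", "2012-03-15", "2013-12-31"))

# Pay the conversion cost once. POSIXlt stores the date fields as list
# components: $year counts years since 1900, $mon is 0-based (January = 0)
lt <- as.POSIXlt(d)

# yyyymm from integer arithmetic on the stored fields
ym <- (1900L + lt$year) * 100L + (lt$mon + 1L)

# Further fields come from the same object without another conversion
day_of_month <- lt$mday
quarter <- lt$mon %/% 3L + 1L
```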
Answer 3:
You can try using the yearmon class from the zoo package. In general, if you are doing time-series manipulation and analysis, I would suggest using the xts class, or at least the zoo class. xts has a lot of functionality for analyzing very large time-series data.
Here is a quick benchmark against the other suggested solutions:
library(zoo)
library(lubridate)
library(microbenchmark)
get_year_month <- function(d) {
  return(as.integer(format(d, "%Y%m")))
}
mydate = as.Date(rep("2012-01-01", 1e+06))
microbenchmark(get_year_month(mydate), year(mydate) * 100 + month(mydate), as.yearmon(mydate, format = "%Y-%m-%d"), times = 1)
## Unit: milliseconds
## expr min lq median uq max neval
## get_year_month(mydate) 1049.8813 1049.8813 1049.8813 1049.8813 1049.8813 1
## year(mydate) * 100 + month(mydate) 434.1765 434.1765 434.1765 434.1765 434.1765 1
## as.yearmon(mydate, format = "%Y-%m-%d") 249.6704 249.6704 249.6704 249.6704 249.6704 1
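Note that as.yearmon() returns a yearmon object, which zoo stores as the year plus (month - 1) / 12 rather than as a yyyymm integer, so the timing above does not include that last conversion. If the integer is needed, here is a sketch of one way to derive it, assuming zoo is installed: scale by 12 and round to a whole month count, then split that count into year and month. The helper name yearmon_to_yyyymm is hypothetical.

```r
library(zoo)

# Hypothetical helper: convert yearmon values (year + (month - 1) / 12) to yyyymm
yearmon_to_yyyymm <- function(ym) {
  # Whole months since year 0; round() absorbs floating-point error
  months <- as.integer(round(as.numeric(ym) * 12))
  # Integer division recovers the year; the remainder is the 0-based month
  (months %/% 12L) * 100L + (months %% 12L + 1L)
}

d <- as.Date(c("2012-01-01", "2012-03-15", "2013-12-31"))
yearmon_to_yyyymm(as.yearmon(d))
```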
Answer 4:
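There may not be a faster way for a single item. For a collection, you can apply the function to each element with an apply-family function such as vapply (replicate is not suitable here, because it re-evaluates the same expression rather than iterating over elements). This is still linear in the number of dates, and it is usually slower than passing the whole vector to get_year_month, because format() is already vectorized. For example:
# Apply get_year_month() to each element of D separately;
# indexing with D[i] keeps the Date class, which vapply(D, ...) would drop
year_month_each <- function(D) {
  vapply(seq_along(D), function(i) get_year_month(D[i]), integer(1))
}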
Source: https://stackoverflow.com/questions/15316657/convert-date-to-year-month-representation-in-r