I am using the fasttime package for its fastPOSIXct function that can read character datetimes very efficiently. My problem is that it can only read character datet
Could you not just add the appropriate number of seconds to correct the offset from GMT?
# Original problem: fastPOSIXct() parses the string as GMT; tz only affects display
library(fasttime)
fastPOSIXct("2010-03-15 12:37:17.223", tz = "America/Montreal")
# [1] "2010-03-15 08:37:17 EDT"
# Add 4 hours' worth of seconds (4 * 3600 = 14400), the EDT offset on this date. This should be very quick.
fastPOSIXct("2010-03-15 12:37:17.223", tz = "America/Montreal") + 14400
# [1] "2010-03-15 12:37:17 EDT"
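A minimal base-R check of the fixed-offset idea. It assumes your system's tz database knows America/Montreal, and it uses base as.POSIXct(..., tz = "GMT") as a stand-in for fastPOSIXct() so it runs without fasttime. The second half shows the catch: a constant 14400 is only correct while EDT is in effect, and it is an hour off in winter.

```r
# Parse the wall-clock string as if it were GMT (mimicking fastPOSIXct)
wall <- "2010-03-15 12:37:17"
as_gmt <- as.POSIXct(wall, tz = "GMT")

# Shift by the EDT offset (4 hours) and relabel the result for display
shifted <- as_gmt + 4 * 3600
attr(shifted, "tzone") <- "America/Montreal"
print(shifted)
# [1] "2010-03-15 12:37:17 EDT"

# Reference: R's own time-zone-aware parse gives the same instant
reference <- as.POSIXct(wall, tz = "America/Montreal")
stopifnot(as.numeric(shifted) == as.numeric(reference))

# In January Montreal is on EST (UTC-5), so the same 4-hour shift is wrong
winter <- "2010-01-15 12:37:17"
wrong <- as.POSIXct(winter, tz = "GMT") + 4 * 3600
right <- as.POSIXct(winter, tz = "America/Montreal")
as.numeric(right) - as.numeric(wrong)  # 3600: one hour off
```

So the fixed offset is fine only if all of your data falls inside a single DST regime.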
The smart thing to do here is almost certainly to write readable, easy-to-maintain code, and throw more hardware at the problem if your code is too slow.
If you are desperate for a code speedup, then you could write a custom time-zone adjustment function. It isn't pretty, so if you have to convert between many time zones, you'll end up with spaghetti code. Here's my solution for the specific case of converting from GMT to Montreal time.
First, precompute a table of daylight saving time (DST) start and end dates. You'll need to extend it to before 2010 and after 2013 to cover your dataset. I found the dates here: http://www.timeanddate.com/worldclock/timezone.html?n=165
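library(fasttime)
# fastPOSIXct() reads each string as if it were GMT, so the times passed to
# to_montreal_tz() are Montreal wall-clock times labelled as GMT. The DST
# boundaries must therefore be local wall-clock times too: the switch happens at
# 02:00 local time (07:00 UTC in March, 06:00 UTC in November). The repeated
# 01:00-02:00 hour in November is treated as EDT. cbind() drops the POSIXct
# class, leaving a numeric matrix of seconds with columns start and end.
montreal_tz_data <- cbind(
  start = fastPOSIXct(
    c("2010-03-14 02:00:00", "2011-03-13 02:00:00", "2012-03-11 02:00:00", "2013-03-10 02:00:00")
  ),
  end = fastPOSIXct(
    c("2010-11-07 02:00:00", "2011-11-06 02:00:00", "2012-11-04 02:00:00", "2013-11-03 02:00:00")
  )
)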
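Rather than copying dates by hand, the table could be generated from the rule. This assumes Canada's rule since 2007 (DST from the second Sunday in March to the first Sunday in November, switching at 02:00 local time) holds for every year you need. nth_sunday() and make_dst_table() are hypothetical helpers. The boundaries are wall-clock times read as GMT, matching how fastPOSIXct() parses the data. Base as.POSIXct(tz = "GMT") is used so no extra package is needed.

```r
# Hypothetical helper: date of the n-th Sunday of `month` for each year in `year`
nth_sunday <- function(year, month, n) {
  first <- as.Date(sprintf("%d-%02d-01", year, month))
  # POSIXlt wday: 0 = Sunday, ..., 6 = Saturday
  days_to_sunday <- (7 - as.POSIXlt(first)$wday) %% 7
  first + days_to_sunday + 7 * (n - 1)
}

# Hypothetical helper: numeric matrix with one row per year (columns start/end)
# holding the 02:00 local switch times, read as if they were GMT
make_dst_table <- function(years) {
  start <- nth_sunday(years, 3, 2)  # second Sunday in March
  end <- nth_sunday(years, 11, 1)   # first Sunday in November
  cbind(
    start = as.numeric(as.POSIXct(paste(start, "02:00:00"), tz = "GMT")),
    end = as.numeric(as.POSIXct(paste(end, "02:00:00"), tz = "GMT"))
  )
}

montreal_tz_data <- make_dst_table(2010:2035)
```

For 2010 to 2013 this reproduces the dates from the table above. The thing to keep up to date then becomes the rule rather than the list of years.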
For speed, the function to change time zones treats the times as numbers.
to_montreal_tz <- function(x)
{
  x <- as.numeric(x)
  is_dst <- logical(length(x)) # initialise as FALSE
  # Loop over the DST periods, one row per year
  for(row in seq_len(nrow(montreal_tz_data)))
  {
    is_dst[x >= montreal_tz_data[row, "start"] & x < montreal_tz_data[row, "end"]] <- TRUE
  }
  # Hard-coded numbers are 4 hours (EDT) and 5 hours (EST) in seconds
  ans <- ifelse(is_dst, x + 14400, x + 18000)
  # Set tzone as well as the class; otherwise the result prints in the session's time zone
  structure(ans, class = c("POSIXct", "POSIXt"), tzone = "America/Montreal")
}
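An alternative sketch that drops the per-year loop. Because the DST periods are sorted and don't overlap, findInterval() over the interleaved start/end boundaries returns an odd index exactly for times inside a DST period. to_montreal_tz_fi() and the two-year table here are hypothetical. Base as.POSIXct(tz = "GMT") again stands in for fastPOSIXct(), and the result is checked against R's own time-zone-aware parse, including times a few hours after each switch.

```r
# Hypothetical loop-free variant of to_montreal_tz()
to_montreal_tz_fi <- function(x, dst_table) {
  x <- as.numeric(x)
  # Interleave boundaries as start1, end1, start2, end2, ... (must be increasing)
  breaks <- as.vector(t(dst_table))
  # findInterval() gives k with breaks[k] <= x < breaks[k + 1]; odd k means DST
  is_dst <- findInterval(x, breaks) %% 2 == 1
  # EDT is UTC-4 (14400 s), EST is UTC-5 (18000 s)
  ans <- x + ifelse(is_dst, 14400, 18000)
  structure(ans, class = c("POSIXct", "POSIXt"), tzone = "America/Montreal")
}

# Wall-clock DST boundaries (02:00 local), read as GMT like the data
dst_table <- cbind(
  start = as.numeric(as.POSIXct(c("2010-03-14 02:00:00", "2011-03-13 02:00:00"), tz = "GMT")),
  end = as.numeric(as.POSIXct(c("2010-11-07 02:00:00", "2011-11-06 02:00:00"), tz = "GMT"))
)

ch <- c("2010-01-15 12:37:17", "2010-03-14 05:00:00", "2010-03-15 12:37:17",
        "2010-11-07 03:00:00", "2010-12-24 23:59:59")
fast <- to_montreal_tz_fi(as.POSIXct(ch, tz = "GMT"), dst_table)
slow <- as.POSIXct(ch, tz = "America/Montreal")
stopifnot(all(as.numeric(fast) == as.numeric(slow)))
```

findInterval() does a binary search per element, so the cost grows with the log of the number of years rather than linearly as in the loop.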
Now, to compare times:
#A million dates
ch <- rep("2010-03-15 12:37:17.223", 1e6)
#The easy way (no conversion of time zones afterwards)
system.time(as.POSIXct(ch, tz="America/Montreal"))
# user system elapsed
# 28.96 0.05 29.00
#A slight performance gain by specifying the format
system.time(as.POSIXct(ch, format = "%Y-%m-%d %H:%M:%S", tz="America/Montreal"))
# user system elapsed
# 13.77 0.01 13.79
#Using the fast functions
library(fasttime)
system.time(to_montreal_tz(fastPOSIXct(ch)))
# user system elapsed
# 0.51 0.02 0.53
As with all optimisation tricks, you've either got a roughly 26-fold speedup over the format-specified version (yay!) or you've saved 13 seconds of processing time but added 3 days of code-maintenance time from an obscure bug when your DST table runs out in 2035 (boo!).
It's a daylight saving time issue: http://www.timeanddate.com/time/dst/2010a.html
In 2010, DST began on 14 March in Canada but not until 28 March in the UK, so on 15 March Montreal (EDT) was 4 hours behind GMT rather than the usual 5.
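A quick base-R illustration, assuming your system's tz database includes America/Montreal and Europe/London. It parses the same wall-clock noon in each zone and compares it with GMT, on either side of the UK change.

```r
wall <- c("2010-03-15 12:00:00", "2010-03-30 12:00:00")
gmt <- as.numeric(as.POSIXct(wall, tz = "GMT"))
montreal <- as.numeric(as.POSIXct(wall, tz = "America/Montreal"))
london <- as.numeric(as.POSIXct(wall, tz = "Europe/London"))

# Hours behind GMT (negative means ahead of GMT)
montreal_offset <- (montreal - gmt) / 3600  # 4 4: EDT on both dates
london_offset <- (london - gmt) / 3600      # 0 -1: GMT on 15 March, BST on 30 March
montreal_offset - london_offset             # 4 5: the Montreal-London gap
```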
You can use POSIXlt objects to modify time zones directly:
lt <- as.POSIXlt(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"))
attr(lt,"tzone") <- "America/Montreal"
as.POSIXct(lt)
[1] "2010-03-15 12:37:17 EDT"
Or you could use format to convert to a string and set the time zone in a call to as.POSIXct. You can therefore write a forceTZ function:
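forceTZ <- function(x, tz)
{
  # format() writes the wall-clock time of x in its own time zone (dropping
  # fractional seconds by default); as.POSIXct() re-reads that string in tz
  return(as.POSIXct(format(x), tz = tz))
}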
forceTZ(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"),"America/Montreal")
[1] "2010-03-15 12:37:17 EDT"
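forceTZ() loses the .223 from the question's timestamp because format() drops fractional seconds by default. One possible workaround is to format to whole seconds explicitly, re-read the string in the new zone, and add the fraction back. force_tz_keep_frac() is a hypothetical name, not an existing function.

```r
# Hypothetical variant of forceTZ() that keeps sub-second precision
force_tz_keep_frac <- function(x, tz) {
  secs <- as.numeric(x)
  # %S writes whole seconds only, so the fraction is restored afterwards
  whole <- as.POSIXct(format(x, "%Y-%m-%d %H:%M:%S"),
                      format = "%Y-%m-%d %H:%M:%S", tz = tz)
  whole + (secs - floor(secs))
}

x <- as.POSIXct("2010-03-15 12:37:17.223", tz = "GMT")
y <- force_tz_keep_frac(x, "America/Montreal")
format(y, "%Y-%m-%d %H:%M:%OS3 %Z")
```

When printing with %OS3, bear in mind that R truncates rather than rounds the digits, so a stored .223 can display as .222.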