I am new to R. I am an avid SAS programmer and am having a difficult time wrapping my head around R.
Within a data frame I have a date time column formatted as
Use the lubridate package. For example, if df is a data.frame with a column dt of type POSIXct, then you could:
library(lubridate)  # provides year(), month(), day(), ...
df$date <- as.Date(as.POSIXct(df$dt, tz="UTC"))
df$year <- year(df$dt)
df$month <- month(df$dt)
df$day <- day(df$dt)
# and so on...
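A self-contained sketch of the above, with made-up sample data (the "Australia/Sydney" zone is only an example). It also uses lubridate's as_date(), which, according to the lubridate documentation, takes the time zone from the object's own tzone attribute by default, so the date it returns matches the clock time printed for dt:

```r
library(lubridate)  # install.packages("lubridate") if needed

# Hypothetical sample data: one date-time recorded in Sydney
df <- data.frame(dt = as.POSIXct("2013-01-01 00:53:00", tz = "Australia/Sydney"))

# lubridate's accessors read the value in dt's own time zone
df$year  <- year(df$dt)   # 2013
df$month <- month(df$dt)  # 1
df$day   <- day(df$dt)    # 1
df$hour  <- hour(df$dt)   # 0

# as_date() uses the tzone attribute of dt, so this is the local calendar date
df$date <- as_date(df$dt) # "2013-01-01"
df
```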
If you can store your data in a data.table, then this is even easier:
df[, `:=`(date = as.Date(as.POSIXct(dt, tz="UTC")), year = year(dt), month = month(dt), day = day(dt))]
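Here is a runnable sketch of the data.table route, starting from an ordinary data.frame. The sample values are made up; setDT() converts the data.frame in place, and year(), month() and mday() here are data.table's own helpers (data.table has no day(), so mday() stands in for lubridate's day()):

```r
library(data.table)

# Hypothetical sample data.frame with a POSIXct column dt, stored in UTC
df <- data.frame(dt = as.POSIXct(c("2013-01-01 00:53:00", "2013-07-04 18:30:00"),
                                 tz = "UTC"))

setDT(df)  # convert the data.frame to a data.table by reference

# := adds all the new columns by reference in a single call
df[, `:=`(date  = as.Date(dt, tz = "UTC"),
          year  = year(dt),
          month = month(dt),
          day   = mday(dt))]
print(df)
```

Because := modifies df in place, there is no need to reassign the result.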
It is wise to always be careful with as.Date(as.POSIXct(...)):
E.g., for me in Australia:
df <- data.frame(dt=as.POSIXct("2013-01-01 00:53:00"))
df
# dt
#1 2013-01-01 00:53:00
as.Date(df$dt)
#[1] "2012-12-31"
You'll see that this is problematic, as the dates don't match. You'll hit problems whenever your POSIXct object is not in the UTC timezone, because as.Date defaults to tz="UTC" for this class. See here for more info: as.Date(as.POSIXct()) gives the wrong date?
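To reproduce the mismatch regardless of your own machine's time zone, you can give the POSIXct an explicit zone. This is a minimal sketch; "Australia/Sydney" is just an example zone, which is on daylight saving time (UTC+11) on that date:

```r
# One instant, stored with an explicit time zone (example zone)
x <- as.POSIXct("2013-01-01 00:53:00", tz = "Australia/Sydney")

# The same instant shown on the UTC clock
format(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")
#[1] "2012-12-31 13:53:00"

as.Date(x, tz = "UTC")               # date on the UTC calendar
#[1] "2012-12-31"
as.Date(x, tz = "Australia/Sydney")  # date on the local calendar
#[1] "2013-01-01"
```

The stored instant is the same in both cases; only the calendar used to read off the date differs.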
To be safe you probably need to match your timezones:
as.Date(df$dt,tz=Sys.timezone()) #assuming you've just created df in the same session:
#[1] "2013-01-01"
Or safer option #1:
df <- data.frame(dt=as.POSIXct("2013-01-01 00:53:00",tz="UTC"))
as.Date(df$dt)
#[1] "2013-01-01"
Or safer option #2:
as.Date(df$dt,tz=attr(df$dt,"tzone"))
#[1] "2013-01-01"
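Option #2 may fail if the object has no tzone attribute at all (Sys.time(), for example, typically returns one without it), because attr() then gives NULL. A small wrapper can fall back to the session time zone in that case. local_date() is a hypothetical name for this sketch, not a function from base R or any package:

```r
# Hypothetical helper: convert POSIXct to Date on the calendar of its own zone
local_date <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  tz <- attr(x, "tzone")
  # No tzone attribute: use "", which as.Date() treats as the session time zone
  if (is.null(tz)) tz <- ""
  as.Date(x, tz = tz[1])
}

local_date(as.POSIXct("2013-01-01 00:53:00", tz = "Australia/Sydney"))
#[1] "2013-01-01"
local_date(Sys.time())  # today's date in the session time zone
```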
Or, alternatively, use format to extract parts of the POSIXct object:
as.Date(format(df$dt,"%Y-%m-%d"))
#[1] "2013-01-01"
as.numeric(format(df$dt,"%Y"))
#[1] 2013
as.numeric(format(df$dt,"%m"))
#[1] 1
as.numeric(format(df$dt,"%d"))
#[1] 1
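format() on a POSIXct reads the value in the object's own tzone when no tz is given, which is why this approach sidesteps the UTC default. The separate calls can be bundled into one helper that returns all the parts at once; split_datetime() is a hypothetical name used only for this sketch, and the sample data are made up:

```r
# Hypothetical helper: split a POSIXct vector into date parts via format()
split_datetime <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  data.frame(
    date  = as.Date(format(x, "%Y-%m-%d")),
    year  = as.integer(format(x, "%Y")),
    month = as.integer(format(x, "%m")),
    day   = as.integer(format(x, "%d")),
    hour  = as.integer(format(x, "%H"))
  )
}

# Usage: append the parts as new columns of an existing data frame
df <- data.frame(dt = as.POSIXct(c("2013-01-01 00:53:00", "2013-06-15 23:10:00"),
                                 tz = "Australia/Sydney"))
df <- cbind(df, split_datetime(df$dt))
df
```

Using as.integer() rather than as.numeric() keeps the parts as whole numbers; either works for comparisons.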