I'm struggling with something very basic: sorting a data frame based on a time format (month-year, or “%B-%y” in this case). My goal is to calculate various monthly statis
Try using the "yearmon" class in zoo, as it sorts appropriately. Below we create the sample DF data frame and then add a YearMonth column of class "yearmon". Finally we perform our aggregation. The actual processing is just the last two lines (the rest only creates the sample data frame).
Lines <- "Instrument AccountValue monthYear ExitTime
JPM 6997 april-07 2007-04-10
JPM 7261 mei-07 2007-05-29
JPM 7545 juli-07 2007-07-18
JPM 7614 juli-07 2007-07-19
JPM 7897 augustus-07 2007-08-22
JPM 7423 november-07 2007-11-02
KFT 6992 mei-07 2007-05-14
KFT 6944 mei-07 2007-05-21
KFT 7069 juli-07 2007-07-09
KFT 6919 juli-07 2007-07-16"
library(zoo)
DF <- read.table(textConnection(Lines), header = TRUE)
DF$YearMonth <- as.yearmon(DF$ExitTime)
aggregate(AccountValue ~ YearMonth + Instrument, DF, sum)
This gives the following:
> aggregate(AccountValue ~ YearMonth + Instrument, DF, sum)
YearMonth Instrument AccountValue
1 Apr 2007 JPM 6997
2 May 2007 JPM 7261
3 Jul 2007 JPM 15159
4 Aug 2007 JPM 7897
5 Nov 2007 JPM 7423
6 May 2007 KFT 13936
7 Jul 2007 KFT 13988
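To see why the class matters, here is a minimal sketch comparing the two orderings. The labels and dates are made-up sample values (English month names are assumed so the result does not depend on the locale). A "yearmon" value is stored internally as year plus (month - 1)/12, so ordering it is a plain numeric comparison, whereas the text labels sort alphabetically.

```r
library(zoo)

# Hypothetical sample: English month labels paired with matching dates,
# already listed in chronological order
labels <- c("april-07", "may-07", "july-07", "august-07", "november-07")
dates  <- as.Date(c("2007-04-10", "2007-05-29", "2007-07-18",
                    "2007-08-22", "2007-11-02"))

# Ordering the text labels is alphabetical, so "august-07" jumps ahead of "july-07"
alpha_order <- order(labels)

# Ordering the yearmon values is chronological
ym <- as.yearmon(dates)
chron_order <- order(ym)

labels[alpha_order]   # alphabetical
labels[chron_order]   # chronological
```

Any function that sorts or groups on the YearMonth column, such as aggregate above, picks up the chronological order for the same reason.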
A slightly different approach, with different output, uses read.zoo directly. It produces one column per instrument and one row per year/month. We read in the columns, assigning them appropriate classes and using "NULL" for the monthYear column since we won't use that one. We also specify that the time index is the 3rd column of the remaining columns and that we want the input split into columns by the 1st column. FUN = as.yearmon indicates that we want the time index converted from "Date" class to "yearmon" class, and aggregate = sum sums all rows that fall in the same month.
z <- read.zoo(textConnection(Lines), header = TRUE, index = 3,
split = 1, colClasses = c("character", "numeric", "NULL", "Date"),
FUN = as.yearmon, aggregate = sum)
The resulting zoo object looks like this:
> z
JPM KFT
Apr 2007 6997 NA
May 2007 7261 13936
Jul 2007 15159 13988
Aug 2007 7897 NA
Nov 2007 7423 NA
We may prefer to keep it as a zoo object to take advantage of other functionality in zoo, or we can convert it to a data frame. data.frame(Time = time(z), coredata(z)) makes the time a separate column, while as.data.frame(z) uses row names for the time. fortify.zoo(z) also works.
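As a rough illustration of those three conversions, the sketch below builds a small two-row zoo object by hand (a stand-in for z, with values copied from the first two rows of the output above) and converts it each way. The names df1, df2 and df3 are only for this example.

```r
library(zoo)

# Hypothetical stand-in for z: two months, one column per instrument
z <- zoo(cbind(JPM = c(6997, 7261), KFT = c(NA, 13936)),
         as.yearmon(c("2007-04", "2007-05"), "%Y-%m"))

# Time as an ordinary column named Time, followed by the data columns
df1 <- data.frame(Time = time(z), coredata(z))

# Time kept only as row names; the columns are just JPM and KFT
df2 <- as.data.frame(z)

# fortify.zoo puts the time in a first column named Index
df3 <- fortify.zoo(z)

names(df1)
names(df2)
names(df3)
```

The first and third forms keep the "yearmon" class on the time column, so the data frame can still be sorted chronologically; with as.data.frame(z) the time survives only as character row names.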