I'm struggling with something very basic: sorting a data frame based on a time format (month-year, or “%B-%y” in this case). My goal is to calculate various monthly statistics
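To make the problem concrete, here is a minimal sketch that rebuilds the two relevant columns of the sample data used in the answers below. Storing them in a plain data frame named `tmp09` is an assumption about how the question's data is held. It shows why a naive `tapply()` on the `monthYear` strings gives the wrong order: the strings become factor levels sorted alphabetically, not chronologically.

```r
# The two relevant columns of the sample data (Dutch month names)
tmp09 <- data.frame(
  AccountValue = c(6997, 7261, 7545, 7614, 7897, 7423, 6992, 6944, 7069, 6919),
  monthYear = c("april-07", "mei-07", "juli-07", "juli-07", "augustus-07",
                "november-07", "mei-07", "mei-07", "juli-07", "juli-07"),
  stringsAsFactors = FALSE
)

# tapply() converts the character grouping vector to a factor whose levels
# are sorted alphabetically, so "augustus-07" lands before "juli-07"
naive <- tapply(tmp09$AccountValue, tmp09$monthYear, sum)
names(naive)
```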
It would be easier to have separate Month and Year factors, in the correct order, and use tapply on the combination of both variables, e.g.:
## The Month factor
tmp09 <- within(tmp09,
Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
levels = month.name)))
## for @Jura25's locale, we can't use the built-in English constant month.name;
## instead, we can use this solution, from ?month.name:
## format(ISOdate(2000, 1:12, 1), "%B")
tmp09 <- within(tmp09,
Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
levels = format(ISOdate(2000, 1:12, 1), "%B"))))
##
## And the Year factor
tmp09 <- within(tmp09, Year <- factor(strftime(ExitTime, format = "%Y")))
Which gives us (in my locale):
> head(tmp09)
Instrument AccountValue monthYear ExitTime Month Year
1 JPM 6997 april-07 2007-04-10 April 2007
2 JPM 7261 mei-07 2007-05-29 May 2007
3 JPM 7545 juli-07 2007-07-18 July 2007
4 JPM 7614 juli-07 2007-07-19 July 2007
5 JPM 7897 augustus-07 2007-08-22 August 2007
10 JPM 7423 november-07 2007-11-02 November 2007
Then use tapply with both factors:
> with(tmp09, tapply(AccountValue, list(Month, Year), sum))
2007
April 6997
May 21197
July 29147
August 7897
November 7423
or via aggregate:
> with(tmp09, aggregate(AccountValue, list(Month = Month, Year = Year), sum))
Month Year x
1 April 2007 6997
2 May 2007 21197
3 July 2007 29147
4 August 2007 7897
5 November 2007 7423
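The key step in the Month factor above is passing an explicit `levels` vector to `factor()`. A minimal base-R sketch with English month names (the `months` vector is hypothetical) shows the difference between the default alphabetical levels and `levels = month.name`:

```r
# Hypothetical month names, in the order they might appear in the data
months <- c("July", "April", "November", "May", "July", "August")

# Default levels: alphabetical
f_default <- factor(months)
levels(f_default)
# "April" "August" "July" "May" "November"

# Explicit levels from the built-in month.name constant; droplevels()
# then removes the months that never occur, keeping the calendar order
f_ordered <- droplevels(factor(months, levels = month.name))
levels(f_ordered)
# "April" "May" "July" "August" "November"
```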
Edit: I misunderstood the question at first. Copy the data given in the question to the clipboard first, then run:
> tmp09 <- read.table(file="clipboard", header=TRUE)
> Sys.setlocale(category="LC_TIME", locale="Dutch_Belgium.1252")
[1] "Dutch_Belgium.1252"
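# create POSIXlt variable from monthYear: the year is hard-coded to 2007,
# and the "-07" suffix of each monthYear value is parsed as the day (%d)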
> tmp09$d <- strptime(paste("2007", tmp09$monthYear, sep="-"), "%Y-%B-%d")
# create ordered factor with one level per month (empty months dropped)
> tmp09$dFac <- droplevels(cut(tmp09$d, breaks="month", ordered_result=TRUE))
> tmp09[order(tmp09$d), ]
Instrument AccountValue monthYear ExitTime d dFac
1 JPM 6997 april-07 2007-04-10 2007-04-07 2007-04-01
2 JPM 7261 mei-07 2007-05-29 2007-05-07 2007-05-01
11 KFT 6992 mei-07 2007-05-14 2007-05-07 2007-05-01
12 KFT 6944 mei-07 2007-05-21 2007-05-07 2007-05-01
3 JPM 7545 juli-07 2007-07-18 2007-07-07 2007-07-01
4 JPM 7614 juli-07 2007-07-19 2007-07-07 2007-07-01
13 KFT 7069 juli-07 2007-07-09 2007-07-07 2007-07-01
14 KFT 6919 juli-07 2007-07-16 2007-07-07 2007-07-01
5 JPM 7897 augustus-07 2007-08-22 2007-08-07 2007-08-01
10 JPM 7423 november-07 2007-11-02 2007-11-07 2007-11-01
> Tmp09Totals <- tapply(tmp09$AccountValue, tmp09$dFac, sum)
> Tmp09Totals
2007-04-01 2007-05-01 2007-07-01 2007-08-01 2007-11-01
6997 21197 29147 7897 7423
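A variation on the same idea, sketched here under the assumption that you cut the `ExitTime` column instead of parsing `monthYear`, so no locale setting is needed: `cut()` on a `Date` vector with `breaks = "month"` labels each date with the first day of its month, and those ISO `YYYY-MM-DD` labels come out in chronological order.

```r
exit <- as.Date(c("2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19",
                  "2007-08-22", "2007-11-02", "2007-05-14", "2007-05-21",
                  "2007-07-09", "2007-07-16"))
value <- c(6997, 7261, 7545, 7614, 7897, 7423, 6992, 6944, 7069, 6919)

# One level per calendar month between the first and last date;
# droplevels() removes months without any rows (here June, September, October)
monthFac <- droplevels(cut(exit, breaks = "month"))

# tapply() follows the factor's level order, i.e. chronological
totals <- tapply(value, monthFac, sum)
totals
```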
Try using the "yearmon" class in zoo, as it sorts chronologically. Below we create the sample data frame DF and then add a YearMonth column of class "yearmon". Finally, we perform our aggregation. The actual processing is just the last two lines (the rest only creates the sample data frame).
Lines <- "Instrument AccountValue monthYear ExitTime
JPM 6997 april-07 2007-04-10
JPM 7261 mei-07 2007-05-29
JPM 7545 juli-07 2007-07-18
JPM 7614 juli-07 2007-07-19
JPM 7897 augustus-07 2007-08-22
JPM 7423 november-07 2007-11-02
KFT 6992 mei-07 2007-05-14
KFT 6944 mei-07 2007-05-21
KFT 7069 juli-07 2007-07-09
KFT 6919 juli-07 2007-07-16"
library(zoo)
DF <- read.table(textConnection(Lines), header = TRUE)
DF$YearMonth <- as.yearmon(DF$ExitTime)
aggregate(AccountValue ~ YearMonth + Instrument, DF, sum)
This gives the following:
> aggregate(AccountValue ~ YearMonth + Instrument, DF, sum)
YearMonth Instrument AccountValue
1 Apr 2007 JPM 6997
2 May 2007 JPM 7261
3 Jul 2007 JPM 15159
4 Aug 2007 JPM 7897
5 Nov 2007 JPM 7423
6 May 2007 KFT 13936
7 Jul 2007 KFT 13988
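Why does "yearmon" sort chronologically? The zoo documentation describes a yearmon value as stored internally as the year plus 0 for January, 1/12 for February, and so on, so ordinary numeric sorting is chronological. The sketch below mimics that encoding in base R; it does not use zoo itself, and the 2008 date is hypothetical, added only to show that the key also works across a year boundary.

```r
# Dates from the question's ExitTime column, plus one hypothetical 2008 date
exit <- as.Date(c("2007-11-02", "2008-01-15", "2007-04-10", "2007-05-29"))
lt <- as.POSIXlt(exit)

# Assumed to mirror zoo's "yearmon" encoding: year + (month - 1) / 12
# (POSIXlt stores years since 1900 and months as 0-11)
ym <- (lt$year + 1900) + lt$mon / 12

# Sorting on the numeric key sorts the months chronologically
format(exit[order(ym)], "%Y-%m")
```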
A slightly different approach and output uses read.zoo directly. It produces one column per instrument and one row per year/month. We read in the columns, assigning them appropriate classes and using "NULL" for the monthYear column since we won't use that one. We also specify that the time index is the 3rd of the remaining columns and that we want the input split into columns by the 1st column. FUN = as.yearmon indicates that we want the time index to be converted from "Date" class to "yearmon" class, and we aggregate everything using sum.
z <- read.zoo(textConnection(Lines), header = TRUE, index = 3,
split = 1, colClasses = c("character", "numeric", "NULL", "Date"),
FUN = as.yearmon, aggregate = sum)
The resulting zoo object looks like this:
> z
JPM KFT
Apr 2007 6997 NA
May 2007 7261 13936
Jul 2007 15159 13988
Aug 2007 7897 NA
Nov 2007 7423 NA
We may prefer to keep it as a zoo object to take advantage of other functionality in zoo, or we can convert it to a data frame like this: data.frame(Time = time(z), coredata(z)), which makes the time a separate column, or as.data.frame(z), which uses row names for the time. fortify.zoo(z) also works.
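For readers without zoo, a rough base-R analogue of the `read.zoo(..., split = 1, aggregate = sum)` result is a two-way `tapply()`. This sketch assumes each month is keyed by its first day as a `Date` (the `Month` column and the `dat` name are introduced here), so the rows come out chronologically; instrument/month combinations with no rows become `NA`, as in the zoo output.

```r
dat <- data.frame(
  Instrument = c(rep("JPM", 6), rep("KFT", 4)),
  AccountValue = c(6997, 7261, 7545, 7614, 7897, 7423, 6992, 6944, 7069, 6919),
  ExitTime = as.Date(c("2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19",
                       "2007-08-22", "2007-11-02", "2007-05-14", "2007-05-21",
                       "2007-07-09", "2007-07-16")),
  stringsAsFactors = FALSE
)

# First day of each month as a Date, so the grouping levels sort chronologically
dat$Month <- as.Date(format(dat$ExitTime, "%Y-%m-01"))

# One row per month, one column per instrument; empty cells are NA
wide <- with(dat, tapply(AccountValue, list(Month, Instrument), sum))
wide
```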
It looks like the main problem is how to sort a sequence of Month-Year strings chronologically. The easiest way is to prepend "01-" to each Month-Year string and sort them as regular dates. So take your final data frame Tmp09Totals and do this:
monYear <- rownames(Tmp09Totals)
sortedMonYear <- format(sort( as.Date( paste('01-', monYear, sep = ''),
'%d-%B-%y')),
'%B-%y')
Tmp09Totals[ sortedMonYear, , drop = FALSE]
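A self-contained sketch of the prepend-a-day trick, assuming English month names parsed in the "C" time locale (for the Dutch strings in the question you would need a Dutch `LC_TIME` locale instead, as in the other answers). The `monYear` vector here is hypothetical:

```r
# Assumption: English month names, so switch LC_TIME to "C" and restore it later
oldLocale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

monYear <- c("July-07", "April-07", "November-07", "May-07", "August-07")

# Prepending a day turns each "Month-YY" string into a parseable full date
asDate <- as.Date(paste0("01-", monYear), format = "%d-%B-%y")

# order() returns the original strings, untouched, in chronological order
sortedMonYear <- monYear[order(asDate)]

invisible(Sys.setlocale("LC_TIME", oldLocale))
sortedMonYear
```

Using `order()` returns the original strings rather than re-formatting the parsed dates with `format()`, so the result matches the input spelling exactly, which keeps it usable as row names for indexing.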
You could reorder the factor levels with the reorder function:
tmp09$monthYear <- reorder(tmp09$monthYear, as.numeric(as.Date(tmp09$ExitTime)))
The trick is to use the numeric representation of the date, i.e. the number of days since 1970-01-01 (see ?Date), and to use its mean within each level as the sorting score.
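A sketch of the reorder idea on the sample data, assuming `ExitTime` can be converted with `as.Date()`. `reorder()` computes the mean of the day numbers within each `monthYear` level and orders the levels by that score:

```r
monthYear <- c("april-07", "mei-07", "juli-07", "juli-07", "augustus-07",
               "november-07", "mei-07", "mei-07", "juli-07", "juli-07")
exitTime <- as.Date(c("2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19",
                      "2007-08-22", "2007-11-02", "2007-05-14", "2007-05-21",
                      "2007-07-09", "2007-07-16"))
value <- c(6997, 7261, 7545, 7614, 7897, 7423, 6992, 6944, 7069, 6919)

# reorder() scores each level by the mean of the second argument (FUN = mean
# by default); as.numeric() on a Date gives days since 1970-01-01
f <- reorder(factor(monthYear), as.numeric(exitTime))
levels(f)

# tapply() follows the factor's level order, so the totals are chronological
tapply(value, f, sum)
```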
An old post, but worthy of a data.table approach:
Read in the data and set the locale as described by @caracal:
> Sys.setlocale(category="LC_TIME", locale="Dutch_Belgium.1252")
[1] "Dutch_Belgium.1252"
> tmp09 <- read.table(file="clipboard", header=TRUE)
> tmp09$ExitTime <- as.Date(tmp09$ExitTime)
Summarise the data as requested:
> library(data.table)
> data.table(tmp09)[,
+ .(Tmp09Total = sum(AccountValue)),
+ by = .(Date = format(ExitTime, "%B-%y"))]
Date Tmp09Total
1: april-07 6997
2: mei-07 21197
3: juli-07 29147
4: augustus-07 7897
5: november-07 7423
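Note that data.table's `by` returns groups in order of first appearance, so the chronological order above follows from the row order of the input; `keyby` would sort by the formatted string, i.e. alphabetically. A base-R sketch that sorts explicitly on a Date key (the `MonthStart` column and the `dat` name are introduced here) and formats the label only at the end:

```r
dat <- data.frame(
  AccountValue = c(6997, 7261, 7545, 7614, 7897, 7423, 6992, 6944, 7069, 6919),
  ExitTime = as.Date(c("2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19",
                       "2007-08-22", "2007-11-02", "2007-05-14", "2007-05-21",
                       "2007-07-09", "2007-07-16"))
)

# A sortable key: the first day of each ExitTime's month, as a Date
dat$MonthStart <- as.Date(format(dat$ExitTime, "%Y-%m-01"))

# Monthly totals, then an explicit chronological sort on the Date key
totals <- aggregate(AccountValue ~ MonthStart, data = dat, FUN = sum)
totals <- totals[order(totals$MonthStart), ]

# Format the label last; month names follow the current LC_TIME locale
totals$Date <- format(totals$MonthStart, "%B-%y")
totals
```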