I'm trying to use R to find the max value of each day for 1 to n days. My issue is there are multiple values in each day. Here's my code. After I run it incorrect number of
In R, vectorized functions and the apply family are usually preferred over explicit for loops. Instead, try something like:
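index <- sapply(unique(theData$Day), function(d) {
  rows <- which(theData$Day == d)        # rows belonging to this day
  rows[which.max(theData$Value[rows])]   # row holding that day's maximum
})
theData[index, c("Day", "Time", "Value")]
This means: for each distinct value of Day, find the row with the maximum Value and return its index. Then you can select the rows and columns of interest.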
I recommend reading the help documentation for apply(), lapply(), sapply(), tapply(), and mapply() (I'm probably forgetting one of them…), and for the plyr package.
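As a small illustration of one of those functions, here is a minimal sketch using tapply() to get the maximum per day. The data frame name dfrm and its contents are assumptions, made up to resemble the two days discussed in this thread.

```r
# Assumed sample data: two days, four readings each
dfrm <- data.frame(
  Day   = rep(c(20130310L, 20130311L), each = 4),
  Time  = rep(c("09:30:00", "09:31:00", "09:32:00", "09:33:00"), times = 2),
  Value = c(5L, 1L, 2L, 3L, 12L, 0L, 1L, 5L)
)

# Apply max() to Value within each Day; the result has one entry per day,
# labelled with the day it belongs to
day_max <- tapply(dfrm$Value, dfrm$Day, max)
print(day_max)
```

This only returns the maxima themselves, not the rows they came from, so it fits when the Time column is not needed.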
This is a base function approach:
> do.call( rbind, lapply(split(dfrm, dfrm$Day),
function (df) df[ which.max(df$Value), ] ) )
              Day     Time Value
20130310 20130310 09:30:00     5
20130311 20130311 09:30:00    12
To explain what's happening: it's good to learn to read R functions from the inside out (since they are often built around each other). You wanted lines from a dataframe, so you would either need to build a numeric or logical vector that spanned the number of rows, or you can take the route I did and break the problem up by Day. That's what split does with dataframes. Then within each dataframe I applied a function, which.max, to just a single day's subset of the data. Since I only got the results back from lapply as a list of dataframes, I needed to squash them back together, and the typical method for doing so is do.call(rbind, ...).
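To make the inside-out reading concrete, the same pipeline can be written one step at a time. This is only a sketch: the contents of dfrm are assumed sample data, and the intermediate names by_day, best and result are made up for illustration.

```r
# Assumed sample data: two days, four readings each
dfrm <- data.frame(
  Day   = rep(c(20130310L, 20130311L), each = 4),
  Time  = rep(c("09:30:00", "09:31:00", "09:32:00", "09:33:00"), times = 2),
  Value = c(5L, 1L, 2L, 3L, 12L, 0L, 1L, 5L)
)

# Step 1: split() breaks the data frame into a list of data frames, one per Day
by_day <- split(dfrm, dfrm$Day)

# Step 2: within each day's data frame, keep the row where Value is largest
best <- lapply(by_day, function(df) df[which.max(df$Value), ])

# Step 3: stack the one-row data frames back into a single data frame
result <- do.call(rbind, best)
print(result)
```

Inspecting by_day and best at the console is a quick way to see what each layer contributes.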
If I took the other route of making a vector for selection that applied to the whole dataframe, I would use ave:
> dfrm[ with(dfrm, ave(Value, Day, FUN=function(v) v==max(v) ) ) , ]
         Day     Time Value
1   20130310 09:30:00     5
1.1 20130310 09:30:00     5
Huh? That's not right... What's the problem?
with(dfrm, ave(Value, Day, FUN=function(v) v==max(v) ) )
[1] 1 0 0 0 1 0 0 0
So despite asking for a logical vector with the "==" function, I got conversion to a numeric vector, something I still don't understand. But converting to logical outside that result I succeed again:
> dfrm[ as.logical( with(dfrm, ave(Value, Day,
FUN=function(v) v==max(v) ) ) ), ]
       Day     Time Value
1 20130310 09:30:00     5
5 20130311 09:30:00    12
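The numeric result has a likely explanation in how ave works: it writes each group's result back into a copy of its first argument, so the logical values are coerced to the type of Value. The sketch below shows that, plus an alternative that does the comparison outside ave. The vectors and the helper name is_max are assumptions made up for illustration.

```r
# Assumed sample vectors: two days, four readings each
Value <- c(5L, 1L, 2L, 3L, 12L, 0L, 1L, 5L)
Day   <- rep(c(20130310L, 20130311L), each = 4)

# Hypothetical helper: TRUE where a value equals its group's maximum
is_max <- function(v) v == max(v)

# ave() stores the group results in a copy of Value, so the logical
# results take on the type of Value instead of staying logical
flag <- ave(Value, Day, FUN = is_max)
print(class(flag))

# Alternative: let ave() return each group's maximum, then compare outside it
keep <- Value == ave(Value, Day, FUN = max)
print(class(keep))
print(which(keep))
```

With the comparison moved outside, keep is already logical and can be used directly as a row selector without as.logical().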
Also note that the ave function (unlike tapply or aggregate) requires that you offer the function as a named argument with FUN=function(.). That is a common error I make. If you see the error message "unique() applies only to vectors", it seems out of the blue, but it means that ave tried to use an argument as a grouping variable, which it expected to be discrete, and you gave it a function.
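The mistake is easy to reproduce. In this sketch (sample vectors assumed), the function is passed without FUN=, so ave treats it as one more grouping variable; tryCatch is used only so the script keeps running and the message can be inspected.

```r
# Assumed sample vectors: two days, four readings each
Value <- c(5L, 1L, 2L, 3L, 12L, 0L, 1L, 5L)
Day   <- rep(c(20130310L, 20130311L), each = 4)

# FUN= is omitted on purpose: the function lands in "..." and is
# treated as a grouping variable, which fails
res <- tryCatch(
  ave(Value, Day, function(v) v == max(v)),
  error = function(e) e
)

# Show the error message if the call failed
if (inherits(res, "error")) print(conditionMessage(res))
```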
Here is a solution using the plyr package:
mydata<-structure(list(Day = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L), .Label = c("", "x", "y"), class = "factor"), Value = c(0L,
1L, 2L, 3L, 12L, 0L, 1L, 5L), Time = c(5L, 6L, 7L, 8L, 1L, 2L,
3L, 4L)), .Names = c("Day", "Value", "Time"), row.names = c(NA,
8L), class = "data.frame")
library(plyr)
ddply(mydata,.(Day),summarize,max.value=max(Value))
  Day max.value
1   x         3
2   y        12
Updated1: If your day is say 10/02/2012 12:00:00 AM, then you need to use:
mydata$Day<-with(mydata,as.Date(Day, format = "%m/%d/%Y"))
ddply(mydata,.(Day),summarize,max.value=max(Value))
Please see here for the example.
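To see what that conversion does on a single value, here is a minimal sketch. The variable names stamp and day are made up; the timestamp is the one mentioned above.

```r
# One timestamp in the month/day/year format described above
stamp <- "10/02/2012 12:00:00 AM"

# Only the date part is matched by the format; the trailing time is ignored
day <- as.Date(stamp, format = "%m/%d/%Y")
print(day)
```

Once Day is a Date, every row from the same calendar day shares one value, which is what the grouping relies on.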
Updated2: as per the new data, if your day is like the one in your update, you don't need to convert anything. You can just use the following code:
mydata1<-structure(list(Day = c(20130310L, 20130310L, 20130310L, 20130310L,
20130311L, 20130311L, 20130311L, 20130311L), Time = structure(c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("9:30:00", "9:31:00",
"9:32:00", "9:33:00"), class = "factor"), Value = c(5L, 1L, 2L,
3L, 12L, 0L, 1L, 5L)), .Names = c("Day", "Time", "Value"), class = "data.frame", row.names = c(NA,
-8L))
ddply(mydata1,.(Day),summarize,Time=Time[which.max(Value)],max.value=max(Value))
       Day    Time max.value
1 20130310 9:30:00         5
2 20130311 9:30:00        12
If you want the time to appear in the output, then just use Time=Time[which.max(Value)], which gives the time at the maximum value.
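If plyr is not available, roughly the same rows can be picked out in base R. This is a sketch of a different technique (sort, then keep the first row per day), not the ddply approach itself; the data frame is rebuilt here with plain character times, and the names sorted and top are made up.

```r
# Assumed sample data matching the updated question
mydata1 <- data.frame(
  Day   = rep(c(20130310L, 20130311L), each = 4),
  Time  = rep(c("9:30:00", "9:31:00", "9:32:00", "9:33:00"), times = 2),
  Value = c(5L, 1L, 2L, 3L, 12L, 0L, 1L, 5L)
)

# Sort by Day, largest Value first within each day
sorted <- mydata1[order(mydata1$Day, -mydata1$Value), ]

# Keep only the first row of each day, i.e. the row with that day's maximum
top <- sorted[!duplicated(sorted$Day), ]
print(top)
```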