R for loop not working

予麋鹿 2020-12-22 08:41

I'm trying to use R to find the max value of each day for 1 to n days. My issue is that there are multiple values in each day. Here's my code. After I run it incorrect number of

3 answers
  • 2020-12-22 08:54

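    Unlike in many other programming languages, in R it is generally considered good practice to avoid explicit for loops where an apply-style function will do. Instead try something like:

    index <- sapply(unique(theData$Day), function(d) {
        rows <- which(theData$Day == d)       # row numbers belonging to day d
        rows[which.max(theData$Value[rows])]  # row holding that day's maximum
    })
    theData[index, c("Day", "Time", "Value")]
    

    This means: for each distinct day, find the row holding the maximum Value and return its index. Then you can select the rows and columns of interest.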

    I recommend reading the help documentation for apply(), lapply(), sapply(), tapply(), mapply() (I'm probably forgetting one of them…) and for the plyr package.
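
    As a rough sketch of how a few of those functions look on this problem, assuming a data frame called theData with Day, Time and Value columns (the toy values below are made up for illustration):

    ```r
    # Hypothetical toy data; the name theData and all values are assumptions.
    theData <- data.frame(
      Day   = c(1, 1, 1, 2, 2, 2),
      Time  = c("09:30", "09:31", "09:32", "09:30", "09:31", "09:32"),
      Value = c(5, 1, 3, 0, 12, 7)
    )

    # tapply() applies max() to Value within each Day; the result is named by Day.
    daily_max <- tapply(theData$Value, theData$Day, max)

    # sapply() over the per-day pieces of Value gives the same numbers.
    daily_max2 <- sapply(split(theData$Value, theData$Day), max)

    # aggregate() returns the per-day maximum as a data frame instead.
    daily_max_df <- aggregate(Value ~ Day, data = theData, FUN = max)
    ```

    These only give the maximum itself; to pull back the whole row (including Time) you still need an index, as in the code above.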

  • 2020-12-22 08:58

    This is a base function approach:

    > do.call( rbind, lapply(split(dfrm, dfrm$Day), 
                             function (df) df[ which.max(df$Value), ] ) )
                  Day     Time Value
    20130310 20130310 09:30:00     5
    20130311 20130311 09:30:00    12
    

    To explain what's happening, it helps to learn to read R functions from the inside out (since they are often nested inside each other). You wanted rows from a data frame, so you would either need to build a numeric or logical vector spanning the number of rows, or you can take the route I did and break the problem up by Day. That's what split does with data frames. Then within each data frame I applied a function that uses which.max on just a single day's subset of the data. Since lapply returns the results as a list of data frames, I needed to squash them back together, and the typical method for doing so is do.call(rbind, ...).
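
    The same pipeline can be unrolled one step at a time, which may make the inside-out reading easier to follow. This is only a sketch: the data frame below is toy data modelled on the printed output, and the intermediate names by_day, best_rows and result are mine.

    ```r
    # Toy data modelled on the output above; the exact values are assumptions.
    dfrm <- data.frame(
      Day   = rep(c(20130310L, 20130311L), each = 4),
      Time  = rep(c("09:30:00", "09:31:00", "09:32:00", "09:33:00"), times = 2),
      Value = c(5L, 1L, 2L, 3L, 12L, 0L, 1L, 5L)
    )

    # Step 1: split() returns a named list with one data frame per Day.
    by_day <- split(dfrm, dfrm$Day)

    # Step 2: within each piece, keep the row holding the (first) maximum Value.
    best_rows <- lapply(by_day, function(df) df[which.max(df$Value), ])

    # Step 3: stack the one-row data frames back into a single data frame.
    result <- do.call(rbind, best_rows)
    ```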

    If I took the other route of making a vector for selection that applied to the whole dataframe I would use ave:

    > dfrm[ with(dfrm, ave(Value, Day, FUN=function(v) v==max(v) ) ) , ]
             Day     Time Value
    1   20130310 09:30:00     5
    1.1 20130310 09:30:00     5
    

    Huh? That's not right... What's the problem?

    with(dfrm, ave(Value, Day, FUN=function(v) v==max(v) ) )
    [1] 1 0 0 0 1 0 0 0
    

    So despite asking for a logical vector with the "==" function, I got conversion to a numeric vector, something I still don't understand. But converting to logical outside that result I succeed again:

    > dfrm[ as.logical( with(dfrm, ave(Value, Day, 
                                       FUN=function(v) v==max(v) ) ) ), ]
           Day     Time Value
    1 20130310 09:30:00     5
    5 20130311 09:30:00    12
    

    Also note that the ave function (unlike tapply or aggregate) requires that you supply the function as a named argument, FUN=function(.). Forgetting that is a common error of mine. If you see the error message "unique() applies only to vectors", it seems to come out of the blue, but it means that ave tried to use as a grouping variable an argument it expected to be discrete, and you gave it a function.
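
    On the numeric result from ave above: as far as I can tell, ave writes each group's result back into a copy of its first argument, so the output keeps the type of Value and the TRUE/FALSE values end up stored as 1/0. A sketch that sidesteps the conversion by doing the comparison outside ave, again on toy data modelled on the output above (the values and the names flag, is_max and top are assumptions):

    ```r
    # Toy data modelled on the output above; the exact values are assumptions.
    dfrm <- data.frame(
      Day   = rep(c(20130310L, 20130311L), each = 4),
      Time  = rep(c("09:30:00", "09:31:00", "09:32:00", "09:33:00"), times = 2),
      Value = c(5, 1, 2, 3, 12, 0, 1, 5)
    )

    # Comparing inside ave(): the logical result is stored back into a numeric vector.
    flag <- with(dfrm, ave(Value, Day, FUN = function(v) v == max(v)))
    is.logical(flag)  # FALSE

    # Let ave() return each row's per-day maximum, then compare outside it.
    is_max <- dfrm$Value == ave(dfrm$Value, dfrm$Day, FUN = max)
    top <- dfrm[is_max, ]
    ```

    Note that this selects every row tied for the daily maximum, whereas which.max keeps only the first one.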

  • 2020-12-22 09:04

    Here is a solution using the plyr package:

    mydata<-structure(list(Day = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 
    3L), .Label = c("", "x", "y"), class = "factor"), Value = c(0L, 
    1L, 2L, 3L, 12L, 0L, 1L, 5L), Time = c(5L, 6L, 7L, 8L, 1L, 2L, 
    3L, 4L)), .Names = c("Day", "Value", "Time"), row.names = c(NA, 
    8L), class = "data.frame")
    library(plyr)
    ddply(mydata,.(Day),summarize,max.value=max(Value))
    
      Day max.value
    1   x         3
    2   y        12
    

    Update 1: If your Day values look like 10/02/2012 12:00:00 AM, then you need to convert them to dates first:

    mydata$Day<-with(mydata,as.Date(Day, format = "%m/%d/%Y"))
    ddply(mydata,.(Day),summarize,max.value=max(Value))
    

    Please see here for the example.
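
    A small self-contained illustration of that conversion, in case it helps. The data frame mydata2 and its values are hypothetical, and aggregate() is used here as a base R stand-in for the ddply() call so the sketch runs without plyr:

    ```r
    # Hypothetical data with date-time strings in the format described above.
    mydata2 <- data.frame(
      Day   = c("10/02/2012 12:00:00 AM", "10/02/2012 12:00:00 AM",
                "10/03/2012 12:00:00 AM", "10/03/2012 12:00:00 AM"),
      Value = c(3, 7, 12, 5)
    )

    # as.Date() parses the leading month/day/year; trailing characters are ignored.
    mydata2$Day <- as.Date(mydata2$Day, format = "%m/%d/%Y")

    # Base R stand-in for ddply(): maximum Value per Day.
    daily_max <- aggregate(Value ~ Day, data = mydata2, FUN = max)
    ```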

    Update 2, as per the new data: If your Day column looks like the one in your updated question, you don't need to convert anything. You can just use the following code:

    mydata1<-structure(list(Day = c(20130310L, 20130310L, 20130310L, 20130310L, 
    20130311L, 20130311L, 20130311L, 20130311L), Time = structure(c(1L, 
    2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("9:30:00", "9:31:00", 
    "9:32:00", "9:33:00"), class = "factor"), Value = c(5L, 1L, 2L, 
    3L, 12L, 0L, 1L, 5L)), .Names = c("Day", "Time", "Value"), class = "data.frame", row.names = c(NA, 
    -8L))

    ddply(mydata1,.(Day),summarize,Time=Time[which.max(Value)],max.value=max(Value))
           Day    Time max.value
    1 20130310 9:30:00         5
    2 20130311 9:30:00        12
    

    If you want the time to appear in the output, then just use Time=Time[which.max(Value)], which gives the time at which the maximum value occurs.
