Using ifelse with transform in ddply

既然无缘 2021-01-19 04:15

I am trying to use ddply with transform to populate a new variable (summary_Date) in a dataframe with variables ID and

2 Answers
  • 2021-01-19 04:55
    # convert to data.table (round_date() comes from lubridate)
    library(data.table)
    library(lubridate)
    test.dt <- data.table(test.df)
    
    # count the rows in each month-year group
    test.dt[, idlen := length(ID), by = list(month(Date), year(Date))]
    
    # calculate the summary date; ifelse() drops the Date class and returns numbers
    test.dt[, summary_Date := ifelse(idlen < 5, as.Date(round_date(Date, "month")), as.Date(Date))]
    
    # convert the numeric result back to Date
    test.dt[, summary_Date := as.Date(summary_Date, origin = "1970-01-01")]
    

    Results:

     > test.dt
        ID                Date         Val idlen summary_Date
     1:  1 1962-03-01 12:00:00  0.42646422     3   1962-03-01
     2:  1 1962-03-14 12:00:00 -0.29507148     3   1962-03-01
     3:  1 1962-03-27 12:00:00  0.89512566     3   1962-04-01   <~~~~~
     4:  1 1962-04-10 12:00:00  0.87813349     2   1962-04-01
     5:  1 1962-04-24 12:00:00  0.82158108     2   1962-05-01
     6:  1 1962-05-08 12:00:00  0.68864025     1   1962-05-01
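
    The final as.Date(..., origin = "1970-01-01") step is there because ifelse() does not keep the Date class. A minimal base R sketch of that behaviour follows; the two dates and the first_of_month helper are made up for illustration, and it uses the first of the month rather than round_date():

    ```r
    # Illustrative dates only, not the asker's data
    d <- as.Date(c("1962-03-14", "1962-04-10"))
    first_of_month <- as.Date(format(d, "%Y-%m-01"))

    # ifelse() fills a copy of the logical test, so the Date class is lost
    raw <- ifelse(c(TRUE, FALSE), first_of_month, d)
    class(raw)   # "numeric": days since 1970-01-01

    # Re-attach the class, as the last data.table step does
    fixed <- as.Date(raw, origin = "1970-01-01")
    fixed
    ```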
    


    UPDATE:

    Explanation of why two steps are needed

    It cannot be done in one step because you get only a single value per group. When you assign that value to the members of the group, you are assigning 1 element to many. R knows how to handle such situations very well: it recycles the single element.

    However, in this specific case you do not want recycling: you do not want to apply the one element to many rows. Therefore, you need unique groups, which is what we do in the second step. Each element (row) of the group then gets assigned its own, specific value.
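
    One way to see the recycling problem is to compare the length of what ifelse() returns for a single group-level test with a test that has one entry per row. This is only a base R sketch: the three dates mirror the March rows in the results above, and the rounded vector is typed in by hand as a stand-in for round_date(Date, "month").

    ```r
    # Hypothetical group of three rows (values taken from the results table)
    dates   <- as.Date(c("1962-03-01", "1962-03-14", "1962-03-27"))
    rounded <- as.Date(c("1962-03-01", "1962-03-01", "1962-04-01"))

    n <- length(dates)   # one value for the whole group

    # Scalar test: ifelse() returns a single element, which assignment
    # then recycles to every row of the group
    scalar_result <- ifelse(n < 5, rounded, dates)
    length(scalar_result)   # 1

    # One test value per row: each row keeps its own result
    n_per_row  <- rep(n, length(dates))
    row_result <- ifelse(n_per_row < 5, rounded, dates)
    length(row_result)      # 3
    ```

    With the scalar test, the third row would receive the first row's value instead of its own 1962-04-01.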

    UPDATE 2:

    @Ramnath gave a great suggestion of using mutate. Taking a look at ?mutate, it gives:

    This function is very similar to transform but it executes the transformations iteratively ... later transformations can use the columns created by earlier transformations

    Which is exactly what you want to do!
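
    A small sketch of that difference, assuming plyr is installed; the data frame df and the columns y and z are invented for illustration:

    ```r
    library(plyr)

    df <- data.frame(x = 1:3)

    # mutate() evaluates its arguments in order, so z can use the new column y
    out <- mutate(df, y = x * 2, z = y + 1)
    out

    # transform() evaluates every argument against the original columns, so the
    # same call fails here (assuming no variable named y in the calling environment)
    res <- try(transform(df, y = x * 2, z = y + 1), silent = TRUE)
    inherits(res, "try-error")
    ```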

  • 2021-01-19 04:58

    One Step ddply solution (also posted as comment)

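    library(plyr)       # ddply, mutate
    library(lubridate)  # floor_date, round_date

    ddply(test.df, .(ID, floor_date(Date, "month")), mutate,
      length_x = length(ID),
      # ifelse() returns plain numbers, so convert back to POSIXct
      summary_Date = as.POSIXct(ifelse(length_x < 5, round_date(Date, "month"), Date),
                                origin = "1970-01-01", tz = "GMT")
    )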
    