I am trying to use `ddply` with `transform` to populate a new variable (`summary_Date`) in a dataframe with variables `ID` and …
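```r
# transform to data.table (test.df is the data frame from the question)
library(data.table)
library(lubridate)  # round_date()
test.dt <- data.table(test.df)

# step 1: number of rows (IDs) per month-year; the single value per group
# is recycled to every row of that group
test.dt[, idlen := .N, by = list(month(Date), year(Date))]

# step 2: row-wise summary date; ifelse() drops the Date class and
# returns the number of days since 1970-01-01
test.dt[, summary_Date := ifelse(idlen < 5, as.Date(round_date(Date, "month")), as.Date(Date))]

# step 3: convert that number back to a Date
test.dt[, summary_Date := as.Date(summary_Date, origin = "1970-01-01")]
```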
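```
> test.dt
   ID                Date         Val idlen summary_Date
1:  1 1962-03-01 12:00:00  0.42646422     3   1962-03-01
2:  1 1962-03-14 12:00:00 -0.29507148     3   1962-03-01
3:  1 1962-03-27 12:00:00  0.89512566     3   1962-04-01   <~~~~~
4:  1 1962-04-10 12:00:00  0.87813349     2   1962-04-01
5:  1 1962-04-24 12:00:00  0.82158108     2   1962-05-01
6:  1 1962-05-08 12:00:00  0.68864025     1   1962-05-01
```

Note the marked row: `round_date()` rounds to the *nearest* month, so 27 March becomes 1 April (row 5 likewise becomes 1 May). If you want the first day of the row's own month, use `floor_date()` instead.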
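The final conversion step in the data.table code above exists because base R's `ifelse()` builds its result from the logical test, so the `Date` class of the `yes`/`no` values is lost and you get plain day counts. A minimal base-R sketch, using two hypothetical dates that are not from the question's data:

```r
# Two hypothetical dates and the first day of each one's month
d <- as.Date(c("1962-03-14", "1962-04-24"))
month_start <- as.Date(format(d, "%Y-%m-01"))

# ifelse() returns plain numbers (days since 1970-01-01), not Dates
raw <- ifelse(c(TRUE, FALSE), month_start, d)
print(class(raw))

# Converting back with an explicit origin restores the Date class
fixed <- as.Date(raw, origin = "1970-01-01")
print(fixed)
```

If your version of data.table provides `fifelse()`, it is documented to keep the attributes of its `yes` argument, which would make the extra conversion unnecessary; check `?fifelse` before relying on that.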
The reason it cannot be done in one step is that the grouped calculation yields only a single value per group. When you assign that value to the members of the group, you are assigning one element to many, and R handles that situation very well: it recycles the single element.

In this specific case, however, you do not want to recycle; you do not want to apply one element to many rows. Therefore you need unique groups, in effect one group per row, which is what the second step does: it runs without `by`, so each element (row) gets its own, specific value.
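To make the recycling point concrete, here is a small base-R sketch on a hypothetical four-row grouping vector. `ave()` stands in for the grouped `:=` of step one; it is an illustration of the same idea, not the data.table call itself:

```r
# Hypothetical grouping: three rows in group "a", one row in group "b"
grp <- c("a", "a", "a", "b")

# A grouped summary produces ONE value per group
per_group <- tapply(grp, grp, length)
print(per_group)

# Assigning it back to the rows recycles each group's single value
# to every member of that group (like := with by= in step one)
per_row <- ave(seq_along(grp), grp, FUN = length)
print(per_row)

# A row-wise calculation needs no grouping: each row gets its own value
flag <- per_row < 2
print(flag)
```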
@Ramnath gave a great suggestion: use `mutate`. The help page `?mutate` says:

> This function is very similar to transform but it executes the transformations iteratively ... later transformations can use the columns created by earlier transformations

That is exactly what you want to do!
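The difference can be sketched with base R alone on a hypothetical one-column data frame. Here `within()` is used as a base-R stand-in for `plyr::mutate`, since both evaluate their expressions in order:

```r
df <- data.frame(x = 1:3)

# transform() evaluates all arguments against the ORIGINAL columns, so a
# column created earlier in the same call is not visible to later ones
# (unless an object with that name happens to exist in the calling environment)
attempt <- tryCatch(
  transform(df, doubled = x * 2, plus_one = doubled + 1),
  error = function(e) conditionMessage(e)
)
print(attempt)

# within() runs the expressions one after another, so later ones
# can use columns created by earlier ones
res_within <- within(df, {
  doubled  <- x * 2
  plus_one <- doubled + 1
})
print(res_within)
```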
One-step `ddply` solution (also posted as a comment):
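```r
library(plyr)       # ddply(), mutate()
library(lubridate)  # floor_date(), round_date()

ddply(test.df, .(ID, floor_date(Date, "month")), mutate,
      # rows per ID and calendar month; mutate() recycles it to every row
      length_x = length(ID),
      # length_x is already a full column here, so ifelse() works row by row;
      # ifelse() drops the POSIXct class, hence as.POSIXct(..., origin = ...)
      summary_Date = as.POSIXct(ifelse(length_x < 5, round_date(Date, "month"), Date),
                                origin = "1970-01-01", tz = "GMT")
)
```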
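If plyr is not available, the same split-apply-combine idea (split by ID and month, add columns in order within each piece, recombine) can be sketched in base R. This is a hedged illustration, not the method above: it swaps `within()` in for `mutate()`, uses `format(..., "%Y-%m-01")` (the first day of the month, like `floor_date()`) in place of `round_date()`, so late-month dates are NOT rounded up, and it runs on a small hypothetical data frame built from the dates printed in the output above (no `Val` column):

```r
# Hypothetical sample data: one ID, the Date values from the output above
toy <- data.frame(
  ID   = 1,
  Date = as.POSIXct(c("1962-03-01 12:00:00", "1962-03-14 12:00:00",
                      "1962-03-27 12:00:00", "1962-04-10 12:00:00",
                      "1962-04-24 12:00:00", "1962-05-08 12:00:00"), tz = "GMT")
)
min_n <- 5  # threshold used in the question

# Grouping key: ID plus calendar month (stands in for floor_date(Date, "month"))
key <- interaction(toy$ID, format(toy$Date, "%Y-%m"), drop = TRUE)

# Split, add columns sequentially inside each group, then recombine
pieces <- lapply(split(toy, key), function(g) {
  within(g, {
    # rep() makes the per-group count full length, as mutate() would,
    # so that ifelse() below returns one value per row
    length_x     <- rep(length(ID), length(ID))
    month_start  <- as.Date(format(Date, "%Y-%m-01"))  # first day of the row's month
    summary_Date <- as.Date(ifelse(length_x < min_n, month_start, as.Date(Date)),
                            origin = "1970-01-01")     # ifelse() drops the Date class
  })
})
out <- do.call(rbind, pieces)
rownames(out) <- NULL
print(out)
```

Every month group here has fewer than 5 rows, so each row gets the first day of its month; a group with 5 or more rows would keep its own dates.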