Using data.table
I can do the following:
library(data.table)
dt = data.table(a = 1:2, b = c(1,2,NA,NA))
# a b
#1: 1 1
#2: 2 2
#3: 1 NA
#4:
In the current development version of dplyr (which will eventually become dplyr 0.2) the behaviour differs between data frames and data tables:
library(dplyr)
library(data.table)
df <- data.frame(a = 1:2, b = c(1,2,NA,NA))
dt <- data.table(df)
df %.% group_by(a) %.% mutate(b = b[1])
## Source: local data frame [4 x 2]
## Groups: a
##
## a b
## 1 1 1
## 2 2 2
## 3 1 1
## 4 2 2
dt %.% group_by(a) %.% mutate(b = b[1])
## Source: local data table [4 x 2]
## Groups: a
##
## a b
## 1 1 1
## 2 1 1
## 3 2 2
## 4 2 2
This happens because group_by()
applied to a data.table
automatically does setkey()
on the assumption that the index will make
future operations faster.
If there's a strong feeling that this is a bad default, I'm happy to change it.