I have a data frame (all_data) in which I have a list of sites (1... to n) and their scores e.g.
site score
1 10
1 11
Two other options:
1) Using the .GRP special symbol from the data.table package:
library(data.table)
setDT(dat)[, num := .GRP, by = site]
With the example dataset from below, this results in:
> dat
site score num
1: 1 0.14945795 1
2: 1 0.60035697 1
3: 1 0.94643075 1
4: 8 0.68835336 2
5: 8 0.50553372 2
6: 8 0.37293624 2
7: 4 0.33580504 3
8: 4 0.04825135 3
9: 4 0.61894754 3
10: 8 0.96144729 2
11: 8 0.65496051 2
12: 8 0.51029199 2
2) Using the group_indices function from dplyr:
library(dplyr)
dat$num <- group_indices(dat, site)
or, when you want to work around non-standard evaluation (note that the underscore variants such as group_indices_ are deprecated in recent dplyr versions):
dat %>%
  mutate(num = group_indices_(dat, .dots = c('site')))
which results in:
site score num
1 1 0.42480366 1
2 1 0.98736177 1
3 1 0.35766187 1
4 8 0.06243182 3
5 8 0.55617002 3
6 8 0.20304632 3
7 4 0.90855921 2
8 4 0.25215078 2
9 4 0.44981251 2
10 8 0.60288270 3
11 8 0.46946587 3
12 8 0.44941782 3
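As can be seen, dplyr gives a different order of the group numbers: it numbers the groups by the sorted values of site, whereas data.table numbers them in order of first appearance.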
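If neither package is available, both numbering schemes can be reproduced in base R. This is only a sketch of what the numbers mean, not how data.table or dplyr compute them internally; the variable names by_appearance and by_sorted are made up for illustration:

```r
# Same site column as in the example data
site <- rep(c(1, 8, 4, 8), each = 3)

# Number groups in order of first appearance (the order .GRP gave above)
by_appearance <- match(site, unique(site))

# Number groups by sorted site value (the order group_indices gave above)
by_sorted <- as.integer(factor(site))

print(by_appearance)
print(by_sorted)
```

Site 8 appears before site 4 but sorts after it, which is why it gets number 2 in the first scheme and number 3 in the second.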
If you want another number every time the group changes, there are several other options:
1) with base R:
# option 1:
dat$num <- cumsum(c(TRUE, head(dat$site, -1) != tail(dat$site, -1)))
# option 2:
x <- rle(dat$site)$lengths
dat$num <- rep(seq_along(x), times=x)
2) with the data.table package:
library(data.table)
setDT(dat)[, num := rleid(site)]
which all result in:
> dat
site score num
1 1 0.80817855 1
2 1 0.07881334 1
3 1 0.60092828 1
4 8 0.71477988 2
5 8 0.51384565 2
6 8 0.72011650 2
7 4 0.74994627 3
8 4 0.09564052 3
9 4 0.39782587 3
10 8 0.29446540 4
11 8 0.61725367 4
12 8 0.97427413 4
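To make the difference between the two kinds of numbering concrete, here is a small base R sketch on the site column alone. The names changed, run_id and group_id are illustrative only:

```r
site <- rep(c(1, 8, 4, 8), each = 3)

# TRUE wherever the value differs from the previous row (the first row always starts a run)
changed <- c(TRUE, site[-1] != site[-length(site)])

# Run-based id: increases every time the site changes
run_id <- cumsum(changed)

# Group-based id: the same site always gets the same number
group_id <- match(site, unique(site))

print(data.frame(site, run_id, group_id))
```

The second block of site 8 gets a new run_id of 4, while its group_id stays at 2.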
Used data:
dat <- data.frame(site = rep(c(1,8,4,8), each = 3), score = runif(12))
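Because runif draws new random values on every call, the score column differs between the outputs shown above. If you want to get the same scores on every run, you could set a seed first; the seed value 42 below is an arbitrary choice, not something used for the outputs above:

```r
# Fix the random number generator state so the scores are reproducible
set.seed(42)
dat <- data.frame(site = rep(c(1, 8, 4, 8), each = 3), score = runif(12))

# Re-creating the data with the same seed gives identical scores
set.seed(42)
dat2 <- data.frame(site = rep(c(1, 8, 4, 8), each = 3), score = runif(12))
print(identical(dat, dat2))
```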