How to create a consecutive group number

深忆病人 2020-11-21 16:13

I have a data frame (all_data) in which I have a list of sites (1... to n) and their scores e.g.

  site  score
     1    10
     1    11  
              


        
8 Answers
  • 2020-11-21 17:01

    Try Data$number <- as.numeric(as.factor(Data$site))

    On a side note: the difference between my solution and @Chase's on the one hand, and @DWin's on the other, is the ordering of the numbers. Both as.factor and factor automatically sort the levels, so the numbers follow the sorted site values. That doesn't happen in @DWin's solution (match against unique), which numbers the sites in order of first appearance:

    Dat <- data.frame(site = rep(c(1,8,4), each = 3), score = runif(9))
    
    Dat$number <- as.numeric(factor(Dat$site))
    Dat$sitenum <- match(Dat$site, unique(Dat$site) ) 
    

    Gives

    > Dat
      site     score number sitenum
    1    1 0.7377561      1       1
    2    1 0.3131139      1       1
    3    1 0.7862290      1       1
    4    8 0.4480387      3       2
    5    8 0.3873210      3       2
    6    8 0.8778102      3       2
    7    4 0.6916340      2       3
    8    4 0.3033787      2       3
    9    4 0.6552808      2       3
    
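    If the sorted numbering is not wanted but a factor is still convenient, one possible sketch is to pass the levels explicitly in order of first appearance. The vector `site` below is illustration data with the same pattern as Dat$site (the random scores are left out); the variable names are made up for this example.

    ```r
    # Illustration data: same site pattern as Dat$site, without the random scores
    site <- rep(c(1, 8, 4), each = 3)

    # Default factor(): levels are sorted (1, 4, 8), so site 8 gets number 3
    sorted_num <- as.integer(factor(site))

    # Levels given in order of first appearance (1, 8, 4), so site 8 gets number 2
    appear_num <- as.integer(factor(site, levels = unique(site)))

    # match() against unique() yields the same first-appearance numbering
    match_num <- match(site, unique(site))
    ```

    With this data, appear_num and match_num agree, while sorted_num differs for sites 8 and 4.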
  • 2020-11-21 17:04

    Two other options:

    1) Using the .GRP special symbol from the data.table package:

    library(data.table)
    setDT(dat)[, num := .GRP, by = site]
    

    With the example dataset given at the end of this answer, this results in:

    > dat
        site      score num
     1:    1 0.14945795   1
     2:    1 0.60035697   1
     3:    1 0.94643075   1
     4:    8 0.68835336   2
     5:    8 0.50553372   2
     6:    8 0.37293624   2
     7:    4 0.33580504   3
     8:    4 0.04825135   3
     9:    4 0.61894754   3
    10:    8 0.96144729   2
    11:    8 0.65496051   2
    12:    8 0.51029199   2
    

    2) Using the group_indices function from dplyr:

    library(dplyr)
    dat$num <- group_indices(dat, site)
    

    or when you want to work around non-standard evaluation:

    library(dplyr)
    dat %>% 
      mutate(num = group_indices_(dat, .dots = c('site')))
    

    which results in:

       site      score num
    1     1 0.42480366   1
    2     1 0.98736177   1
    3     1 0.35766187   1
    4     8 0.06243182   3
    5     8 0.55617002   3
    6     8 0.20304632   3
    7     4 0.90855921   2
    8     4 0.25215078   2
    9     4 0.44981251   2
    10    8 0.60288270   3
    11    8 0.46946587   3
    12    8 0.44941782   3
    

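    As can be seen, dplyr gives a different order of the group numbers: it numbers the groups by sorted site value (1, 4, 8), whereas .GRP in data.table numbers them in order of first appearance (1, 8, 4).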
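    In more recent dplyr releases (1.0.0 and later, as far as I am aware), calling group_indices() with column arguments and using group_indices_() are both deprecated. A minimal sketch of the equivalent with group_by() and cur_group_id() follows; it assumes such a dplyr version is installed, and the score column holds placeholder values rather than the random scores shown above.

    ```r
    library(dplyr)

    # Deterministic stand-in for dat (scores are placeholders)
    dat <- data.frame(site = rep(c(1, 8, 4, 8), each = 3), score = seq_len(12))

    dat <- dat %>%
      group_by(site) %>%                # groups are ordered by sorted site value
      mutate(num = cur_group_id()) %>%  # 1 for site 1, 2 for site 4, 3 for site 8
      ungroup()
    ```

    The numbering matches the group_indices output above: sorted by site, not by first appearance.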


    If you want a new number every time the group changes (so that a site which reappears later gets a fresh number), there are several other options:

    1) with base R:

    # option 1:
    dat$num <- cumsum(c(TRUE, head(dat$site, -1) != tail(dat$site, -1)))
    
    # option 2:
    x <- rle(dat$site)$lengths
    dat$num <- rep(seq_along(x), times=x)
    

    2) with the data.table package:

    library(data.table)
    setDT(dat)[, num := rleid(site)]
    

    which all result in:

    > dat
       site      score num
    1     1 0.80817855   1
    2     1 0.07881334   1
    3     1 0.60092828   1
    4     8 0.71477988   2
    5     8 0.51384565   2
    6     8 0.72011650   2
    7     4 0.74994627   3
    8     4 0.09564052   3
    9     4 0.39782587   3
    10    8 0.29446540   4
    11    8 0.61725367   4
    12    8 0.97427413   4
    
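    As a quick sanity check, the two base R options can be compared on a deterministic vector. This is only a sketch: `site` mirrors dat$site, and the names num_cumsum and num_rle are made up for the example.

    ```r
    # Deterministic site vector with the same pattern as dat$site
    site <- rep(c(1, 8, 4, 8), each = 3)

    # Option 1: a new run starts wherever a value differs from its predecessor
    num_cumsum <- cumsum(c(TRUE, head(site, -1) != tail(site, -1)))

    # Option 2: expand the run lengths from rle() into run indices
    x <- rle(site)$lengths
    num_rle <- rep(seq_along(x), times = x)

    # Both give one number per run, so the second block of 8s gets number 4
    stopifnot(identical(num_cumsum, num_rle))
    ```

    Note that option 1 propagates NA if site contains missing values, because the comparison then yields NA and cumsum carries it forward.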

    Used data:

    dat <- data.frame(site = rep(c(1,8,4,8), each = 3), score = runif(12))
    