How to generate an auto-incrementing ID in R

一整个雨季 2020-12-29 11:42

I am looking for an efficient way to create unique, numeric IDs for some synthetic data I'm generating.

Right now, I simply have a function that emits and incremen

2 answers
  • 2020-12-29 11:54

    I like to use the proto package for small OO programming. Under the hood, it uses environments in a similar fashion to what Martin Morgan illustrated.

    # this defines your class
    library(proto)
    Counter <- proto(idCounter = 0L)
    Counter$emitID <- function(self = .) {
       id <- formatC(self$idCounter, width = 9, flag = "0", format = "d")
       self$idCounter <- self$idCounter + 1L
       return(id)
    }
    
    # This creates an instance (or you can use `Counter` directly as a singleton)
    mycounter <- Counter$proto()
    
    # use it:
    mycounter$emitID()
    # [1] "000000000"
    mycounter$emitID()
    # [1] "000000001"
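
    For comparison, the same idea can be sketched in base R without proto, keeping the state in an explicit environment. This is only an illustration of the "environments under the hood" remark, not how proto is implemented; `newCounter` is a hypothetical name.

    ```r
    # Base-R sketch of an environment-backed counter.
    # newCounter is a hypothetical helper, not part of proto or base R.
    newCounter <- function(start = 0L) {
        self <- new.env()            # mutable state lives in this environment
        self$idCounter <- start
        self$emitID <- function() {
            id <- formatC(self$idCounter, width = 9, flag = "0", format = "d")
            self$idCounter <- self$idCounter + 1L   # environments are modified in place
            id
        }
        self
    }

    a <- newCounter()
    b <- newCounter(100L)   # independent counter with its own start value
    a$emitID()   # "000000000"
    a$emitID()   # "000000001"
    b$emitID()   # "000000100"
    ```

    Each call to `newCounter()` creates a fresh environment, so the counters do not share state.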
    
  • 2020-12-29 12:01

    A non-global version of the counter uses lexical scope to encapsulate idCounter together with the increment function:

    emitID <- local({
        idCounter <- -1L
        function(){
            idCounter <<- idCounter + 1L                       # increment
            formatC(idCounter, width=9, flag="0", format="d")  # format & return
        }
    })
    

    and then

    > emitID()
    [1] "000000000"
    > emitID()
    [1] "000000001"
    > idCounter <- 123   ## global variable, not locally scoped idCounter
    > emitID()
    [1] "000000002"
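
    The encapsulated value is still reachable through the function's enclosing environment, which can be handy for inspecting or resetting the counter (for instance, to resume from a saved value). A minimal sketch using only base R; the reset value 41L is arbitrary:

    ```r
    # Same closure as above, repeated so the sketch is self-contained
    emitID <- local({
        idCounter <- -1L
        function() {
            idCounter <<- idCounter + 1L
            formatC(idCounter, width = 9, flag = "0", format = "d")
        }
    })

    emitID()                                             # "000000000"
    get("idCounter", envir = environment(emitID))        # 0L, the last value handed out
    assign("idCounter", 41L, envir = environment(emitID))  # reset the private counter
    emitID()                                             # "000000042"
    ```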
    

    A fun alternative is to use a 'factory' pattern to create independent counters. Your question implies that you'll call this function a billion (hmm, not sure where I got that impression...) times, so maybe it makes sense to vectorize the call to formatC by creating a buffer of ids?

    idFactory <- function(buf_n=1000000) {
        curr <- 0L
        last <- -1L
        val <- NULL
        function() {
            if ((curr %% buf_n) == 0L) {
                val <<- formatC(last + seq_len(buf_n), width=9, flag="0", format="d")
                last <<- last + buf_n
                curr <<- 0L
            }
            val[curr <<- curr + 1L]
        }
    }
    emitID2 <- idFactory()
    

    and then benchmark the two (emitID1 is an instance of the local-variable version above, emitID2 is the buffered version):

    > library(microbenchmark)
    > microbenchmark(emitID1(), emitID2(), times=100000)
    Unit: microseconds
          expr    min     lq median     uq      max neval
     emitID1() 66.363 70.614 72.310 73.603 13753.96 1e+05
     emitID2()  2.240  2.982  4.138  4.676 49593.03 1e+05
    > emitID1()
    [1] "000100000"
    > emitID2()
    [1] "000100000"
    

    (the proto solution is about 3x slower than emitID1, though speed is not everything).
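
    If the IDs are needed in bulk rather than one at a time, the same closure idea could hand out a whole block per call, so that formatC is called once per block. This is a sketch under that assumption, not part of the benchmark above; `makeBlockCounter` and `nextIDs` are hypothetical names.

    ```r
    # Closure that returns n consecutive, zero-padded IDs per call.
    # makeBlockCounter / nextIDs are hypothetical names for illustration.
    makeBlockCounter <- function(start = 0L) {
        nextVal <- start
        function(n = 1L) {
            n <- as.integer(n)
            ids <- nextVal + seq_len(n) - 1L   # the next n integer values
            nextVal <<- nextVal + n            # advance the private counter
            formatC(ids, width = 9, flag = "0", format = "d")  # one vectorized call
        }
    }

    nextIDs <- makeBlockCounter()
    nextIDs(3)   # "000000000" "000000001" "000000002"
    nextIDs(2)   # "000000003" "000000004"
    ```

    The result can be assigned directly to a column of the synthetic data, e.g. `df$id <- nextIDs(nrow(df))` for a data frame `df` with at least one row.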
