Efficiently adding or removing elements to a vector or list in R?

前端 未结 4 769
予麋鹿
予麋鹿 2020-12-31 06:22

I\'m implementing an algorithm that involves lots of adding and removing things from sets. In R, this is slow because as far as I know, adding or removing things from a vect

相关标签:
4条回答
  • 2020-12-31 06:57

    It's hard to say what you want. Maybe you really want stack commands like push and pop. The following isn't that. But it's a fast solution.

    Allocate a vector large enough to hold all of your items of the type you need. Set every value to NA. Adding items is simple. Removing items is setting them to NA again. Using the vector is just na.omit(myVec)

    myVec <- numeric (maxLength)  # a vector of maximum length
    
    is.na(myVec) <- 1:maxLength   # set every item in myVec to NA
    
    myVec[c(2,6,20)] <- 5         # add some values
    
    na.omit(myVec)
    
    #This will also work if you can initialize all of your values to something that you know you won't need. 
    
    0 讨论(0)
  • 2020-12-31 07:00

    Yes, there are more efficient ways.

    It comes down to how you're using the data; your use case. Are you taking out the data in in the same order you put it in, or in reversed order, or in random order, or in a sorted order?

    For FIFO, for a fixed size array use a circular buffer, or for a fully dynamic size, use a deque (pronounced deck). (This is probably what you want.)

    For FILO, use a stack.

    For grabbing the data at random, consider using a 1 column matrix you never resize. Resizing is slow.

    If you need an ordered set (eg c(3,2,5) -> c(2,3,5)), look into a tree or a heap.

    • A tree is good for comparisons, eg grabbing all elements in a set that is < 5.5.
    • A heap is good for grabbing just the largest or smallest elements.
    0 讨论(0)
  • 2020-12-31 07:03

    Chapter 2 of The R Inferno has some interesting comments on this, including perdiodic growing objects to reduce memory fragmentation and allocation overhead.

    If you know what the ultimate size of the set is, then the method you suggest is probably the best - ie subset from the whole universe using an approprate membership vector. Difficult to know whats best without seeing exactly what you are trying to do though.

    0 讨论(0)
  • 2020-12-31 07:06

    If you can, initializing a vector so that it has length equal to its maximum length during the algorithm may help.

    e.g.

    vec <- rep(NA,10)
    vec[1:3] <- 1:3
    vec[4:5] <- 4:5
    vec[6:10] <- 6:10
    

    rather than

    vec <- 1:3
    vec <- c(vec,4:5)
    vec <- c(vec,6:10)
    

    compare

    > system.time({vec <- rep(NA,10^4); for (i in 1:(10^4)) vec[i] <- i })
       user  system elapsed 
      0.043   0.001   0.044 
    

    to

    > system.time({vec <- NULL; for (i in 1:(10^4)) vec <- c(vec,i) })
       user  system elapsed 
      0.249   0.089   0.335
    
    0 讨论(0)
提交回复
热议问题