Why does R store the loop variable/index/dummy in memory?

粉色の甜心 2021-01-11 11:57

I've noticed that R keeps the index from for loops stored in the global environment, e.g.:

for (ii in 1:5){ }

print(ii)
# [1] 5


        
2 Answers
  • 2021-01-11 12:56

    In order to do what you suggest, R would have to change the scoping rules for for loops. This will likely never happen, because I'm sure there is code in existing packages that relies on the current behavior. You may not use the index after the for loop yourself, but since a loop can break at any point, the final value of the index isn't always known ahead of time, and some code reads it afterwards to find out where the loop stopped. Making this a global option would, again, cause problems for existing code in working packages.
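
To illustrate the kind of code that could depend on the index surviving the loop, here is a minimal sketch. The vector `x` and the name `first_negative` are made up for this example and do not come from any particular package.

```r
# Hypothetical data: find the position of the first negative value.
x <- c(3, 8, -2, 5)

for (i in seq_along(x)) {
  if (x[i] < 0) break   # `break` is a keyword, not a function call
}

# The loop variable is still bound in the calling environment,
# so it records where the loop stopped.
first_negative <- i
print(first_negative)
# [1] 3
```

If no element were negative, `i` would simply end up equal to `length(x)`, so real code of this shape needs a separate check for the "not found" case.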

    As pointed out, it's far more common to use sapply or lapply instead of for loops in R. Something like

    for(i in 1:4) {
       lm(data[, 1] ~ data[, i])
    }
    

    becomes

    sapply(1:4, function(i) {
       lm(data[, 1] ~ data[, i])
    })
    

    You shouldn't be afraid of functions in R. After all, R is a functional language.
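
A quick way to check that the functional version leaves no index behind is sketched below. It assumes the built-in `mtcars` data set as a stand-in for `data`, uses `lapply` rather than `sapply` because `lapply` always returns a plain list (convenient for model objects), and the name `col_idx` is arbitrary.

```r
# Regress the first column of mtcars (mpg) on columns 2 to 4, one model each.
fits <- lapply(2:4, function(col_idx) {
  lm(mtcars[, 1] ~ mtcars[, col_idx])
})

length(fits)        # one fitted model per column
exists("col_idx")   # FALSE: the index was local to the anonymous function
```

The index is an argument of the anonymous function, so it disappears when each call returns and nothing needs to be cleaned up afterwards.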

    It's fine to use for loops for more control, but you will have to take care of removing the index variable with rm(), as you've pointed out. Unless you're using a different index variable in each loop, I'm surprised that they are piling up. I'm also surprised that, in your case, they add noticeable memory if they are data.tables, since data.tables don't make deep copies by default as far as I know. The only memory "price" you would pay is a simple pointer.
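
For completeness, the manual cleanup looks roughly like this. The variables `before` and `after` exist only to make the effect visible.

```r
for (ii in 1:5) { }

before <- exists("ii")   # TRUE: the index is still bound after the loop
rm(ii)                   # remove it from the current environment
after <- exists("ii")    # FALSE: the binding is gone

print(c(before = before, after = after))
```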

  • 2021-01-11 12:59

    I agree with the comments above. Even if you have to use a for loop (relying only on side effects, not on functions' return values), it would be a good idea to structure your code into several functions and store your data in lists.

    However, there is a way to "hide" the index and all temporary variables inside the loop, by calling the for function in a separate environment:

    do.call(`for`, alist(i, 1:3, {
      # ...
      print(i)
      # ... 
    }), envir = new.env())
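
As an alternative sketch of the same idea, base R's `local()` evaluates an expression in a new environment, so an ordinary for loop written inside it should not leave anything behind. The names `k_idx` and `squared_tmp` are arbitrary and chosen only for this example.

```r
local({
  for (k_idx in 1:3) {
    squared_tmp <- k_idx^2   # temporary, also confined to the local environment
    print(squared_tmp)
  }
})

exists("k_idx")         # FALSE: the index never reached the calling environment
exists("squared_tmp")   # FALSE: neither did the temporary
```

This keeps the usual loop syntax, at the cost that any result you want to keep has to be returned from the `local()` block and assigned outside it.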
    

    But ... if you could put your code in a function, the solution is more elegant:

    for_each <- function(x, FUN) {
      for(i in x) {
        FUN(i)
      }
    }
    
    for_each(1:3, print)
    

    Note that with a for_each-like construct you don't even see the index variable.
