What are the advantages of placing data in a new.env() in R (speed, etc.)?
For data such as time series, is a new environment analogous to a database?
My question initially arose from downloading asset prices in R, where it was suggested that I place them into a new environment. Why is this? Thank you:
library(quantmod)             # getSymbols()
library(FinancialInstrument)  # currency(), stock()
# join() and extract.table.from.webpage() come from the Systematic Investor
# Toolbox (SIT), which has to be loaded separately
url <- 'http://www.nasdaq.com/markets/indices/nasdaq-100.aspx'
txt <- join(readLines(url))
# extract tables from this page
temp <- extract.table.from.webpage(txt, 'Symbol', hasHeader = TRUE)
temp[,2]
# Symbols
symbols <- c(temp[,2])[2:101]
currency("USD")
stock(symbols, currency = "USD", multiplier = 1)
# create new environment to store symbols
symEnv <- new.env()
# getSymbols and assign the symbols to the symEnv environment
getSymbols(symbols, from = '2002-09-01', to = '2013-10-17', env = symEnv)
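Because getSymbols(..., env = symEnv) stores each ticker as a separate named object inside symEnv, the environment behaves less like a database and more like a simple named key-value store. The sketch below uses made-up numeric vectors in place of downloaded prices (the environment name `prices` and the symbols and values are hypothetical, so it runs offline) and only base R functions: assign(), ls(), get(), mget() and eapply().

```r
# Hypothetical stand-in for symEnv: two fake "price series" keyed by symbol name.
prices <- new.env()
assign("AAPL", c(100, 101, 103), envir = prices)
assign("MSFT", c(50, 49, 52), envir = prices)

# List the keys (ls() returns them sorted).
syms <- ls(prices)
print(syms)

# Fetch one object by name, or several at once as a named list.
aapl <- get("AAPL", envir = prices)
both <- mget(syms, envir = prices)

# Apply a function to every object in the environment.
# eapply() returns a list in no guaranteed order, so reorder by name.
last_prices <- eapply(prices, function(x) x[length(x)])
print(unlist(last_prices)[syms])
```

In the question's code the same calls apply to symEnv, for example ls(symEnv) to see which tickers were downloaded, or mget(ls(symEnv), envir = symEnv) to pull them all into one list.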
There are advantages to this if your data is large and you have to modify it by passing it through functions. When you send data.frames or vectors to functions that modify them, R makes a copy of the data before changing it. You then return the modified data from the function and overwrite the old data to complete the modification step.
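The return-and-overwrite pattern described above looks like this in practice; set_first is a hypothetical helper name used only for illustration.

```r
# Hypothetical helper: changes the first cell and returns the modified copy.
set_first <- function(df, value) {
  df[1, 1] <- value  # copy-on-modify: only the function's local copy changes
  df                 # return the modified data.frame
}

k <- data.frame(a = 1:3)
set_first(k, 10)       # returns a modified copy, but k itself is unchanged
print(k$a)             # still 1, 2, 3

k <- set_first(k, 10)  # overwrite k with the returned copy
print(k$a)             # now 10, 2, 3
```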
If your data is large, copying it on every function call may add an undesirable amount of overhead. Environments provide a way around this, because functions handle them differently: if you pass an environment to a function and modify its contents, R operates directly on that environment without making a copy of it. So by putting your data in an environment and passing the environment to the function, instead of passing the data directly, you can avoid copying the large dataset.
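A quick way to see that an environment is not copied is to bind it to two names: a change made through one name is visible through the other, and the same happens when an environment is passed as a function argument. This is a minimal base-R sketch; add_flag and add_flag_list are hypothetical function names.

```r
e1 <- new.env()
e2 <- e1                 # no copy: e1 and e2 refer to the same environment
e2$x <- 1
exists("x", envir = e1, inherits = FALSE)   # TRUE

# The same holds for function arguments.
add_flag <- function(env) {
  env$flag <- TRUE       # assigns into the caller's environment object
  invisible(NULL)
}
add_flag(e1)
e1$flag                  # TRUE

# Contrast: a list is copied on modification, so the caller's list is untouched.
add_flag_list <- function(lst) {
  lst$flag <- TRUE       # modifies a local copy only
  invisible(NULL)
}
l <- list()
add_flag_list(l)
is.null(l$flag)          # TRUE
```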
# here I create a data.frame inside an environment and pass the environment
# to a function that modifies the data.
e <- new.env()
e$k <- data.frame(a=1:3)
f <- function(e) {e$k[1,1] <- 10}
f(e)
# you can see that the original data was changed.
e$k
a
1 10
2 2
3 3
# alternatively, if I pass just the data.frame, the manipulations do not affect the
# original data.
k <- data.frame(a=1:3)
f2 <- function(k) {k[1,1] <- 10}
f2(k)
k
a
1 1
2 2
3 3
Let's compare the two cases. With a new environment:
e <- new.env()
e$k <- data.frame(a=1:1000000)
f <- function(e) {e$k[1,1] <- 10}
system.time({
for(i in 1:1000) f(e)
})
head(e$k)
user system elapsed
5.32 6.35 11.67
Without a new environment:
k <- data.frame(a=1:1000000)
f <- function(k) {k[1,1] <- 10; k}
system.time({
for(i in 1:1000) k <- f(k)
})
user system elapsed
5.07 6.82 11.89
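Not much of a difference. The environment itself is never copied, but the data.frame stored in it is still subject to copy-on-modify: e$k[1,1] <- 10 builds a modified data.frame and binds it back to k inside the environment, so both versions appear to pay for a copy of the data on every call.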
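One way to check where the copy in the benchmark comes from is to keep a second reference to the data.frame before modifying it through the environment. If the assignment wrote into the data in place, the second reference would see the change; it does not, because e$k[1, 1] <- 10 is evaluated roughly as calling the replacement function `[<-` on e$k and rebinding the result to e$k. A small base-R sketch (the names old and e2 are illustrative):

```r
e <- new.env()
e$k <- data.frame(a = 1:3)
old <- e$k               # second reference to the same data.frame

f <- function(e) {
  e$k[1, 1] <- 10        # the environment is shared, but the data.frame is rebuilt
}
f(e)

e$k$a                    # 10, 2, 3: the environment now holds the new data.frame
old$a                    # 1, 2, 3: the original data.frame was not changed in place

# Roughly the same steps written out explicitly:
e2 <- new.env()
e2$k <- data.frame(a = 1:3)
e2$k <- `[<-`(e2$k, 1, 1, value = 10)
```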
Source: https://stackoverflow.com/questions/19772091/what-are-the-advantages-of-placing-data-in-a-new-env-in-r