I have some data in R that I intend to analyze. However, the file is not displaying the data. Instead, It is only showing a variable in the data. The following is the procedure
There are two ways to save R objects, and you've got them mixed up. In the first way, you save()
any collection of objects in an environment to a file. When you load()
that file, those objects are re-created with their original names in your current environment. This is how R saves and resotres workspaces.
The second way stores (serializes) a single R object into a file with the saveRDS()
function, and recreates it in your environment with the readRDS()
function. If you don't assign the results of readRDS()
, it'll just print to your screen and drift away.
Examples below:
# Make a simple dataframe
testdf <- data.frame(x = 1:10,
y = rnorm(10))
# Save it out using the save() function
savedir <- tempdir()
savepath <- file.path(savedir, "saved.Rdata")
save(testdf, file = savepath)
# Delete it
rm(testdf)
# Load without assigning - and it's back in your environment
load(savepath)
testdf
# But if you assign the results of load, you just get the name of the object
wrong <- load(savepath)
wrong
# Compare with the RDS:
rds_path <- file.path(savedir, "testdf.rds")
saveRDS(testdf, file = rds_path)
rm(testdf)
testdf <- readRDS(file = rds_path)
testdf
Why the two different approaches? The save()
-environment approach is good for creating a checkpoint of your entire environment that you can restore later - that's what R uses it for - but that's about it. It's too easy for such an environment to get cluttered, and if an object you load()
has the same name as an object in your current environment, it will overwrite that object:
testdf$z <- "blah"
load(savepath)
testdf # testdf$z is gone
The RDS method lets you assign the name on read, as you're looking to do here. It's a little more annoying to save multiple objects, sure, but you probably shouldn't be saving objects very often anyway - recreating objects from scratch is the best way to ensure that your R code does what you think it does.
If you read ?help, it says that the return value of load
is:
A character vector of the names of objects created, invisibly.
This suggests (but admittedly does not state) that the true work of the load
command is by side-effect, in that it inserts the objects into an environment (defaulting to the current environment, often but not always .GlobalEnv
). You should immediately have access to them from where you called load(...)
.
For instance, if I can guess at variables you might have in your rda
file:
x
# Error: object 'x' not found
# either one of these on windows, NOT BOTH
dat = load("C:\\Users\\user\\AppData\\Local\\Temp\\1_29_923-Macdonell.RData")
dat = load("C:/Users/user/AppData/Local/Temp/1_29_923-Macdonell.RData")
dat
# [1] "x" "y" "z"
x
# [1] 42
If you want them to be not stored in the current environment, you can set up an environment to place them in. (I use parent=emptyenv()
, but that's not strictly required. There are some minor ramifications to not including that option, none of them earth-shattering.)
myenv <- new.env(parent = emptyenv())
dat = load("C:/Users/user/AppData/Local/Temp/1_29_923-Macdonell.RData",
envir = myenv)
dat
# [1] "x" "y" "z"
x
# Error: object 'x' not found
ls(envir = myenv)
# [1] "x" "y" "z"
From here you can get at your data in any number of ways:
ls.str(myenv) # similar in concept to str() but for environments
# x : num 42
# y : num 1
# z : num 2
myenv$x
# [1] 42
get("x", envir = myenv)
# [1] 42
Side note:
You may have noticed that I used dat
as my variable name instead of data
. Though you are certainly allowed to use that, it can bite you if you use variable names that match existing variables or functions. For instance, all of your code will work just fine as long as you load your data. If, however, you run some of your code without pre-loading your objects into your data
variable, you'll likely get an error such as:
mean(data$x)
# Error in data$x : object of type 'closure' is not subsettable
That error message is not immediately self-evident. The problem is that if not previously defined as in your question, then data
here refers to the function data. In programming terms, a closure is a special type of function, so the error really should have said:
# Error in data$x : object of type 'function' is not subsettable
meaning that though dat
can be subsetted and dat$x
means something, you cannot use the $
subset method on a function itself. (You can't do mean$x
when referring to the mean
function, for example.) Regardless, even though this here-modified error message is less confusing, it is still not clearly telling you what/where the problem is located.
Because of this, many seasoned programmers will suggest you use unique variable names (perhaps more than just x
:-). If you use my suggestion and name it dat
instead, then the mistake of not preloading your data will instead error with:
mean(dat$x)
# Error in mean(dat$x) : object 'dat' not found
which is a lot more meaningful and easier to troubleshoot.