How to open an R data file in R window

粉色の甜心 2021-01-24 06:58

I have some data in R that I intend to analyze. However, the file is not displaying the data; instead, it is only showing a variable in the data. The following is the procedure

2 Answers
  • 2021-01-24 08:00

    There are two ways to save R objects, and you've got them mixed up. In the first way, you save() any collection of objects in an environment to a file. When you load() that file, those objects are re-created with their original names in your current environment. This is how R saves and restores workspaces.

    The second way stores (serializes) a single R object into a file with the saveRDS() function, and recreates it in your environment with the readRDS() function. If you don't assign the results of readRDS(), it'll just print to your screen and drift away.

    Examples below:

    # Make a simple dataframe
    testdf <- data.frame(x = 1:10,
                         y = rnorm(10))
    
    # Save it out using the save() function
    savedir <- tempdir()
    savepath <- file.path(savedir, "saved.Rdata")
    save(testdf, file = savepath)
    
    # Delete it
    rm(testdf)
    
    # Load without assigning - and it's back in your environment
    load(savepath)
    testdf
    
    # But if you assign the results of load, you just get the name of the object
    wrong <- load(savepath)
    wrong
    
    
    # Compare with the RDS:
    rds_path <- file.path(savedir, "testdf.rds")
    saveRDS(testdf, file = rds_path)
    rm(testdf)
    testdf <- readRDS(file = rds_path)
    testdf
    

    Why the two different approaches? The save()-environment approach is good for creating a checkpoint of your entire environment that you can restore later - that's what R uses it for - but that's about it. It's too easy for such an environment to get cluttered, and if an object you load() has the same name as an object in your current environment, it will overwrite that object:

    testdf$z <- "blah"
    load(savepath)
    testdf  # testdf$z is gone
    

    The RDS method lets you assign the name on read, as you're looking to do here. It's a little more annoying to save multiple objects, sure, but you probably shouldn't be saving objects very often anyway - recreating objects from scratch is the best way to ensure that your R code does what you think it does.
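
    If you do need to keep several objects together with the RDS method, one option is to bundle them into a named list and serialize that single list. The following is only a minimal sketch: the object names (scores, settings, restored) and the file name bundle.rds are made up for illustration and do not come from the question.

    ```r
    # Hypothetical example objects; the names are for illustration only
    scores <- data.frame(id = 1:3, score = c(90, 85, 77))
    settings <- list(alpha = 0.05, method = "pearson")

    # Bundle the objects into one named list and serialize it to a single file
    bundle_path <- file.path(tempdir(), "bundle.rds")
    saveRDS(list(scores = scores, settings = settings), file = bundle_path)

    # Remove the originals, then read the bundle back under a name you choose
    rm(scores, settings)
    restored <- readRDS(bundle_path)

    names(restored)
    restored$scores
    restored$settings$alpha
    ```

    Because everything comes back inside restored, nothing in your current environment is overwritten by name, which avoids the load() clobbering problem shown above.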

  • 2021-01-24 08:03

    If you read ?load, it says that the return value of load is:

    A character vector of the names of objects created, invisibly.

    This suggests (but admittedly does not state) that the true work of the load command is by side-effect, in that it inserts the objects into an environment (defaulting to the current environment, often but not always .GlobalEnv). You should immediately have access to them from where you called load(...).

    For instance, if I can guess at variables you might have in your rda file:

    x
    # Error: object 'x' not found
    
    # either one of these on windows, NOT BOTH
    dat = load("C:\\Users\\user\\AppData\\Local\\Temp\\1_29_923-Macdonell.RData")
    dat = load("C:/Users/user/AppData/Local/Temp/1_29_923-Macdonell.RData")
    
    dat
    # [1] "x" "y" "z"
    x
    # [1] 42
    

    If you want them to be not stored in the current environment, you can set up an environment to place them in. (I use parent=emptyenv(), but that's not strictly required. There are some minor ramifications to not including that option, none of them earth-shattering.)

    myenv <- new.env(parent = emptyenv())
    dat = load("C:/Users/user/AppData/Local/Temp/1_29_923-Macdonell.RData",
               envir = myenv)
    dat
    # [1] "x" "y" "z"
    x
    # Error: object 'x' not found
    ls(envir = myenv)
    # [1] "x" "y" "z"
    

    From here you can get at your data in any number of ways:

    ls.str(myenv) # similar in concept to str() but for environments
    # x :  num 42
    # y :  num 1
    # z :  num 2
    myenv$x
    # [1] 42
    get("x", envir = myenv)
    # [1] 42
    
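
    If you do this often, the pattern can be wrapped in a small helper. This is just a sketch under assumptions: load_as_list is a hypothetical function name (not part of base R), and the file is created in tempdir() so that the example is self-contained rather than using the path from your question.

    ```r
    # Hypothetical helper: load an .RData file into a private environment
    # and return its contents as a named list, leaving the caller's
    # environment untouched.
    load_as_list <- function(path) {
      env <- new.env()
      obj_names <- load(path, envir = env)
      mget(obj_names, envir = env)
    }

    # Build a small .RData file to demonstrate (values chosen for illustration)
    x <- 42; y <- 1; z <- 2
    rdata_path <- file.path(tempdir(), "example.RData")
    save(x, y, z, file = rdata_path)
    rm(x, y, z)

    contents <- load_as_list(rdata_path)
    names(contents)
    contents$x
    exists("x", inherits = FALSE)  # checks whether x was created in the current environment
    ```

    Returning a list gives you the assign-on-read behaviour of readRDS() even when the file was written with save().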

    Side note:

    You may have noticed that I used dat as my variable name instead of data. Though you are certainly allowed to use that, it can bite you if you use variable names that match existing variables or functions. For instance, all of your code will work just fine as long as you load your data. If, however, you run some of your code without pre-loading your objects into your data variable, you'll likely get an error such as:

    mean(data$x)
    # Error in data$x : object of type 'closure' is not subsettable
    

    That error message is not self-explanatory. The problem is that if data has not previously been defined as a variable (as it is in your question), then data here refers to the function data. In programming terms, a closure is a special type of function, so the error really should have said:

    # Error in data$x : object of type 'function' is not subsettable
    

    meaning that though dat can be subsetted and dat$x means something, you cannot use the $ subset method on a function itself. (You can't do mean$x when referring to the mean function, for example.) Regardless, even though this here-modified error message is less confusing, it is still not clearly telling you what/where the problem is located.
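
    You can confirm what a bare data refers to by inspecting the built-in function directly. The snippet below is a small sketch; the defensive check at the end is only one possible guard, and dat is simply the variable name used in this answer.

    ```r
    # utils::data is the built-in function that a bare `data` resolves to
    # when no variable of that name has been defined
    typeof(utils::data)       # "closure"
    class(utils::data)        # "function"
    is.function(utils::data)  # TRUE

    # A defensive check before using an object that should hold loaded data
    # (`dat` is the variable name used in this answer)
    if (!exists("dat") || is.function(dat)) {
      message("dat has not been loaded yet")
    }
    ```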

    Because of this, many seasoned programmers will suggest you use unique variable names (perhaps more than just x :-). If you use my suggestion and name it dat instead, then the mistake of not preloading your data will instead error with:

    mean(dat$x)
    # Error in mean(dat$x) : object 'dat' not found
    

    which is a lot more meaningful and easier to troubleshoot.
