Examining contents of .rdata file by attaching into a new environment - possible?

再見小時候 2021-01-31 22:48

I am interested in listing the objects in an RData file and loading only selected objects rather than the whole set (some may be large, or may already exist in the environment).

4 answers
  • 2021-01-31 23:00

    Since this question has just been referenced let's clarify two things:

    1. attach() on a file simply calls load() into a new environment on the search path, so there is really no point in using it instead of load()

    2. if you want selective access to prevent masking it's much easier to simply load the file into a new environment:

      e = local({load("foo.RData"); environment()})
      

      You can then use ls(e) and access contents like e$x. You can still use attach on the environment if you really want it on the search path.

    FWIW .RData files have no index (the objects are stored in one big pairlist), so you can't list the contained objects without loading. If you want convenient access, convert it to the lazy-load format instead which simply adds an index so each object can be loaded separately (see Get specific object from Rdata file)
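
    The load-into-an-environment idea above can be extended to copy out only the objects you want. The following is a minimal sketch, not a definitive implementation: `load_selected`, its arguments, and the temporary file are hypothetical names introduced for illustration. Note that the whole file is still read; only the copying is selective.

```r
# Hypothetical helper: load an .RData file into a scratch environment and
# copy only the requested objects into `target`.
load_selected <- function(file, wanted, target = globalenv(), overwrite = FALSE) {
  scratch <- new.env(parent = emptyenv())
  load(file, envir = scratch)  # the whole file is still read here
  found <- intersect(wanted, ls(scratch, all.names = TRUE))
  if (!overwrite) {
    # Skip names that already exist in the target environment itself.
    present <- vapply(found, exists, logical(1), envir = target, inherits = FALSE)
    found <- found[!present]
  }
  for (nm in found) assign(nm, get(nm, envir = scratch), envir = target)
  invisible(found)  # names that were actually copied
}

# Usage with a throwaway file (assumed example data).
f <- tempfile(fileext = ".RData")
x <- 1; y <- 2; z <- "foo"
save(x, y, z, file = f)
dest <- new.env()
copied <- load_selected(f, c("x", "z"), target = dest)
```

    With `overwrite = FALSE`, objects already present in the target are left alone, which covers the "may already exist in the environment" case from the question.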

  • 2021-01-31 23:11

    I just use the envir= argument to load():

    > x <- 1; y <- 2; z <- "foo"
    > save(x, y, z, file="/tmp/foo.RData")
    > ne <- new.env()
    > load(file="/tmp/foo.RData", envir=ne)
    > ls(envir=ne)
    [1] "x" "y" "z"
    > ne$z
    [1] "foo"
    > 
    

    The cost of this approach is that you do read the whole RData file, but that seems to be unavoidable anyway, as no other method seems to offer a list of the contents of such a file.
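
    Since the file has to be read anyway, the same step can also report how large each object is, which helps decide what is worth keeping. A rough sketch, assuming a hypothetical helper name `describe_rdata` and a temporary example file:

```r
# Hypothetical helper: load a file into a scratch environment and summarise
# each object's name, first class, and approximate size in bytes.
describe_rdata <- function(file) {
  e <- new.env(parent = emptyenv())
  nms <- load(file, envir = e)  # load() returns the object names invisibly
  data.frame(
    name  = nms,
    class = vapply(nms, function(n) class(get(n, envir = e))[1], character(1)),
    bytes = vapply(nms, function(n) as.numeric(utils::object.size(get(n, envir = e))),
                   numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage with a throwaway file (assumed example data).
f <- tempfile(fileext = ".RData")
x <- 1; y <- 2; z <- "foo"
save(x, y, z, file = f)
info <- describe_rdata(f)
```

    The scratch environment is discarded when the function returns, so nothing from the file ends up in the calling environment.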

  • 2021-01-31 23:24

    Thanks to @Dirk and @Joshua.

    I had an epiphany. The command/package foreach with SMP or MC seems to produce environments that only inherit, but do not seem to conflict with, the global environment.

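    library(foreach)
    # Register a parallel backend first (e.g. doMC::registerDoMC()); otherwise
    # %dopar% falls back to sequential execution in the current session.

    # List the objects stored in each file by attaching it inside a worker.
    lsfile <- function(list_files) {
      foreach(f = list_files) %dopar% {
        attach(f)
        ls(pos = 2)
      }
    }

    lsfile("f1.rdat")
    lsfile(dir(pattern = "rdat$"))  # pattern is a regular expression, not a glob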
    

    This is useful to me because I can now parallelize this. This is a bare-bones version, and I will modify it to give more detailed information, but so far it seems to be the only way to avoid conflicts, even without suppressing the warnings.

    So, question #1 can be resolved by either ignoring the warnings (as @Joshua suggested) or by using whatever magic foreach summons.

    For part 2, loading an object, I think @Joshua has the right idea - "get" will do.

    The foreach magic can also work, by using the .noexport option. However, this has risks: whatever isn't specifically excluded will be inherited/exported from the global environment (I could do ls(), but there's always the possibility of attached datasets). For safety, this means that get() must still be used to avoid the risk of a naming conflict. Loading into a subenvironment avoids the naming conflict, but doesn't avoid the loading of unnecessary objects.

    @Joshua's answer is far simpler than my foreach detour.
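
    For comparison, listing the contents of several files does not need the search path at all. The sketch below is an illustration rather than the approach used above: `ls_rdata_files` is a hypothetical name, and it runs sequentially with lapply(), which could be swapped for a parallel apply if needed.

```r
# Hypothetical helper: for each file, load into a private environment and
# return the sorted object names, keyed by file path.
ls_rdata_files <- function(files) {
  out <- lapply(files, function(f) {
    e <- new.env(parent = emptyenv())
    load(f, envir = e)
    ls(e)
  })
  names(out) <- files
  out
}

# Usage with two throwaway files (assumed example data).
f1 <- tempfile(fileext = ".rdat")
f2 <- tempfile(fileext = ".rdat")
a <- 1; b <- 2; d <- 3
save(a, b, file = f1)
save(d, file = f2)
res <- ls_rdata_files(c(f1, f2))
```

    Each file gets its own environment, so objects with the same name in different files cannot collide with each other or with the global environment.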

  • 2021-01-31 23:25

    You can suppress the warning by setting warn.conflicts=FALSE in the call to attach(). If an object is masked by one in the global environment, you can use get() to retrieve it from your attached data.

    x <- 1:10
    save(x, file="x.rData")
    #attach("x.rData", pos=2, warn.conflicts=FALSE)
    attach("x.rData", pos=2)
    (x <- 1)
    # [1] 1
    (x <- get("x", pos=2))
    # [1]  1  2  3  4  5  6  7  8  9 10
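
    A variation on the example above, as a sketch: attaching under an explicit name makes the lookup independent of the search-path position, and detach() cleans up afterwards. The entry name "saved_data" and the temporary file are assumptions made for illustration.

```r
# Save an object, then mask it with a different global value.
f <- tempfile(fileext = ".RData")
x <- 1:10
save(x, file = f)
x <- 1

# Attach the file under a chosen search-path name, without conflict warnings.
attach(f, name = "saved_data", warn.conflicts = FALSE)

# Look the object up in the attached entry by name rather than by position.
saved_x <- get("x", pos = "saved_data", inherits = FALSE)

# Remove the entry from the search path once done.
detach("saved_data", character.only = TRUE)
```

    After detach(), the global `x` is untouched and the saved copy lives on only in `saved_x`.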
    