Here's the situation. My R code is supposed to check whether the existing RData files in the application's cache are up to date. I do that by saving the files under names that consist of the base64-encoded names of specific data elements. However, the data for each of these elements is retrieved by submitting a separate SQL query per element, and all of those queries are specified in the data collection's configuration file. So if the data for an element has already been retrieved and I later change that element's SQL query, the data is not updated, because the cached file's name depends only on the element, not on its query.
To handle this situation, I decided to use R object attributes. I plan to save each data object's corresponding SQL query (request), base64-encoded, as the object's attribute:
```r
# save the request's base64-encoded SQL query as the data object's attribute,
# so that we can detect when the configuration contains a modified query
attr(data, "SQL") <- base64(request)
```
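A minimal sketch of the idea in isolation: attach the query text to a data frame and compare it with the current query later. It assumes the query is stored as plain text rather than base64-encoded (identical() compares strings directly, so the encoding is not needed for the comparison, and base64() is not part of base R). The data frame and queries are made-up placeholders.

```r
# Hypothetical stand-ins for srdaGetData() output and the configured query
request <- "SELECT id, name FROM projects WHERE active = 1"
data <- data.frame(id = 1:3, name = c("a", "b", "c"))

# Attach the query text (plain, not base64-encoded) to the data frame itself
attr(data, "SQL") <- request

# Later: the same query in the configuration means the data is current
upToDate <- identical(attr(data, "SQL"), request)

# A modified query in the configuration no longer matches
changedRequest <- "SELECT id, name, url FROM projects WHERE active = 1"
stale <- !identical(attr(data, "SQL"), changedRequest)

print(upToDate)  # TRUE
print(stale)     # TRUE
```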
Then, when I need to verify whether the SQL query has changed, I'd like to simply retrieve the object's corresponding attribute and compare it with the base64 encoding of the current SQL query. If they match, the query hasn't changed and I skip processing this data request; if they don't match, the query has changed and I go ahead with processing the request:
```r
# check if the archive file has already been processed
if (DEBUG) {message("Processing request \"", request, "\" ...")}
if (file.exists(rdataFile)) {
  # now check if request's SQL query hasn't been modified
  data <- load(rdataFile)
  if (identical(base64(request), attr(data, "SQL"))) {
    skipped <<- skipped + 1
    if (DEBUG) {message("Processing skipped: .Rdata file found.\n")}
    return (invisible())
  }
  rm(data)
}
```
My question is whether it's possible to read or access an object's attributes without fully loading the object from the file. In other words, can I avoid the load() and rm() calls in the code above?
Your advice is much appreciated!
UPDATE: An additional question: what's wrong with my code? It performs the processing even when it shouldn't, that is, when all the information is up to date (no changes in the cache or in the configuration file).
UPDATE 2 (additional code per @MrFlick's answer):
```r
# construct name from data source prefix and data ID (see config. file),
# so that corresponding data object (usually, data frame) will be saved
# later under that name via save()
dataName <- paste(dsPrefix, "data", indicator, sep = ".")
assign(dataName, srdaGetData())
data <- as.name(dataName)
# save the request's base64-encoded SQL query as the data object's attribute,
# so that we can detect when the configuration contains a modified query
attr(data, "SQL") <- base64(request)
# save current data frame to RData file
save(list = dataName, file = rdataFile)
# alternatively, use do.call() as in "getFLOSSmoleDataXML.R"
# clean up
rm(data)
```
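In this code, data <- as.name(dataName) makes data a symbol (an R name), not the data frame that dataName refers to, so attr(data, "SQL") <- ... never reaches the object that save() writes. A sketch of one way to attach the attribute to the object itself before saving; srdaGetData(), dsPrefix, indicator and request are replaced by hypothetical stand-ins, and the query is stored as plain text.

```r
# Hypothetical stand-ins for values taken from the configuration file
dsPrefix  <- "srda"
indicator <- "projects"
request   <- "SELECT id, name FROM projects"
rdataFile <- tempfile(fileext = ".RData")

# Hypothetical stand-in for srdaGetData()
srdaGetData <- function() data.frame(id = 1:3, name = c("a", "b", "c"))

dataName <- paste(dsPrefix, "data", indicator, sep = ".")

# Set the attribute on the data frame itself, then bind it to dataName
obj <- srdaGetData()
attr(obj, "SQL") <- request
assign(dataName, obj)

# save() writes the object named by dataName, attribute included
save(list = dataName, file = rdataFile)
rm(obj)
```

Setting the attribute before assign() keeps the symbol juggling out of the picture: the only name that matters is dataName, which save(list = ...) looks up in the current environment.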
You can't "really" do it, but you could modify the code of the lsdata function in my cgwtools package (cgwtools::lsdata):
```r
function (fnam = ".Rdata")
{
    x <- load(fnam, envir = environment())
    return(x)
}
```
This still loads the file, so it takes time and briefly uses memory, but the loaded objects live only in the function's local environment, which disappears when the function returns. So add an argument for the items whose attributes you want to check, and add a line inside the function that does something like y <- attributes(your_items), followed by return(list(x = x, y = y)).
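One way to flesh out that suggestion, as a sketch: lsdataAttrs() is a hypothetical name, not part of cgwtools. It loads into a fresh environment from new.env() rather than environment(), so that a loaded object cannot overwrite the function's own arguments. It still reads the whole file.

```r
# Hypothetical variant of cgwtools::lsdata() that also returns attributes.
# The whole file is still loaded; the objects are discarded on return.
lsdataAttrs <- function(fnam = ".Rdata", items = NULL) {
  env <- new.env()
  x <- load(fnam, envir = env)      # names of all restored objects
  if (is.null(items)) items <- x    # default: every object in the file
  y <- lapply(items, function(nm) attributes(get(nm, envir = env, inherits = FALSE)))
  names(y) <- items
  list(x = x, y = y)
}

# Usage with a throwaway file holding one data frame with an "SQL" attribute
f <- tempfile(fileext = ".RData")
d <- data.frame(v = 1:2)
attr(d, "SQL") <- "SELECT v FROM t"
save(d, file = f)

res <- lsdataAttrs(f, items = "d")
res$x          # "d"
res$y$d$SQL    # "SELECT v FROM t"
```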
There is also a problem with the way you are using load(). With save() and load() you can "freeze-dry" multiple objects into a single .RData file, and they "re-inflate" into the current environment (or into whichever environment you pass as envir). As a result, load() does not return the object(s); it invisibly returns a character vector with the names of all the objects it restored. Since you didn't supply your save() code, I'm not sure what's actually in your file, but if it was a variable called data, then just call
```r
load(rdataFile)
```
not
```r
data <- load(rdataFile)
```
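To see the effect, and one way to do the check correctly, here is a sketch that loads the file into a private environment and pulls the object out by name. isCacheCurrent() is a hypothetical helper, the file contents are made up, and it compares the plain query text; if the attribute was stored base64-encoded, as in the question, compare against base64(request) instead.

```r
# Hypothetical setup: an .RData file holding one data frame with an "SQL" attribute
rdataFile <- tempfile(fileext = ".RData")
cached <- data.frame(id = 1:3)
attr(cached, "SQL") <- "SELECT id FROM projects"
save(cached, file = rdataFile)
rm(cached)

# load() returns the names of the restored objects, not the objects themselves
loadedNames <- load(rdataFile, envir = new.env())
print(loadedNames)               # [1] "cached"
print(attr(loadedNames, "SQL"))  # NULL, so the identical() test in the question is never TRUE

# Sketch of the check: load into a private environment and fetch the object
isCacheCurrent <- function(rdataFile, request) {
  env <- new.env()
  objNames <- load(rdataFile, envir = env)
  # assumes the file holds exactly one data object
  obj <- get(objNames[1], envir = env, inherits = FALSE)
  identical(attr(obj, "SQL"), request)
}

isCacheCurrent(rdataFile, "SELECT id FROM projects")      # TRUE
isCacheCurrent(rdataFile, "SELECT id, url FROM projects")  # FALSE
```

Loading into new.env() also makes the rm() call unnecessary: the environment and everything in it become garbage once the function returns.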
Source: https://stackoverflow.com/questions/23701195/can-i-access-r-data-objects-attributes-without-fully-loading-objects-from-file