I have been using the RSiteCatalyst package for a while now. For those who do not know it, it is an R package for obtaining data from Adobe Analytics.
I use a couple of functions to generate the report IDs and to retrieve the reports as two independent steps. This way, it doesn't matter how long the reports take to be processed. I usually come back for them 12 hours after the report IDs were generated. I think they expire after 48 hours or so. These functions rely on RSiteCatalyst, of course. Here are the functions:
#' Generate report IDs to be retrieved later
#'
#' @description This function works in tandem with other functions to programmatically extract big datasets from Adobe Analytics.
#' @param suite Report suite ID.
#' @param dateBegin Start date in the following format: YYYY-MM-DD.
#' @param dateFinish End date in the following format: YYYY-MM-DD.
#' @param metrics Vector containing up to 30 required metrics IDs.
#' @param elements Vector containing element IDs.
#' @param classification Vector containing classification IDs.
#' @param valueStart Integer value pointing to the row to start the report with.
#' @return A data frame containing all the report IDs per day. They are required to obtain all trended reports during the specified time frame.
#' @examples
#' \dontrun{
#' ReportsIDs <- reportsGenerator(suite, dateBegin, dateFinish, metrics, elements, classification, valueStart)
#' }
#' @export
reportsGenerator <- function(suite,
                             dateBegin,
                             dateFinish,
                             metrics,
                             elements,
                             classification,
                             valueStart) {
  # Convert dates to Date format.
  # Deduct one from dateBegin to
  # neutralize the initial +1 in the loop.
  dateBegin <- as.Date(dateBegin, "%Y-%m-%d") - 1
  dateFinish <- as.Date(dateFinish, "%Y-%m-%d")
  # Number of days in the period, as a plain integer rather than a difftime.
  timeRange <- as.integer(dateFinish - dateBegin)
  # Create data frame to store dates and report IDs
  VisitorActivityReports <-
    data.frame(matrix(NA, nrow = timeRange, ncol = 2))
  names(VisitorActivityReports) <- c("Date", "ReportID")
  # Run a loop to queue one report per day and store its ReportID.
  for (i in seq_len(timeRange)) {
    dailyDate <- as.character(dateBegin + i)
    print(i) # Visibility to end user
    print(dailyDate) # Visibility to end user
    VisitorActivityReports[i, 1] <- dailyDate
    VisitorActivityReports[i, 2] <-
      RSiteCatalyst::QueueTrended(
        reportsuite.id = suite,
        date.from = dailyDate,
        date.to = dailyDate,
        metrics = metrics,
        elements = elements,
        classification = classification,
        top = 50000,
        max.attempts = 500,
        start = valueStart,
        enqueueOnly = TRUE
      )
  }
  return(VisitorActivityReports)
}
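Since the report IDs are collected hours later, possibly in a new R session, it can help to write the data frame to disk in the meantime. Here is a minimal sketch using base R's saveRDS and readRDS. The stand-in data frame, its report ID values and the file location are made up for illustration; in practice the data frame would be the output of reportsGenerator.

```r
# Stand-in for the output of reportsGenerator(); the IDs below are invented.
reportIds <- data.frame(
  Date = c("2016-01-01", "2016-01-02"),
  ReportID = c(1234567890, 1234567891),
  stringsAsFactors = FALSE
)

# Hypothetical location; any writable path works.
idsFile <- file.path(tempdir(), "reportIds.rds")

# Save right after the IDs are generated...
saveRDS(reportIds, idsFile)

# ...and load them back later, e.g. in a fresh session.
restoredIds <- readRDS(idsFile)
```

A temporary directory is used here only to keep the sketch self-contained; a permanent path is needed if the IDs have to survive a restart.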
Assign the output of reportsGenerator to a variable, then pass that variable to reportsRetriever, the function below, and assign its result to a variable as well. The output is a data frame. reportsRetriever will rbind all the reports together as long as they share the same structure, so don't try to concatenate reports with different structures.
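The same-structure requirement can be checked before binding. The snippet below is only a sketch of that idea in base R: the two small data frames and their column names are invented stand-ins for daily reports, not real Adobe Analytics output.

```r
# Made-up stand-ins for daily reports returned by the API.
reports <- list(
  data.frame(datetime = "2016-01-01", name = "a", visits = 10),
  data.frame(datetime = "2016-01-02", name = "b", visits = 7)
)

# TRUE only if every report has the same column names, in the same order.
referenceNames <- names(reports[[1]])
sameStructure <- all(vapply(
  reports,
  function(report) identical(names(report), referenceNames),
  logical(1)
))

# Bind only when the structures match.
if (sameStructure) {
  combined <- do.call(rbind, reports)
}
```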
#' Retrieve all reports stored as output of reportsGenerator function and consolidate them.
#'
#' @param dataFrameReports This is the output from the reportsGenerator function. It MUST contain columns titled: Date and ReportID
#' @details It is recommended to break the input data frame into chunks of 50 rows in order to prevent memory issues if the reports are too large. Otherwise the server or local computer might run out of memory.
#' @return A data frame containing all the consolidated reports defined by the reportsGenerator function.
#' @examples
#' \dontrun{
#' visitorActivity <- reportsRetriever(dataFrameReports)
#' }
#'
#' @export
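reportsRetriever <- function(dataFrameReports) {
  # Fetch each report; a failed request yields NULL instead of stopping the run.
  visitor.activity.list <- lapply(dataFrameReports$ReportID, function(reportId) {
    tryCatch(RSiteCatalyst::GetReport(reportId), error = function(e) NULL)
  })
  # Drop failed reports before binding.
  visitor.activity.list <- Filter(Negate(is.null), visitor.activity.list)
  visitor.activity.df <- as.data.frame(do.call(rbind, visitor.activity.list))
  # Validate report integrity
  retrievedDates <- as.character(unique(visitor.activity.df$datetime))
  if (identical(retrievedDates, dataFrameReports$Date)) {
    print("Ok. All reports available")
  } else {
    print("Some reports may have been missed.")
    # Requested dates that have no rows in the consolidated data
    missingDates <- setdiff(dataFrameReports$Date, retrievedDates)
    print(missingDates)
  }
  return(visitor.activity.df)
}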
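The recommendation to work in chunks of 50 rows could be followed along these lines. This is a sketch rather than part of the functions above: the report ID data frame is generated locally with invented values, and the retrieval call is left as a comment because it needs an authenticated Adobe Analytics session.

```r
# Invented stand-in for the output of reportsGenerator(): 120 daily report IDs.
reportIds <- data.frame(
  Date = as.character(as.Date("2016-01-01") + 0:119),
  ReportID = seq_len(120),
  stringsAsFactors = FALSE
)

# Assign each row to a chunk of at most 50 rows, then split on that index.
chunkSize <- 50
chunkIndex <- ceiling(seq_len(nrow(reportIds)) / chunkSize)
chunks <- split(reportIds, chunkIndex)

# Each chunk would then be retrieved on its own, for example:
# firstBatch <- reportsRetriever(chunks[[1]])
```

Each batch can be saved or processed before the next one is requested, so only one chunk's worth of reports has to sit in memory at a time.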