I want to produce a PDF that shows multiple graphs, one for each NetworkTrackingPixelId.
I have a data frame similar to this:
> head(data)
Net
Unless I'm missing something, generating plots by a subsetting variable is very simple. You can use split(...) to split the original data into a list of data frames by NetworkTrackingPixelId, and then pass those to ggplot using lapply(...). Most of the code below is just to create a sample dataset.
# create sample data
set.seed(1)
names <- c("Rubicon","Google","OpenX","AppNexus","Pubmatic")
dates <- as.Date("2014-02-16")+1:10
df <- data.frame(NetworkTrackingPixelId=rep(1:5,each=10),
                 Name=sample(names,50,replace=TRUE),
                 Date=dates,
                 Impressions=sample(1000:10000,50))
# end create sample data
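library(ggplot2)
pdf("plots.pdf")
# print() each plot explicitly so this also works inside source() or a function
invisible(lapply(split(df,df$NetworkTrackingPixelId),
  function(gg) print(ggplot(gg,aes(x = Date, y = Impressions)) +
    geom_point() + geom_line()+
    # unique() gives a single title string instead of one per row
    ggtitle(paste("NetworkTrackingPixelId:",unique(gg$NetworkTrackingPixelId))))))
dev.off()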
This generates a PDF containing 5 plots, one for each NetworkTrackingPixelId.
I recently had a project that required producing a lot of individual PNGs, one for each record. I got a huge speed-up from some pretty simple parallelization. I am not sure if this is more performant than the dplyr or data.table technique, but it may be worth trying:
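library(foreach)
library(doParallel)
workers <- makeCluster(4)
registerDoParallel(workers)
# the output directory must exist before the workers write to it
dir.create("images", showWarnings = FALSE)
foreach(i = seq_len(nrow(mtcars)), .packages = c('ggplot2')) %dopar% {
  j <- qplot(wt, mpg, data = mtcars[i, ])
  # include the row index in the name; gear alone repeats, so files would be overwritten
  png(file = file.path(getwd(), 'images', paste0(mtcars[i, 'gear'], '_', i, '.png')))
  print(j)
  dev.off()
}
# shut the workers down when finished
stopCluster(workers)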
I think you would be better off writing a function for plotting, then using lapply for every Network Tracking Pixel.
For example, your function might look like:
plot.function <- function(ntpid){
  sub = subset(dataset, networktrackingpixelid == ntpid)
  ggobj = ggplot(data=sub, aes(...)) + geom...
  # pass the plot explicitly; ggsave() otherwise saves the last plot displayed
  ggsave(filename=sprintf("%s.pdf", ntpid), plot=ggobj)
}
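To make that concrete, here is a minimal runnable sketch of the same idea. The sample data frame, the helper name plot_one_id, its out_dir argument, and the choice of geom_point() plus geom_line() are all my assumptions for illustration, not something taken from your data:

```r
library(ggplot2)

# Hypothetical sample data; only the column names follow the question
set.seed(1)
df <- data.frame(
  NetworkTrackingPixelId = rep(1:5, each = 10),
  Date = rep(as.Date("2014-02-16") + 1:10, times = 5),
  Impressions = sample(1000:10000, 50)
)

# Build and save one plot for a single id; returns the path it wrote to
plot_one_id <- function(ntpid, data, out_dir = tempdir()) {
  sub <- data[data$NetworkTrackingPixelId == ntpid, ]
  p <- ggplot(sub, aes(x = Date, y = Impressions)) +
    geom_point() +
    geom_line() +
    ggtitle(paste("NetworkTrackingPixelId:", ntpid))
  path <- file.path(out_dir, sprintf("%s.pdf", ntpid))
  ggsave(filename = path, plot = p, width = 7, height = 5)
  path
}

# One call per distinct id
ids <- unique(df$NetworkTrackingPixelId)
files <- unlist(lapply(ids, plot_one_id, data = df))
```

This writes one PDF per id into a temporary directory; point out_dir somewhere else if you want to keep the files.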
It would be helpful if you posted a reproducible example, but I hope this works! I'm not sure about the vector issue, though.
Cheers!
Since I don't have your dataset, I will use the mtcars dataset to illustrate how to do this using dplyr and data.table. Both packages are fine examples of the split-apply-combine paradigm in R. Let me explain:
Step 1: Split data by gear
dplyr uses the function group_by.
data.table uses the argument by.
Step 2: Apply a function
dplyr uses do, to which you can pass a function that uses the pieces x.
data.table interprets the variables passed to the function in the context of each piece.
Step 3: Combine
There is no combine step here, since we are saving the charts created to file.
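For comparison, here is a minimal base R sketch of the same paradigm with all three steps present. Computing the mean mpg per gear is just an illustrative task I picked, and the variable names are my own:

```r
# Step 1: split mtcars into a list of data frames, one per gear value
pieces <- split(mtcars, mtcars$gear)

# Step 2: apply a function to each piece
means <- lapply(pieces, function(x) mean(x$mpg))

# Step 3: combine the per-piece results into one data frame
result <- data.frame(gear = names(means), mean_mpg = unlist(means))
```

When the applied function only writes a file, as with the plots here, Step 3 simply drops out.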
library(dplyr)
library(ggplot2)
# %.% was removed from dplyr; use %>% and refer to each group as `.` inside do()
mtcars %>%
  group_by(gear) %>%
  do(file = ggsave(
    filename = sprintf("gear_%s.pdf", unique(.$gear)), plot = qplot(wt, mpg, data = .)
  ))
library(data.table)
library(ggplot2)
mtcars_dt = data.table(mtcars)
mtcars_dt[, ggsave(
    filename = sprintf("gear_%s.pdf", unique(gear)), plot = qplot(wt, mpg)),
  by = gear
]
UPDATE: To save all files into one pdf, here is a quick solution.
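# with library(dplyr) and library(ggplot2) loaded
plots = mtcars %>%
  group_by(gear) %>%
  # a named argument to do() stores each result in a list column
  do(plot = qplot(wt, mpg, data = .))
pdf('all.pdf')
invisible(lapply(plots$plot, print))
dev.off()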