I have a data set with x, y, and z (resp) values along with two columns for faceting in order to create a grid of tile plots.
The output of dput()
I ran into two issues:
1) My true data set is generated via fit <- lm(x ~ y), followed by creating a new input data set via expand.grid, and then creating my plot data with predict(fit, newdata). Oddly, if I don't round the x and y points, I can't generate a plot (full code for creating my plot data is below).
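For readers who want the shape of that workflow before the full code, here is a minimal sketch on toy data. The names toy, toy_fit, and grid are made up for illustration and are not part of my real data set.

```r
# Toy data; toy, toy_fit and grid are illustrative names only
set.seed(1)
toy <- data.frame(a = rnorm(30), b = runif(30))
toy$y <- 1 + 2 * toy$a - 3 * toy$b + rnorm(30, sd = 0.1)

# Fit a linear model on the observed data
toy_fit <- lm(y ~ a + b, data = toy)

# Regular grid of predictor values covering the region to plot
grid <- expand.grid(a = seq(-2, 2, length.out = 25),
                    b = seq(0, 1, length.out = 25))

# predict() matches newdata columns to the model terms by name
grid$resp <- predict(toy_fit, newdata = grid)
```

The grid columns have to carry the same names as the predictors in the model formula, which is why the full code below renames the sim_data columns before calling predict().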
2) When plotting a single geom_tile plot, the height and width of the tiles are set to fill the space. When creating a faceted set of tile plots, the height and width arguments for geom_tile() seem to default to something else. I'd guess they default to the smallest range, but no plot actually fills the full facet window width or height, so it's not that.
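One way to probe this guess, as a rough sketch: compare the spacing between distinct x values within each facet against the spacing over all facets pooled together. That the default tile size is derived from a smallest gap like this is an assumption on my part, and the facets data frame and min_gap helper below are hypothetical.

```r
# Two hypothetical facets whose x ranges differ, 5 points each
facets <- data.frame(
  panel = rep(c("narrow", "wide"), each = 5),
  x     = c(seq(0, 1, length.out = 5), seq(0, 8, length.out = 5))
)

# Smallest gap between distinct sorted values (hypothetical helper)
min_gap <- function(v) min(diff(sort(unique(v))))

per_panel <- tapply(facets$x, facets$panel, min_gap)  # one step per facet
pooled    <- min_gap(facets$x)                        # one step for all data
```

If the tile size came from the pooled gap, tiles in the wide panel would be far narrower than that panel's own spacing, and the panel would look mostly empty. That would fit the symptom, but I have not confirmed it.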
Apologies for the example above not being reproducible. I didn't expect that dput() wouldn't create an identical data set, and it was much simpler to just paste the output of that than have someone run all the code below.
Full code for generating the data set:
set.seed(42)
x1 <- rnorm(20)
x2 <- runif(20)
x3 <- rpois(20,10)
x4 <- rexp(20)
y <- 10 + 2*x1 + 3*x2^2 + 4*x3 +5*x4 + rnorm(20, sd=0.1)
dat <- data.frame(x1, x2, x3, x4, y)
# vector of variable names for easy data.frame column naming later
var_names <- c("x1", "x2", "x3", "x4")
# possible combinations of four variables
combis <- combn(1:4, 2)
combis <- rbind(combis, combis[, 6:1])
# steps to generate for the contour using `seq(min, max, length.out = n)`
n <- 100
#fit the model
fit <- lm(y~x1+I(x2^2)+x3+x4, data=dat)
# range values for when variable is featured on an axis
mins_maxs <- data.frame(x1 = c(-3, 3),
x2 = c(0, 1),
x3 = c(5, 20),
x4 = c(0, 7))
# values to hold for plots in which variables are not on an axis
holds <- c(0, 0.5, 10, 5)
# one grid per column of combis: two axis variables vary, two are held constant
sim_data <- lapply(seq_len(ncol(combis)), function(i) {
  expand.grid(seq(mins_maxs[1, combis[1, i]],
                  mins_maxs[2, combis[1, i]], length.out = n),
              seq(mins_maxs[1, combis[2, i]],
                  mins_maxs[2, combis[2, i]], length.out = n),
              holds[combis[3, i]],
              holds[combis[4, i]])
})
# we create a new data frame for the data set we want to plot
# for each contour of interest, we want the first two columns, as the second two
# were only created for the sake of feeding into our fit lm
# we'll also create a vector for facetting using paste()
plot_data <- lapply(seq_len(ncol(combis)), function(i) {
  d <- sim_data[[i]][, 1:2]
  # a length-one value is recycled to every row, so rep() is not needed
  d$var1 <- var_names[combis[1, i]]
  d$var2 <- var_names[combis[2, i]]
  d
})
# now we rename the columns of plot_data
plot_data <- lapply(1:length(plot_data), function(i) {
names(plot_data[[i]]) <- c("x", "y", "var1", "var2")
return(plot_data[[i]])
} )
# similarly, we need to re-name the sim_data columns so we can rbind them
# and predict new values using our fit lm
sim_data <- lapply(1:length(sim_data), function(i) {
names(sim_data[[i]]) <- var_names[combis[, i]]
return(sim_data[[i]])
} )
# collapse the separate lists into one data.frame
plot_data <- do.call(rbind, plot_data)
sim_data <- do.call(rbind, sim_data)
# create a vector of predictions using sim_data
plot_data$resp <- predict(fit, sim_data)
At this point, if one attempts to plot using plot_data, we get this:
library(ggplot2)
p <- ggplot(plot_data, aes(x = x, y = y, z = resp))
p <- p + geom_tile(aes(fill = resp))
p <- p + facet_wrap(var2 ~ var1, scales = "free", ncol = 3) + theme_bw()
If we round the data, we get something else:
plot_data_round <- plot_data
plot_data_round[, c("x", "y")] <- round(plot_data_round[, c("x", "y")], 5)
# same plot call as above
p <- ggplot(plot_data_round, aes(x = x, y = y, z = resp))
p <- p + geom_tile(aes(fill = resp))
p <- p + facet_wrap(var2 ~ var1, scales = "free", ncol = 3) + theme_bw()
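To check whether floating point noise could be behind the difference, here is a small sketch that pools two of the axis grids from above and measures the smallest gap between distinct values before and after rounding. The min_gap helper is hypothetical, and whether ggplot2 actually uses such a gap to size tiles is an assumption.

```r
a <- seq(0, 1, length.out = 100)  # same range as x2 in mins_maxs
b <- seq(0, 7, length.out = 100)  # same range as x4 in mins_maxs

# Smallest gap between distinct sorted values (hypothetical helper)
min_gap <- function(v) min(diff(sort(unique(v))))

# Values such as 7 * (1/99) and 1 * (7/99) are equal on paper but may
# differ in their last bits, so unique() can keep both copies
raw_gap     <- min_gap(c(a, b))
rounded_gap <- min_gap(round(c(a, b), 5))

print(c(raw = raw_gap, rounded = rounded_gap))
```

If raw_gap comes out many orders of magnitude smaller than rounded_gap, near-duplicate values are present in the pooled x column, which would explain why rounding changes the plot.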
If we take the un-rounded data and add columns to store height and width, we're back to a blank faceted plot, just like with the un-rounded data alone.
library(plyr)  # provides ddply(), .() and summarize()
plot_data_hw <- plot_data
hw <- ddply(plot_data, .(var1, var2), summarize,
            height = (max(y) - min(y))/n, width = (max(x) - min(x))/n)
plot_data_hw <- merge(plot_data_hw, hw, by = c("var1", "var2"), all.x = TRUE)
# verify using same plot call and plot_data_hw
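As a side note on the step size, here is a base R sketch of the same per-facet calculation on a cut-down example. The demo and step objects are hypothetical; the point is only that n evenly spaced points span n - 1 intervals.

```r
n_pts <- 100  # same grid size as n above

# Two hypothetical facets using the x1 and x2 axis ranges
demo <- data.frame(
  var1 = rep(c("x1", "x2"), each = n_pts),
  x    = c(seq(-3, 3, length.out = n_pts), seq(0, 1, length.out = n_pts))
)

# Spacing between neighbouring grid points within each facet
step <- aggregate(x ~ var1, data = demo,
                  FUN = function(v) (max(v) - min(v)) / (length(unique(v)) - 1))
names(step)[2] <- "width"
```

Dividing by n, as in my code above, gives tiles about 1% narrower than the actual grid spacing, which may show up as hairline gaps between tiles.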
If we combine rounding with height and width values per facet interaction, we get the desired result (adding height/width arguments to geom_tile()):
plot_data_hw_round <- merge(plot_data_round, hw, by = c("var1", "var2"), all.x = T)
p <- ggplot(plot_data_hw_round, aes(x = x, y = y, z = resp))
p <- p + geom_tile(aes(fill = resp, height = height, width = width))
p <- p + facet_wrap(var2 ~ var1, scales = "free", ncol = 3) + theme_bw()
So, there it is. I'm going to file a bug report about the rounding issue, as that seems quite peculiar to me. Perhaps there's still something else going on, but if it is a bug due to floating point precision or whatnot, Hadley may still be interested.