Combine geom_tile() and facet_grid/facet_wrap and remove space between tiles (ggplot2)

余生分开走 2021-01-22 14:16

I have a data set with x, y, and z (resp) values along with two columns for facetting in order to create a grid of tile plots.

The output of dput()

1 answer
  • 2021-01-22 14:56

    I ran into two issues:

    1) My true data set is generated via fit <- lm(x ~ y), followed by creating a new input data set via expand.grid, and then creating my plot data with predict(fit, newdata). Oddly, if I don't round the x and y points, I can't generate a plot (full code for creating my plot data is below).

    2) When plotting one geom_tile plot, the height/width of the tiles are set to fill the space. When creating a facetted set of tile plots, it seems that the height and width arguments for geom_tile() default to something else. I'd say they default to the smallest range, but no plot actually fills the full facet window width or height... so it's not that.

    Apologies for the example above not being reproducible. I didn't expect that dput() wouldn't create an identical data set and it was much simpler to just paste the output of that than have someone run all the code below.


    Full code for generating the data set:

    set.seed(42)
    x1 <- rnorm(20)
    x2 <- runif(20)
    x3 <- rpois(20,10)
    x4 <- rexp(20)
    y <- 10 + 2*x1 + 3*x2^2 + 4*x3 +5*x4 + rnorm(20, sd=0.1)
    
    dat <- data.frame(x1, x2, x3, x4, y)
    
    # vector of variable names for easy data.frame column naming later
    var_names <- c("x1", "x2", "x3", "x4")
    
    # possible combinations of four variables
    combis <- combn(1:4, 2)
    combis <- rbind(combis, combis[, 6:1])
    
    # steps to generate for the contour using `seq(min, max, length.out = n)`
    n <- 100
    
    #fit the model
    fit <- lm(y~x1+I(x2^2)+x3+x4, data=dat)
    
    # range values for when variable is featured on an axis
    mins_maxs <- data.frame(x1 = c(-3, 3),
                            x2 = c(0, 1),
                            x3 = c(5, 20),
                            x4 = c(0, 7))
    
    # values to hold for plots in which variables are not on an axis
    holds <- c(0, 0.5, 10, 5)
    
    sim_data <- list()
    sim_data <- lapply(seq_len(ncol(combis)), function (i) {
      sim_data[[i]] <- expand.grid(seq(mins_maxs[1, combis[1, i]], 
                                       mins_maxs[2, combis[1, i]], length.out = n),
                                   seq(mins_maxs[1, combis[2, i]], 
                                       mins_maxs[2, combis[2, i]], length.out = n),
                                   holds[combis[3, i]],
                                   holds[combis[4, i]])
    } )
    
    # we create a new data frame for the data set we want to plot
    # for each contour of interest, we want the first two columns, as the second two
    # were only created for the sake of feeding into our fit lm 
    # we'll also create a vector for facetting using paste()
    plot_data <- list()
    plot_data <- lapply(1:ncol(combis), function(i) {
      plot_data[[i]] <- sim_data[[i]][, 1:2]
      plot_data[[i]]$var1 <- rep(var_names[combis[1, i]],
                                 nrow(plot_data[[i]]))
      plot_data[[i]]$var2 <- rep(var_names[combis[2, i]],
                                       nrow(plot_data[[i]]))
      return(plot_data[[i]])
    } )
    
    # now we rename the columns of plot_data
    plot_data <- lapply(1:length(plot_data), function(i) {
      names(plot_data[[i]]) <- c("x", "y", "var1", "var2")
      return(plot_data[[i]])
    } )
    
    # similarly, we need to re-name the sim_data columns so we can rbind them
    # and predict new values using our fit lm
    sim_data <- lapply(1:length(sim_data), function(i) {
      names(sim_data[[i]]) <- var_names[combis[, i]]
      return(sim_data[[i]])
    } )
    
    
    # collapse the separate lists into one data.frame
    plot_data <- do.call(rbind, plot_data)
    sim_data <- do.call(rbind, sim_data)
    
    # create a vector of predictions using sim_data
    plot_data$resp <- predict(fit, sim_data)
    

    At this point, if one attempts to plot using plot_data, we get this:

    library(ggplot2)
    p <- ggplot(plot_data, aes(x = x, y = y, z = resp))
    p <- p + geom_tile(aes(fill = resp))
    p <- p + facet_wrap(var2 ~ var1, scales = "free", ncol = 3) + theme_bw()
    

    data as-is

    If we round the data, we get something else:

    plot_data_round <- plot_data
    plot_data_round[, c("x", "y")] <- round(plot_data_round[, c("x", "y")], 5)
    
    # same plot call as above
    p <- ggplot(plot_data_round, aes(x = x, y = y, z = resp))
    p <- p + geom_tile(aes(fill = resp))
    p <- p + facet_wrap(var2 ~ var1, scales = "free", ncol = 3) + theme_bw()
    

    data with rounding
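One plausible reason rounding changes the picture (an assumption, not something confirmed in this post): when no width or height is supplied, geom_tile() derives the tile size from the smallest gap between distinct positions, and the x values here are pooled from several variables whose grids overlap. Two grid points that should be identical can differ by floating-point noise, which would make the smallest gap, and hence the tiles, vanishingly small. The base-R sketch below only measures that gap for three of the axis grids used above; `smallest_gap` is a hypothetical helper, not a ggplot2 function.

```r
n <- 100

# the same axis grids that end up pooled in plot_data$x
x1_axis <- seq(-3, 3, length.out = n)
x2_axis <- seq(0, 1, length.out = n)
x3_axis <- seq(5, 20, length.out = n)
pooled <- c(x1_axis, x2_axis, x3_axis)

# hypothetical helper: smallest distance between two distinct values
smallest_gap <- function(v) min(diff(sort(unique(v))))

gap_raw <- smallest_gap(pooled)                # may be dominated by floating-point noise
gap_rounded <- smallest_gap(round(pooled, 5))  # near-duplicates collapse to one value

print(c(raw = gap_raw, rounded = gap_rounded))
```

If the raw gap prints as something many orders of magnitude smaller than the rounded one, that would be consistent with the blank plot for the un-rounded data.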

    If we take the un-rounded data and add vectors to store height and width, we're back to a blank facetted plot like the un-rounded data alone.

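    library(plyr)  # provides ddply(), .() and summarize
    plot_data_hw <- plot_data
    # one height/width per facet (var1, var2 combination)
    hw <- ddply(plot_data, .(var1, var2), summarize,
                height = (max(y) - min(y))/n, width = (max(x) - min(x))/n)
    plot_data_hw <- merge(plot_data_hw, hw, by = c("var1", "var2"), all.x = TRUE)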
    # verify using same plot call and plot_data_hw
    

    If we combine rounding and height and width values per facet interaction, we get the desired result (adding height/width arguments to geom_tile()):

    plot_data_hw_round <- merge(plot_data_round, hw, by = c("var1", "var2"), all.x = T)
    p <- ggplot(plot_data_hw_round, aes(x = x, y = y, z = resp))
    p <- p + geom_tile(aes(fill = resp, height = height, width = width))
    p <- p + facet_wrap(var2 ~ var1, scales = "free", ncol = 3) + theme_bw()
    

    facetted tile map as desired
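As a variation on the ddply() step, the per-facet tile size can also be computed with base R only. This is a minimal sketch on a hypothetical miniature data set (`toy`, `grid_a`, `grid_b` and `grid_step` are made-up names). It uses the exact grid step, range / (points - 1), whereas the code above divides by n, so the numbers differ slightly from the original approach.

```r
# hypothetical miniature of plot_data: two facets with different axis ranges
n <- 5
grid_a <- expand.grid(x = seq(-3, 3, length.out = n), y = seq(0, 1, length.out = n))
grid_b <- expand.grid(x = seq(5, 20, length.out = n), y = seq(0, 7, length.out = n))
grid_a$var1 <- "x1"; grid_a$var2 <- "x2"
grid_b$var1 <- "x3"; grid_b$var2 <- "x4"
toy <- rbind(grid_a, grid_b)

# one group per facet
facet_id <- interaction(toy$var1, toy$var2, drop = TRUE)

# spacing of a regular grid: range divided by (number of distinct points - 1)
grid_step <- function(v) diff(range(v)) / (length(unique(v)) - 1)

# ave() applies grid_step within each facet and recycles the result to every row
toy$width  <- ave(toy$x, facet_id, FUN = grid_step)
toy$height <- ave(toy$y, facet_id, FUN = grid_step)

print(unique(toy[, c("var1", "var2", "width", "height")]))
```

The resulting `width` and `height` columns could then be mapped in geom_tile(aes(fill = resp, height = height, width = width)) in the same way as in the plot call above.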

    So, there it is. I'm going to file a bug report about the rounding issue, as that seems quite peculiar to me. Perhaps there's still something else going on, but if it is a bug due to floating-point precision or whatnot, Hadley may still be interested.
