How to subset a data frame by a factor and repeat a plot for each subset?

天涯浪人 2020-11-27 04:53

I am new to R. Forgive me if this question has an obvious answer, but I've not been able to find a solution. I have experience with SAS and may just be thinking of t…

3 Answers
  • 2020-11-27 05:42

    Because you want to split up the dataset and make a plot for each level of a factor, I would approach this with one of the split-apply-combine tools from the plyr package.

    Here is a toy example using the mtcars dataset. I first create the plot and name it p, then use dlply to split the dataset by a factor and return a plot for each level. I'm taking advantage of %+% from ggplot2 to replace the data.frame in a plot.

    library(ggplot2)
    library(plyr)
    
    p = ggplot(data = mtcars, aes(x = wt, y = mpg)) + 
        geom_line()
    
    # One plot per level of cyl; %+% swaps in each subset as the plot's data
    dlply(mtcars, .(cyl), function(x) p %+% x)
    

    This returns all the plots, one after another. If you assign the resulting list to an object, you can also display one plot at a time.

    plots = dlply(mtcars, .(cyl), function(x) p %+% x)
    plots[[1]]   # [[ ]] extracts the plot itself; [1] would return a one-element list
    

    Edit

    I started thinking that labeling each plot with its factor level would be useful. Adding facet_wrap() on the splitting variable gives each plot a strip title:

    dlply(mtcars, .(cyl), function(x) p %+% x + facet_wrap(~cyl))
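    If you would rather have a plain title than a facet strip, here is a minimal sketch with ggtitle(). It uses base R's split() and Map() instead of dlply(), and the "cyl = 4" title format is just an illustrative choice.

```r
library(ggplot2)

# Template plot whose data will be swapped out for each subset
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_line()

# split() returns a list of data frames named after the levels of cyl
subsets <- split(mtcars, mtcars$cyl)
titles <- paste("cyl =", names(subsets))

# Map() walks the subsets and their titles in parallel; names come from subsets
plots <- Map(function(d, title) p %+% d + ggtitle(title), subsets, titles)

plots[["8"]]   # the 8-cylinder plot, titled "cyl = 8"
```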
    

    Edit 2

    Here is one way to save the list of plots named plots to a single document, one plot per page. I didn't change any of the defaults in pdf() (so the file is Rplots.pdf in the working directory), but you can certainly adjust the file name, width and height.

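    pdf()   # default file: Rplots.pdf in the working directory
    invisible(lapply(plots, print))   # print each plot explicitly so this also works in scripts and functions
    dev.off()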
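    If you would rather have one file per plot, pdf() can do that too: with onefile = FALSE and a %03d pattern in the file name, each page goes to its own numbered file. A sketch, assuming output to tempdir() and the hypothetical file name cyl_%03d.pdf:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_line()
plots <- lapply(split(mtcars, mtcars$cyl), function(d) p %+% d)

# With onefile = FALSE, %03d is replaced by the page number, so every plot
# (page) is written to its own file: cyl_001.pdf, cyl_002.pdf, ...
out_dir <- tempdir()   # assumption: a temporary directory stands in for yours
pdf(file.path(out_dir, "cyl_%03d.pdf"), onefile = FALSE, width = 7, height = 5)
invisible(lapply(plots, print))
dev.off()

list.files(out_dir, pattern = "^cyl_[0-9]{3}\\.pdf$")
```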
    

    Update: here is the same thing with dplyr instead of plyr, using do(). The output has a list column, named plots, that holds all the plots.

    library(dplyr)
    plots = mtcars %>%
        group_by(cyl) %>%
        do(plots = p %+% . + facet_wrap(~cyl))
    
    
    Source: local data frame [3 x 2]
    Groups: <by row>
    
      cyl           plots
    1   4 <S3:gg, ggplot>
    2   6 <S3:gg, ggplot>
    3   8 <S3:gg, ggplot>
    

    To see the plots in R, just ask for the column that contains the plots.

    plots$plots
    

    And to save them as a PDF:

    pdf()
    invisible(lapply(plots$plots, print))
    dev.off()
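    In dplyr 1.0 and later, do() is marked as superseded. Here is a sketch of the same result with group_split() and group_keys(), assuming a recent dplyr; naming the list by level lets you pull out individual plots with [[ ]].

```r
library(dplyr)
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_line()

grouped <- mtcars %>% group_by(cyl)
keys    <- group_keys(grouped)    # one row per cyl level: 4, 6, 8
subsets <- group_split(grouped)   # list of tibbles, in the same order as keys

plots <- lapply(subsets, function(d) p %+% d + facet_wrap(~cyl))
names(plots) <- keys$cyl          # names become "4", "6", "8"

plots[["4"]]
```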
    
  • 2020-11-27 05:42

    A few years ago, I wanted to do something similar: plot individual trajectories for ~2500 participants with 1-7 measurements each. I did it like this, using plyr and ggplot2:

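    library(plyr)
    library(ggplot2)
    
    # ggsave() does not create missing folders by default, so make sure "plots" exists
    dir.create("plots", showWarnings = FALSE)
    
    d_ply(dat, .variables = "participant_id", .fun = function(x) {
    
        # Generate the desired plot
        p <- ggplot(x, aes(x = phase, y = result)) +
            geom_point() +
            geom_line()
    
        # Save it to a file named after the participant (every row in x has the same ID)
        # Putting it in a subdirectory is prudent
        ggsave(file.path("plots", paste0(x$participant_id[1], ".png")), plot = p)
    
    })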
    

    A little slow, but it worked. If you want a sense of all participants' trajectories in one plot (like your second example, a.k.a. the spaghetti plot), you can make the lines semi-transparent (with that many participants, coloring them individually won't help, though):

    ggplot(data = dat, aes(x = phase, y = result, group = participant_id)) + 
        geom_line(alpha = 0.3)
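    Since plyr is retired, the per-participant loop above can also be written with base R's split() and a for loop. This sketch uses mtcars and cyl in place of your dat and participant_id, writes to a temporary directory, and saves PDFs; ggsave() picks the device from the file extension, so .png works the same way.

```r
library(ggplot2)

# Assumption: a temporary "plots" directory stands in for your own output folder
out_dir <- file.path(tempdir(), "plots")
dir.create(out_dir, showWarnings = FALSE)

# One pass per level of cyl; split() is base R, so plyr is not needed
for (d in split(mtcars, mtcars$cyl)) {
  g <- ggplot(d, aes(x = wt, y = mpg)) +
    geom_point() +
    geom_line()
  # Pass the plot explicitly instead of relying on last_plot()
  ggsave(file.path(out_dir, paste0("cyl_", d$cyl[1], ".pdf")),
         plot = g, width = 5, height = 4)
}

list.files(out_dir)
```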
    
  • 2020-11-27 05:52
    lapply(temp, function(X) ggplot(X, ...))
    

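    where temp is a list of your subsetted data frames (for example, the result of split()), X is each subset in turn, and ... stands for the rest of your ggplot() call.

    Keep in mind you may have to print the ggplot object explicitly (print(ggplot(X, ...))), for example inside a for loop or a function, where ggplot objects are not printed automatically.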
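    To make the lapply() idea runnable, here is a sketch that wraps it in a function. plot_by_factor is a hypothetical name, and geom_point() stands in for whatever plot you need. Because nothing is auto-printed inside a function, each plot is printed explicitly.

```r
library(ggplot2)

# Hypothetical helper: one scatter plot per level of the column named `by`
plot_by_factor <- function(df, by, mapping) {
  subsets <- split(df, df[[by]])   # named list of data frames, one per level
  plots <- lapply(subsets, function(X) ggplot(X, mapping) + geom_point())
  # Nothing is auto-printed inside a function, so draw each plot explicitly
  for (g in plots) print(g)
  invisible(plots)                 # return the plots without printing them again
}

plots <- plot_by_factor(iris, "Species", aes(x = Sepal.Length, y = Sepal.Width))
names(plots)   # "setosa" "versicolor" "virginica"
```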
