Pivot Table-like Output in R?

后端 未结 4 402
慢半拍i
慢半拍i 2021-01-31 22:16

I am writing a report that requires the generation of a number of pivot tables in Excel. I would like to think there is a way to do this in R so that I can avoid Excel. I would

相关标签:
4条回答
  • 2021-01-31 22:38

    Using JD Long's simulated data, and adding the sd and counts:

       library(reshape)  # not reshape2
       cast(myDf.melt, school + teacher ~ ., margins=TRUE , c(mean, sd, length))
       school teacher     mean       sd length
    1    BSA1    Dick 4.663140 3.718773     14
    2    BSA1   Harry 4.310802 1.430594      9
    3    BSA1     Tom 5.505247 4.045846      4
    4    BSA1   (all) 4.670451 3.095980     27
    5    BSA2    Dick 6.110988 2.304104     15
    6    BSA2   Harry 5.007221 2.908146      9
    7    BSA2     Tom 4.337063 2.789244     14
    8    BSA2   (all) 5.196018 2.682924     38
    9    HSA1    Dick 4.508610 2.946961     11
    10   HSA1   Harry 4.890741 2.977305     13
    11   HSA1     Tom 4.721124 3.193576     11
    12   HSA1   (all) 4.717335 2.950959     35
    13  (all)   (all) 4.886576 2.873637    100
    
    0 讨论(0)
  • 2021-01-31 22:43

    Below are several different ways of generating this using the relatively new pivottabler package.

    Disclosure: I'm the package author.

    For more information see the package page on CRAN and the various package vignettes available on that page.

    Sample Data (same as above)

    set.seed(1)
    school  <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
    teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
    growth <- rnorm(100, 5, 3)
    myDf <- data.frame(school, teacher, growth)
    

    Quick pivot table output to console as plain text

    library(pivottabler)
    # arguments:  qhpvt(dataFrame, rows, columns, calculations, ...)
    qpvt(myDf, c("school", "teacher"), NULL,  
         c("Average Growth"="mean(growth)", "Std Dev"="sd(growth)",
           "# of Scholars"="n()"),
         formats=list("%.1f", "%.1f", "%.0f"))
    

    Console Output:

                  Average Growth  Std Dev  # of Scholars  
    BSA1   Dick              4.7      3.7             14  
           Harry             4.3      1.4              9  
           Tom               5.5      4.0              4  
           Total             4.7      3.1             27  
    BSA2   Dick              6.1      2.3             15  
           Harry             5.0      2.9              9  
           Tom               4.3      2.8             14  
           Total             5.2      2.7             38  
    HSA1   Dick              4.5      2.9             11  
           Harry             4.9      3.0             13  
           Tom               4.7      3.2             11  
           Total             4.7      3.0             35  
    Total                    4.9      2.9            100  
    

    Quick pivot table output as a html widget

    library(pivottabler)
    qhpvt(myDf, c("school", "teacher"), NULL,  
         c("Average Growth"="mean(growth)", "Std Dev"="sd(growth)",
           "# of Scholars"="n()"),
         formats=list("%.1f", "%.1f", "%.0f"))
    

    HTML Widget Output:

    Generating pivot table using more verbose syntax

    This has more options, e.g. renaming totals.

    library(pivottabler)
    pt <- PivotTable$new()
    pt$addData(myDf)
    pt$addRowDataGroups("school", totalCaption="(all)")
    pt$addRowDataGroups("teacher", totalCaption="(all)")
    pt$defineCalculation(calculationName="c1", caption="Average Growth", 
       summariseExpression="mean(growth)", format="%.1f")
    pt$defineCalculation(calculationName="c2", caption="Std Dev", 
       summariseExpression="sd(growth)", format="%.1f")
    pt$defineCalculation(calculationName="c3", caption="# of Scholars", 
       summariseExpression="n()", format="%.0f")
    pt  # to output to console as plain text
    pt$renderPivot() # to output as a html widget
    

    HTML Widget Output:

    0 讨论(0)
  • 2021-01-31 22:48

    Here's a swag at the calculation bits:

    set.seed(1)
    school  <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
    teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
    growth <- rnorm(100, 5, 3)
    
    myDf <- data.frame(school, teacher, growth)
    
    require(reshape2)
    
    aggregate(growth ~ school + teacher, data =myDf, FUN=mean)
    
    myDf.melt <- melt(myDf, measured="growth")
    dcast(myDf.melt, school + teacher ~ ., fun.aggregate=mean, margins=c("school", "teacher"))
    

    I've not addressed output formatting, only calculation. The resulting data frame should look like this:

       school teacher       NA
    1    BSA1    Dick 4.663140
    2    BSA1   Harry 4.310802
    3    BSA1     Tom 5.505247
    4    BSA1   (all) 4.670451
    5    BSA2    Dick 6.110988
    6    BSA2   Harry 5.007221
    7    BSA2     Tom 4.337063
    8    BSA2   (all) 5.196018
    9    HSA1    Dick 4.508610
    10   HSA1   Harry 4.890741
    11   HSA1     Tom 4.721124
    12   HSA1   (all) 4.717335
    13  (all)   (all) 4.886576
    

    That example uses the reshape2 package to handle the subtotals.

    I think R is the right tool for the job here. I can totally understand not being sure how to get started on this analysis. I came to R from Excel a few years ago and it can be tough to grok at first. Let me point out four pro tips to help you get better answers in Stack Overflow:

    1) provide data, even if simulated: you can see I simulated some data at the beginning of my answer. If you had provided that simulation it would have a) saved me time b) gotten you an answer that used your own data structure, not one I dreamed up and c) other people would have answered. I often skip questions with no data because I've grown tired of guessing about the data them being told my answer sucked because I guessed wrong.

    2) Ask one clear question. "How do I do my work" is not a single clear question. "How do I take this example data and create subtotals in the aggregation like this example output" is a single specific question.

    3) keep asking! We all get better with practice. You're trying to do more in R and less in Excel so you're clearly of above average intelligence. Keep using R and keep asking questions. It will all get easier in time.

    4) Be careful with your words when you describe things. You say in your edited question you have a "list" of things. A list in R is a specific data structure. I'm suspicious you actually have a data frame and are using the term "list" in a generic sense. This can make for some confusion. It also illustrates why you want to provide your own data.

    0 讨论(0)
  • 2021-01-31 22:57

    Sorry for autopromotion but take a look at my package expss.

    Code for generating output below:

    set.seed(1)
    school  <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
    teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
    growth <- rnorm(100, 5, 3)
    
    myDf <- data.frame(school, teacher, growth)
    
    library(expss)
    myDf %>% 
        # 'tab_cells' - variables on which statistics will be calculated
        # "|" is needed to suppress 'growth' in row labels
        tab_cells("|" = growth) %>%  
        # 'tab_cols' - variables for columns. Can be ommited
        tab_cols(total(label = "")) %>% 
        # 'tab_rows' - variables for rows.
        tab_rows(school %nest% list(teacher, "(All)"), "|" = "(All)") %>% 
        # 'method = list' is needed for statistics labels in column
        tab_stat_fun("Average Growth" = mean, 
                     "Std Dev" = sd, 
                     "# of scholars" = length, 
                     method = list) %>% 
        # finalize table
        tab_pivot()
    

    Code above gives object inherited from data.frame which can be used with standard R operations (subsetting with [ and etc.). But there is a special print method for this object. Console output:

     |       |       | Average Growth | Std Dev | # of scholars |
     | ----- | ----- | -------------- | ------- | ------------- |
     |  BSA1 |  Dick |            4.7 |     3.7 |            14 |
     |       | Harry |            4.3 |     1.4 |             9 |
     |       |   Tom |            5.5 |     4.0 |             4 |
     |       | (All) |            4.7 |     3.1 |            27 |
     |  BSA2 |  Dick |            6.1 |     2.3 |            15 |
     |       | Harry |            5.0 |     2.9 |             9 |
     |       |   Tom |            4.3 |     2.8 |            14 |
     |       | (All) |            5.2 |     2.7 |            38 |
     |  HSA1 |  Dick |            4.5 |     2.9 |            11 |
     |       | Harry |            4.9 |     3.0 |            13 |
     |       |   Tom |            4.7 |     3.2 |            11 |
     |       | (All) |            4.7 |     3.0 |            35 |
     | (All) |       |            4.9 |     2.9 |           100 |
    

    Output via htmlTable in knitr, RStudio viewer or Shiny:

    0 讨论(0)
提交回复
热议问题