Pivot Table-like Output in R?

為{幸葍}努か 提交于 2021-02-05 17:20:41

问题


I am writing a report that requires the generation of a number of pivot tables in Excel. I would like to think there is a way to do this in R so that I can avoid Excel. I would like output like the screenshot below (teacher names redacted). As far as I can tell, I could use the reshape package to calculate the aggregate values, but I'd need to do that a number of times and somehow get all of the data in the correct order. At that point, I should just be doing it in Excel. Does anyone have any suggestions or package recommendations? Thank you!

(EDIT) The data starts as a list of students, their teacher, school, and growth. This data is then aggregated to get a list of teachers with their average class growth. Please note the teachers are then grouped by school. The largest problem I foresee doing this with R as of now is how do you get the subtotal and total rows (BSA1 Total, Grand Total, etc) in there since they are not the same type of observation as the others? Do you just manually have to calculate them and try to get them in the correct order so they appear at the bottom of that group?


(source: imgh.us)


回答1:


Here's a swag at the calculation bits:

set.seed(1)
school  <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
growth <- rnorm(100, 5, 3)

myDf <- data.frame(school, teacher, growth)

require(reshape2)

aggregate(growth ~ school + teacher, data =myDf, FUN=mean)

myDf.melt <- melt(myDf, measured="growth")
dcast(myDf.melt, school + teacher ~ ., fun.aggregate=mean, margins=c("school", "teacher"))

I've not addressed output formatting, only calculation. The resulting data frame should look like this:

   school teacher       NA
1    BSA1    Dick 4.663140
2    BSA1   Harry 4.310802
3    BSA1     Tom 5.505247
4    BSA1   (all) 4.670451
5    BSA2    Dick 6.110988
6    BSA2   Harry 5.007221
7    BSA2     Tom 4.337063
8    BSA2   (all) 5.196018
9    HSA1    Dick 4.508610
10   HSA1   Harry 4.890741
11   HSA1     Tom 4.721124
12   HSA1   (all) 4.717335
13  (all)   (all) 4.886576

That example uses the reshape2 package to handle the subtotals.

I think R is the right tool for the job here. I can totally understand not being sure how to get started on this analysis. I came to R from Excel a few years ago and it can be tough to grok at first. Let me point out four pro tips to help you get better answers in Stack Overflow:

1) provide data, even if simulated: you can see I simulated some data at the beginning of my answer. If you had provided that simulation it would have a) saved me time b) gotten you an answer that used your own data structure, not one I dreamed up and c) other people would have answered. I often skip questions with no data because I've grown tired of guessing about the data them being told my answer sucked because I guessed wrong.

2) Ask one clear question. "How do I do my work" is not a single clear question. "How do I take this example data and create subtotals in the aggregation like this example output" is a single specific question.

3) keep asking! We all get better with practice. You're trying to do more in R and less in Excel so you're clearly of above average intelligence. Keep using R and keep asking questions. It will all get easier in time.

4) Be careful with your words when you describe things. You say in your edited question you have a "list" of things. A list in R is a specific data structure. I'm suspicious you actually have a data frame and are using the term "list" in a generic sense. This can make for some confusion. It also illustrates why you want to provide your own data.




回答2:


Using JD Long's simulated data, and adding the sd and counts:

   library(reshape)  # not reshape2
   cast(myDf.melt, school + teacher ~ ., margins=TRUE , c(mean, sd, length))
   school teacher     mean       sd length
1    BSA1    Dick 4.663140 3.718773     14
2    BSA1   Harry 4.310802 1.430594      9
3    BSA1     Tom 5.505247 4.045846      4
4    BSA1   (all) 4.670451 3.095980     27
5    BSA2    Dick 6.110988 2.304104     15
6    BSA2   Harry 5.007221 2.908146      9
7    BSA2     Tom 4.337063 2.789244     14
8    BSA2   (all) 5.196018 2.682924     38
9    HSA1    Dick 4.508610 2.946961     11
10   HSA1   Harry 4.890741 2.977305     13
11   HSA1     Tom 4.721124 3.193576     11
12   HSA1   (all) 4.717335 2.950959     35
13  (all)   (all) 4.886576 2.873637    100



回答3:


Below are several different ways of generating this using the relatively new pivottabler package.

Disclosure: I'm the package author.

For more information see the package page on CRAN and the various package vignettes available on that page.

Sample Data (same as above)

set.seed(1)
school  <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
growth <- rnorm(100, 5, 3)
myDf <- data.frame(school, teacher, growth)

Quick pivot table output to console as plain text

library(pivottabler)
# arguments:  qhpvt(dataFrame, rows, columns, calculations, ...)
qpvt(myDf, c("school", "teacher"), NULL,  
     c("Average Growth"="mean(growth)", "Std Dev"="sd(growth)",
       "# of Scholars"="n()"),
     formats=list("%.1f", "%.1f", "%.0f"))

Console Output:

              Average Growth  Std Dev  # of Scholars  
BSA1   Dick              4.7      3.7             14  
       Harry             4.3      1.4              9  
       Tom               5.5      4.0              4  
       Total             4.7      3.1             27  
BSA2   Dick              6.1      2.3             15  
       Harry             5.0      2.9              9  
       Tom               4.3      2.8             14  
       Total             5.2      2.7             38  
HSA1   Dick              4.5      2.9             11  
       Harry             4.9      3.0             13  
       Tom               4.7      3.2             11  
       Total             4.7      3.0             35  
Total                    4.9      2.9            100  

Quick pivot table output as a html widget

library(pivottabler)
qhpvt(myDf, c("school", "teacher"), NULL,  
     c("Average Growth"="mean(growth)", "Std Dev"="sd(growth)",
       "# of Scholars"="n()"),
     formats=list("%.1f", "%.1f", "%.0f"))

HTML Widget Output:

Generating pivot table using more verbose syntax

This has more options, e.g. renaming totals.

library(pivottabler)
pt <- PivotTable$new()
pt$addData(myDf)
pt$addRowDataGroups("school", totalCaption="(all)")
pt$addRowDataGroups("teacher", totalCaption="(all)")
pt$defineCalculation(calculationName="c1", caption="Average Growth", 
   summariseExpression="mean(growth)", format="%.1f")
pt$defineCalculation(calculationName="c2", caption="Std Dev", 
   summariseExpression="sd(growth)", format="%.1f")
pt$defineCalculation(calculationName="c3", caption="# of Scholars", 
   summariseExpression="n()", format="%.0f")
pt  # to output to console as plain text
pt$renderPivot() # to output as a html widget

HTML Widget Output:




回答4:


Sorry for autopromotion but take a look at my package expss.

Code for generating output below:

set.seed(1)
school  <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
growth <- rnorm(100, 5, 3)

myDf <- data.frame(school, teacher, growth)

library(expss)
myDf %>% 
    # 'tab_cells' - variables on which statistics will be calculated
    # "|" is needed to suppress 'growth' in row labels
    tab_cells("|" = growth) %>%  
    # 'tab_cols' - variables for columns. Can be ommited
    tab_cols(total(label = "")) %>% 
    # 'tab_rows' - variables for rows.
    tab_rows(school %nest% list(teacher, "(All)"), "|" = "(All)") %>% 
    # 'method = list' is needed for statistics labels in column
    tab_stat_fun("Average Growth" = mean, 
                 "Std Dev" = sd, 
                 "# of scholars" = length, 
                 method = list) %>% 
    # finalize table
    tab_pivot()

Code above gives object inherited from data.frame which can be used with standard R operations (subsetting with [ and etc.). But there is a special print method for this object. Console output:

 |       |       | Average Growth | Std Dev | # of scholars |
 | ----- | ----- | -------------- | ------- | ------------- |
 |  BSA1 |  Dick |            4.7 |     3.7 |            14 |
 |       | Harry |            4.3 |     1.4 |             9 |
 |       |   Tom |            5.5 |     4.0 |             4 |
 |       | (All) |            4.7 |     3.1 |            27 |
 |  BSA2 |  Dick |            6.1 |     2.3 |            15 |
 |       | Harry |            5.0 |     2.9 |             9 |
 |       |   Tom |            4.3 |     2.8 |            14 |
 |       | (All) |            5.2 |     2.7 |            38 |
 |  HSA1 |  Dick |            4.5 |     2.9 |            11 |
 |       | Harry |            4.9 |     3.0 |            13 |
 |       |   Tom |            4.7 |     3.2 |            11 |
 |       | (All) |            4.7 |     3.0 |            35 |
 | (All) |       |            4.9 |     2.9 |           100 |

Output via htmlTable in knitr, RStudio viewer or Shiny:



来源:https://stackoverflow.com/questions/6667091/pivot-table-like-output-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!