问题
I am writing a report that requires the generation of a number of pivot tables in Excel. I would like to think there is a way to do this in R so that I can avoid Excel. I would like output like the screenshot below (teacher names redacted). As far as I can tell, I could use the reshape package to calculate the aggregate values, but I'd need to do that a number of times and somehow get all of the data in the correct order. At that point, I should just be doing it in Excel. Does anyone have any suggestions or package recommendations? Thank you!
(EDIT) The data starts as a list of students, their teacher, school, and growth. This data is then aggregated to get a list of teachers with their average class growth. Please note the teachers are then grouped by school. The largest problem I foresee doing this with R as of now is how do you get the subtotal and total rows (BSA1 Total, Grand Total, etc) in there since they are not the same type of observation as the others? Do you just manually have to calculate them and try to get them in the correct order so they appear at the bottom of that group?
(source: imgh.us)
回答1:
Here's a swag at the calculation bits:
set.seed(1)
school <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
growth <- rnorm(100, 5, 3)
myDf <- data.frame(school, teacher, growth)
require(reshape2)
aggregate(growth ~ school + teacher, data =myDf, FUN=mean)
myDf.melt <- melt(myDf, measured="growth")
dcast(myDf.melt, school + teacher ~ ., fun.aggregate=mean, margins=c("school", "teacher"))
I've not addressed output formatting, only calculation. The resulting data frame should look like this:
school teacher NA
1 BSA1 Dick 4.663140
2 BSA1 Harry 4.310802
3 BSA1 Tom 5.505247
4 BSA1 (all) 4.670451
5 BSA2 Dick 6.110988
6 BSA2 Harry 5.007221
7 BSA2 Tom 4.337063
8 BSA2 (all) 5.196018
9 HSA1 Dick 4.508610
10 HSA1 Harry 4.890741
11 HSA1 Tom 4.721124
12 HSA1 (all) 4.717335
13 (all) (all) 4.886576
That example uses the reshape2 package to handle the subtotals.
I think R is the right tool for the job here. I can totally understand not being sure how to get started on this analysis. I came to R from Excel a few years ago and it can be tough to grok at first. Let me point out four pro tips to help you get better answers in Stack Overflow:
1) provide data, even if simulated: you can see I simulated some data at the beginning of my answer. If you had provided that simulation it would have a) saved me time b) gotten you an answer that used your own data structure, not one I dreamed up and c) other people would have answered. I often skip questions with no data because I've grown tired of guessing about the data them being told my answer sucked because I guessed wrong.
2) Ask one clear question. "How do I do my work" is not a single clear question. "How do I take this example data and create subtotals in the aggregation like this example output" is a single specific question.
3) keep asking! We all get better with practice. You're trying to do more in R and less in Excel so you're clearly of above average intelligence. Keep using R and keep asking questions. It will all get easier in time.
4) Be careful with your words when you describe things. You say in your edited question you have a "list" of things. A list in R is a specific data structure. I'm suspicious you actually have a data frame and are using the term "list" in a generic sense. This can make for some confusion. It also illustrates why you want to provide your own data.
回答2:
Using JD Long's simulated data, and adding the sd and counts:
library(reshape) # not reshape2
cast(myDf.melt, school + teacher ~ ., margins=TRUE , c(mean, sd, length))
school teacher mean sd length
1 BSA1 Dick 4.663140 3.718773 14
2 BSA1 Harry 4.310802 1.430594 9
3 BSA1 Tom 5.505247 4.045846 4
4 BSA1 (all) 4.670451 3.095980 27
5 BSA2 Dick 6.110988 2.304104 15
6 BSA2 Harry 5.007221 2.908146 9
7 BSA2 Tom 4.337063 2.789244 14
8 BSA2 (all) 5.196018 2.682924 38
9 HSA1 Dick 4.508610 2.946961 11
10 HSA1 Harry 4.890741 2.977305 13
11 HSA1 Tom 4.721124 3.193576 11
12 HSA1 (all) 4.717335 2.950959 35
13 (all) (all) 4.886576 2.873637 100
回答3:
Below are several different ways of generating this using the relatively new pivottabler package.
Disclosure: I'm the package author.
For more information see the package page on CRAN and the various package vignettes available on that page.
Sample Data (same as above)
set.seed(1)
school <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
growth <- rnorm(100, 5, 3)
myDf <- data.frame(school, teacher, growth)
Quick pivot table output to console as plain text
library(pivottabler)
# arguments: qhpvt(dataFrame, rows, columns, calculations, ...)
qpvt(myDf, c("school", "teacher"), NULL,
c("Average Growth"="mean(growth)", "Std Dev"="sd(growth)",
"# of Scholars"="n()"),
formats=list("%.1f", "%.1f", "%.0f"))
Console Output:
Average Growth Std Dev # of Scholars
BSA1 Dick 4.7 3.7 14
Harry 4.3 1.4 9
Tom 5.5 4.0 4
Total 4.7 3.1 27
BSA2 Dick 6.1 2.3 15
Harry 5.0 2.9 9
Tom 4.3 2.8 14
Total 5.2 2.7 38
HSA1 Dick 4.5 2.9 11
Harry 4.9 3.0 13
Tom 4.7 3.2 11
Total 4.7 3.0 35
Total 4.9 2.9 100
Quick pivot table output as a html widget
library(pivottabler)
qhpvt(myDf, c("school", "teacher"), NULL,
c("Average Growth"="mean(growth)", "Std Dev"="sd(growth)",
"# of Scholars"="n()"),
formats=list("%.1f", "%.1f", "%.0f"))
HTML Widget Output:
Generating pivot table using more verbose syntax
This has more options, e.g. renaming totals.
library(pivottabler)
pt <- PivotTable$new()
pt$addData(myDf)
pt$addRowDataGroups("school", totalCaption="(all)")
pt$addRowDataGroups("teacher", totalCaption="(all)")
pt$defineCalculation(calculationName="c1", caption="Average Growth",
summariseExpression="mean(growth)", format="%.1f")
pt$defineCalculation(calculationName="c2", caption="Std Dev",
summariseExpression="sd(growth)", format="%.1f")
pt$defineCalculation(calculationName="c3", caption="# of Scholars",
summariseExpression="n()", format="%.0f")
pt # to output to console as plain text
pt$renderPivot() # to output as a html widget
HTML Widget Output:
回答4:
Sorry for autopromotion but take a look at my package expss.
Code for generating output below:
set.seed(1)
school <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
growth <- rnorm(100, 5, 3)
myDf <- data.frame(school, teacher, growth)
library(expss)
myDf %>%
# 'tab_cells' - variables on which statistics will be calculated
# "|" is needed to suppress 'growth' in row labels
tab_cells("|" = growth) %>%
# 'tab_cols' - variables for columns. Can be ommited
tab_cols(total(label = "")) %>%
# 'tab_rows' - variables for rows.
tab_rows(school %nest% list(teacher, "(All)"), "|" = "(All)") %>%
# 'method = list' is needed for statistics labels in column
tab_stat_fun("Average Growth" = mean,
"Std Dev" = sd,
"# of scholars" = length,
method = list) %>%
# finalize table
tab_pivot()
Code above gives object inherited from data.frame which can be used with standard R operations (subsetting with [
and etc.). But there is a special print
method for this object. Console output:
| | | Average Growth | Std Dev | # of scholars |
| ----- | ----- | -------------- | ------- | ------------- |
| BSA1 | Dick | 4.7 | 3.7 | 14 |
| | Harry | 4.3 | 1.4 | 9 |
| | Tom | 5.5 | 4.0 | 4 |
| | (All) | 4.7 | 3.1 | 27 |
| BSA2 | Dick | 6.1 | 2.3 | 15 |
| | Harry | 5.0 | 2.9 | 9 |
| | Tom | 4.3 | 2.8 | 14 |
| | (All) | 5.2 | 2.7 | 38 |
| HSA1 | Dick | 4.5 | 2.9 | 11 |
| | Harry | 4.9 | 3.0 | 13 |
| | Tom | 4.7 | 3.2 | 11 |
| | (All) | 4.7 | 3.0 | 35 |
| (All) | | 4.9 | 2.9 | 100 |
Output via htmlTable
in knitr, RStudio viewer or Shiny:
来源:https://stackoverflow.com/questions/6667091/pivot-table-like-output-in-r