问题
Just contextualizing, I work with psychometrics/psychological testing. I have a dataset formed of "points","percentile","z_real","z_normal","iq". I would like to have a single ggplot in which I could present the Z_score (from my raw data), the z_score (with an underlying normal distribution), and then have two supplementary x-axes with "raw score" and "iq scores". That's pretty common in statistics, as you can check it below
This is the current plot
This is the best solution I've got
That's the desired plot
I'm working with tidyverse and I would like to remain within it. Some previous posts helped me, such as this [one][5] and this [one][6].
Thank you. The (part of the) data and the codes are here:
ask_ds <- structure(list(points = c(17, 17, 2, 16, 11, 17, 20, 16, 19,
15, 9, 14, 14, 16, 13, 13, 22, 21, 25, 17, 17, 17, 20, 6, 11,
5, 10, 23, 21, 19, 11, 15, 13, 17, 17, 17, 9, 18, 12, 22, 21,
23, 8, 12, 6, 7, 22, 12, 21, 16, 12, 5, 19, 19, 21, 13, 12, 18,
22, 13, 21, 24, 23, 4, 12, 19, 2, 22, 14, 16, 24, 19, 7, 12,
18, 14, 9, 18, 7, 22, 3, 18, 14, 22, 15, 11, 15, 19, 25, 9, 19,
16, 13, 19, 14, 15, 20, 3, 23, 9, 7, 22, 9, 22, 17, 12, 14, 11,
7, 9, 19, 16, 19, 10, 24, 14, 12, 18, 19, 24, 22, 11, 11, 15,
17, 15, 11, 11, 14, 24, 14, 21, 16, 10, 19, 21, 15, 20, 15, 20,
22, 10, 22, 16, 14, 16, 8, 16, 9, 15), percentile = c(63, 61,
1, 56, 25, 59, 80, 55, 74, 49, 17, 45, 44, 57, 36, 32, 90, 85,
99, 63, 63, 60, 82, 7, 24, 5, 20, 93, 83, 75, 24, 50, 35, 64,
62, 61, 16, 65, 28, 90, 85, 93, 12, 28, 7, 8, 90, 26, 87, 55,
30, 4, 74, 73, 87, 33, 30, 67, 91, 35, 86, 95, 93, 3, 29, 75,
1, 89, 44, 55, 96, 75, 8, 27, 66, 41, 13, 68, 9, 91, 2, 65, 44,
89, 51, 22, 46, 74, 98, 13, 76, 58, 37, 73, 43, 50, 79, 2, 93,
17, 8, 91, 14, 92, 60, 26, 43, 25, 10, 14, 73, 57, 76, 18, 96,
40, 31, 70, 73, 96, 91, 24, 23, 47, 59, 51, 24, 22, 40, 96, 45,
85, 57, 17, 73, 86, 50, 79, 48, 81, 91, 21, 89, 53, 43, 58, 11,
53, 14, 47), z_real = structure(c(0.36, 0.36, -2.38, 0.18, -0.73,
0.36, 0.91, 0.18, 0.73, 0, -1.1, -0.19, -0.19, 0.18, -0.37, -0.37,
1.28, 1.09, 1.83, 0.36, 0.36, 0.36, 0.91, -1.65, -0.73, -1.83,
-0.92, 1.46, 1.09, 0.73, -0.73, 0, -0.37, 0.36, 0.36, 0.36, -1.1,
0.55, -0.55, 1.28, 1.09, 1.46, -1.28, -0.55, -1.65, -1.47, 1.28,
-0.55, 1.09, 0.18, -0.55, -1.83, 0.73, 0.73, 1.09, -0.37, -0.55,
0.55, 1.28, -0.37, 1.09, 1.64, 1.46, -2.01, -0.55, 0.73, -2.38,
1.28, -0.19, 0.18, 1.64, 0.73, -1.47, -0.55, 0.55, -0.19, -1.1,
0.55, -1.47, 1.28, -2.2, 0.55, -0.19, 1.28, 0, -0.73, 0, 0.73,
1.83, -1.1, 0.73, 0.18, -0.37, 0.73, -0.19, 0, 0.91, -2.2, 1.46,
-1.1, -1.47, 1.28, -1.1, 1.28, 0.36, -0.55, -0.19, -0.73, -1.47,
-1.1, 0.73, 0.18, 0.73, -0.92, 1.64, -0.19, -0.55, 0.55, 0.73,
1.64, 1.28, -0.73, -0.73, 0, 0.36, 0, -0.73, -0.73, -0.19, 1.64,
-0.19, 1.09, 0.18, -0.92, 0.73, 1.09, 0, 0.91, 0, 0.91, 1.28,
-0.92, 1.28, 0.18, -0.19, 0.18, -1.28, 0.18, -1.1, 0), .Dim = c(150L,
1L), "`\`\`\`scaled:center\`\`\``" = 15.0143288084465, "`\`\`\`scaled:scale\`\`\``" = 5.47051980922509),
z_normal = c(0.33, 0.28, -2.33, 0.15, -0.67, 0.23, 0.84,
0.13, 0.64, -0.03, -0.95, -0.13, -0.15, 0.18, -0.36, -0.47,
1.28, 1.04, 2.33, 0.33, 0.33, 0.25, 0.92, -1.48, -0.71, -1.64,
-0.84, 1.48, 0.95, 0.67, -0.71, 0, -0.39, 0.36, 0.31, 0.28,
-0.99, 0.39, -0.58, 1.28, 1.04, 1.48, -1.17, -0.58, -1.48,
-1.41, 1.28, -0.64, 1.13, 0.13, -0.52, -1.75, 0.64, 0.61,
1.13, -0.44, -0.52, 0.44, 1.34, -0.39, 1.08, 1.64, 1.48,
-1.88, -0.55, 0.67, -2.33, 1.23, -0.15, 0.13, 1.75, 0.67,
-1.41, -0.61, 0.41, -0.23, -1.13, 0.47, -1.34, 1.34, -2.05,
0.39, -0.15, 1.23, 0.03, -0.77, -0.1, 0.64, 2.05, -1.13,
0.71, 0.2, -0.33, 0.61, -0.18, 0, 0.81, -2.05, 1.48, -0.95,
-1.41, 1.34, -1.08, 1.41, 0.25, -0.64, -0.18, -0.67, -1.28,
-1.08, 0.61, 0.18, 0.71, -0.92, 1.75, -0.25, -0.5, 0.52,
0.61, 1.75, 1.34, -0.71, -0.74, -0.08, 0.23, 0.03, -0.71,
-0.77, -0.25, 1.75, -0.13, 1.04, 0.18, -0.95, 0.61, 1.08,
0, 0.81, -0.05, 0.88, 1.34, -0.81, 1.23, 0.08, -0.18, 0.2,
-1.23, 0.08, -1.08, -0.08), iq = c(104.98, 104.19, 65.1,
102.26, 89.88, 103.41, 112.62, 101.88, 109.65, 99.62, 85.69,
98.12, 97.74, 102.65, 94.62, 92.98, 119.22, 115.55, 134.9,
104.98, 104.98, 103.8, 113.73, 77.86, 89.41, 75.33, 87.38,
122.14, 114.31, 110.12, 89.41, 100, 94.22, 105.38, 104.58,
104.19, 85.08, 105.78, 91.26, 119.22, 115.55, 122.14, 82.38,
91.26, 77.86, 78.92, 119.22, 90.35, 116.9, 101.88, 92.13,
73.74, 109.65, 109.19, 116.9, 93.4, 92.13, 106.6, 120.11,
94.22, 116.2, 124.67, 122.14, 71.79, 91.7, 110.12, 65.1,
118.4, 97.74, 101.88, 126.26, 110.12, 78.92, 90.81, 106.19,
96.59, 83.1, 107.02, 79.89, 120.11, 69.19, 105.78, 97.74,
118.4, 100.38, 88.42, 98.49, 109.65, 130.81, 83.1, 110.59,
103.03, 95.02, 109.19, 97.35, 100, 112.1, 69.19, 122.14,
85.69, 78.92, 120.11, 83.8, 121.08, 103.8, 90.35, 97.35,
89.88, 80.78, 83.8, 109.19, 102.65, 110.59, 86.27, 126.26,
96.2, 92.56, 107.87, 109.19, 126.26, 120.11, 89.41, 88.92,
98.87, 103.41, 100.38, 89.41, 88.42, 96.2, 126.26, 98.12,
115.55, 102.65, 85.69, 109.19, 116.2, 100, 112.1, 99.25,
113.17, 120.11, 87.9, 118.4, 101.13, 97.35, 103.03, 81.6,
101.13, 83.8, 98.87)), row.names = c(NA, -150L), class = c("tbl_df",
"tbl", "data.frame"))
ask_ds %>%
select(z_real, z_normal) %>%
pivot_longer(everything()) %>%
ggplot(., aes(value, fill = name)) +
geom_density(alpha=.2) +
scale_x_continuous(breaks=-3:3, labels=parse(text=paste(-3:3, '*sigma')) ,
"Standard deviation",
sec.axis = sec_axis(trans = ~.* sd(ask_ds$points) +
mean(ask_ds$points),
"Raw score"))
gridExtra::grid.arrange(ask_ds %>%
select(z_real, z_normal) %>%
pivot_longer(everything()) %>%
ggplot(., aes(value, fill = name)) +
geom_density(alpha=.2) ,
ggplot(ask_ds, aes(points)) +
geom_density()
)
回答1:
I hadn't known about the patchwork
thing - that's really cool, thanks @teunbrand. I have a less elegant solution, but thought I would post it just the same. Here, you're making the axes by hand inside the plotting region.
## turn your data into a stand alone dataset
plot.dat <- ask_ds %>%
select(z_real, z_normal) %>%
pivot_longer(everything())
## identify the values for the x-axis
xv <- c(-2, -1, 0, 1, 2)
## identify the middle values for the y-tick marks
yv1 <- c(-.025,-.025,-.025,-.025,-.025)
yv2 <- c(-.065,-.065,-.065,-.065,-.065)
## identify the transformations for the tick mark labels on other axes
rv <- xv* sd(ask_ds$points) + mean(ask_ds$points)
iq <- xv*15 + 100
## make the plot
ggplot() +
geom_density(data=plot.dat, aes(value, fill = name), alpha=.2) +
## add an abline that will serve as the raw data axis
geom_abline(slope=0, intercept=-.025) +
## add segments for the tick marks
geom_segment(aes(x=xv, y=yv1 - .005, xend=xv, yend=yv1 + .005)) +
## add the text labels
geom_text(aes(x=xv, y = yv1 - .011, label=sprintf("%.2f", rv))) +
## repeat a bit lower for the IQ axis
geom_abline(slope=0, intercept=-.065) +
geom_segment(aes(x=xv, y=yv2 - .005, xend=xv, yend=yv2 + .005)) +
geom_text(aes(x=xv, y = yv2 - .011, label=sprintf("%.2f", iq))) +
## format the actual x-axis
scale_x_continuous(breaks=-3:3, labels=parse(text=paste(-3:3, '*sigma')) ,
"Standard deviation") +
## add a y-axis to identify the two new scales.
scale_y_continuous(sec.axis = sec_axis(~.,
breaks = c(-.025, -.065),
labels = c("Raw Data", "IQ"))) +
## put the legend on top - this keeps it from being pushed aside for the second
## y-axis labels.
theme(legend.position="top")
回答2:
I think the core of the question here is how to get multiple x-axes, so I'm going to answer that question instead of trying to understand what your data is about. There is no 'proper' way to introduce multiple x-axis in ggplot2, however, you can work around this by putting a bunch of miniscule height plots below each other. For this I recommend the patchwork package.
We first make the plot that we want to illustrate, I'll do this with some dummy data:
library(ggplot2)
library(patchwork)
set.seed(0)
df <- data.frame(
x = rnorm(100, 100, 15)
)
dens <- ggplot(df, aes(x)) +
geom_density() +
labs(x = "IQ") +
theme(axis.line.x = element_line())
Next, we'll make two dummy plots where the x-axis will be trained, but no geom is drawn. Also the aspect ratio is set to a small number, to make it appear as if this is just the axis line and no plot occured. This generally only works whenever there is a linear translation from one axis to the next. I'm no psychometrician but I thought that IQ was roughly ~N(100, 15)
, so we need to inverse transform this to the standard normal for a Z-score.
zscore_axis <- ggplot(df, aes((x - 100) / 15)) +
labs(x = "Z-score") +
theme(axis.line.x = element_line(),
aspect.ratio = 1e-8)
quantile_axis <- ggplot(df, aes((x - 100) / 15)) +
labs(x = "Quantile") +
scale_x_continuous(breaks = qnorm(seq(0.1, 0.9, by = 0.1)),
labels = seq(0.1, 0.9, by = 0.1)) +
theme(axis.line.x = element_line(),
aspect.ratio = 1e-8)
Lastly, with patchwork magic we place all three plots below oneanother.
dens / zscore_axis / quantile_axis
Created on 2020-09-17 by the reprex package (v0.3.0)
So that is how I would make multiple x-axis. How this would play with your data, I'm not exactly sure. You can adapt to your own metrics, such as raw score, as long as they are a linear transformation of the data.
来源:https://stackoverflow.com/questions/63943336/multi-x-axis-using-ggplot-to-present-z-scores-iq-scores-and-raw-data