问题
I want to visualize the results of a linear model where dependent variable values change as a function of discrete x-values. Since my x-values represent consecutive days, I want to annotate the change from day to day, in percents. How can I do this in a line plot?
My Data
I want to measure people's mood. Every day I collect responses from 1000 different people on how they feel. I therefore get a daily average for mood, and I want to see how it changes from one day to another.library(tidyverse)
library(emmeans)
day_1 <- rnorm(1000, mean = 77, sd = 18)
day_2 <- rnorm(1000, mean = 74, sd = 19)
day_3 <- rnorm(1000, mean = 80, sd = 5)
day_4 <- rnorm(1000, mean = 76, sd = 18)
df <-
cbind(day_1, day_2, day_3, day_4) %>%
as.tibble() %>%
gather(., key = day, value = mood, day_1:day_4) %>%
mutate_at(vars(day), factor)
> df
## # A tibble: 4,000 x 2
## day mood
## <fct> <dbl>
## 1 day_1 83.9
## 2 day_1 94.9
## 3 day_1 104.
## 4 day_1 81.0
## 5 day_1 61.4
## 6 day_1 95.1
## 7 day_1 78.6
## 8 day_1 108.
## 9 day_1 74.7
## 10 day_1 79.7
## # ... with 3,990 more rows
Fitting and plotting
fit <- lm(formula = mood ~ day, data = df)
emmip(fit, ~ day, CIs = TRUE)
Given that the plot object can be edited using ggplot functions, how can I add the change between days, in percents, such as the following illustration?
Is there an efficient way to calculate the change and put it above each section of the line?
回答1:
The following approach utilizes ggplot_build()
(included from ggplot2
itself) to pull out the underlying data used to create your plot, then geom_label()
to perform the annotation itself.
Preparations
As indicated, we can use ggplot_build()
to pull the data from your dataset.
p <- emmip(fit, ~ day, CIs = TRUE) # save your plot as gg object
plotdata <- ggplot_build(p)$data[[1]]
There's kind of a lot going on there in the ggplot_build()
function, so I'll explain. We want to access the data
part of the result, and when you do that you get the datasets used to create each of the layers. In the plot, you have 3 layers: the points, the lines, and the bars for the CI. In principle, you can pull any of those, but I'm choosing the first one ([[1]]
). In particular, we want to access the y
values.
To calculate the percent change, I have written a small function to do this for us that uses diff()
. Since diff()
does not return a "0" for the first index, we have to add that. Then we add the column to plotdata
:
percent_change <- function(x) {
p_change <- (diff(x)/x[1:length(x)-1])*100
return(c(0,p_change)) # add back the 0 for the first index
}
plotdata$change <- percent_change(plotdata$y)
Plotting
Now we're ready for the plot. We'll add a label geom to the plot, p
. There's a few things going on in there:
Filtering to use only the
plotdata
parts whereplotdata$change != 0
. This is because we don't want to label any points where there is no change (i.e. the first point).I need to add a "+" preceding positive values of
plotdata$change
.ifelse()
within the label aesthetic seems to work just fine.color can be dynamically changed here. You could also map it via
aes()
, but I would need to create another column and so it's just convenient here to useifelse()
to control the color as red or green, since there's only two options. You have to do this outside of theaes()
, otherwise you will only get a legend and defaultggplot2
colors for the labels "red" and "green". No legend is created the way I do it here.
Code and plot here:
p + geom_label(
data=subset(plotdata, change != 0),
aes(x=x, y=y,
label=paste0(ifelse(
subset(plotdata, change!=0)$change <0, '','+'),
round(change, 2),'%')),
color=ifelse(subset(plotdata, change!=0)$change <0, 'red','green3'),
nudge_x = -0.3
)
来源:https://stackoverflow.com/questions/63049622/how-to-add-annotation-over-line-plot-to-mark-percent-change-in-y-values-between