Question
I am having trouble producing a figure in R using ggplot2. No statistics are needed; I just need a visual representation of my data. I have 7 participants, and I want to draw a line for each participant through a scatterplot. The slope and shape of the line differ between participants, but on average the relationship is roughly exponential.
I have used the code below, but I only get linear fits. When I change the method to loess, the lines are too wiggly. Can someone help me make this more presentable? Essentially I'm after a line of best fit for each participant, and I still need to be able to use fullrange = FALSE.
Also, should I be using stat_smooth or geom_smooth? Is there a difference?
ggplot(data, aes(x=x, y=y, group = athlete)) +
  geom_point() +
  stat_smooth(method = "lm", se=FALSE, fullrange = FALSE)
Thanks in advance for any help!
Answer 1:
I don't have your data, so I'll just do this with the mpg dataset.
As you've noted, you can use geom_smooth() and specify a method such as "loess". You can also pass arguments through to the method, as you would if you were calling the underlying function directly.
With loess, the smoothing parameter is span: larger values give smoother curves. You can play around with it until you're happy with the results.
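library(ggplot2)   # provides ggplot() and the mpg dataset
data(mpg)
g <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) + geom_point()
# a larger span gives a smoother curve; compare the two plots
g + geom_smooth(se = FALSE, method = 'loess', span = .8) + ggtitle("span 0.8")
g + geom_smooth(se = FALSE, method = 'loess', span = 1) + ggtitle("span 1")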
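For the exponential shape described in the question, the same idea works through method.args, which geom_smooth() forwards to the fitting function. The sketch below is an illustration under assumptions, not part of the original answer: the data are made up for 7 hypothetical athletes, and it passes a log-link Gaussian family to glm so that each line follows E[y] = exp(b0 + b1 * x), while fullrange = FALSE keeps every curve inside its own athlete's x range.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data (all values are made up):
# 7 athletes, each with its own x range and its own exponential growth rate
set.seed(42)
n_per   <- 15
athlete <- factor(rep(1:7, each = n_per))
start   <- seq(0, 3, length.out = 7)[as.integer(athlete)]      # per-athlete x offset
rate    <- seq(0.2, 0.5, length.out = 7)[as.integer(athlete)]  # per-athlete growth rate
x <- start + runif(7 * n_per, 0, 3)
y <- exp(1 + rate * x) * exp(rnorm(7 * n_per, 0, 0.05))        # multiplicative noise
data <- data.frame(athlete, x, y)

p <- ggplot(data, aes(x = x, y = y, group = athlete)) +
  geom_point() +
  # glm with a log link fits E[y] = exp(b0 + b1 * x) separately for each group
  geom_smooth(
    method = "glm",
    formula = y ~ x,
    method.args = list(family = gaussian(link = "log")),
    se = FALSE,
    fullrange = FALSE   # each curve spans only its own group's x values
  )
print(p)
```

On the side question: geom_smooth() and stat_smooth() build the same layer (geom_smooth() uses stat = "smooth" and stat_smooth() uses geom = "smooth"), so either spelling accepts these arguments.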
Answer 2:
There is, to my knowledge, no built-in method for achieving this, but you can do it with some manual plotting. First, since you expect an exponential relationship, it might make sense to run a linear regression using log(y) as the response (I'll be using u and v, in order not to confuse them with the x and y aesthetics in the graph):
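library(ggplot2)
library(dplyr)   # tibble(), mutate() and %>%
# three groups sharing the same u values, each with its own exponential rate a
# (no seed is set, so the numbers below will differ slightly between runs)
tb1 = tibble(
  u = rep(runif(100, 0, 5), 3),
  a = c(rep(-.5, 100), rep(-1, 100), rep(-2, 100)),
  v = exp(a*u + rnorm(3*100, 0, .1))
) %>% mutate(a = as.factor(a))
lm1 = lm(log(v) ~ a:u, tb1)
summary(lm1)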
gives you:
Call:
lm(formula = log(v) ~ a:u, data = tb1)
Residuals:
Min 1Q Median 3Q Max
-0.263057 -0.069510 -0.001262 0.062407 0.301033
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.013696 0.012234 -1.12 0.264
a-2:u -1.996670 0.004979 -401.04 <2e-16 ***
a-1:u -1.001412 0.004979 -201.14 <2e-16 ***
a-0.5:u -0.495636 0.004979 -99.55 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1002 on 296 degrees of freedom
Multiple R-squared: 0.9984, Adjusted R-squared: 0.9983
F-statistic: 6.025e+04 on 3 and 296 DF, p-value: < 2.2e-16
Under "Coefficients" you can find the intercept and the "slopes" for the curves (actually the exponential factors). You can see that they closely match the factors we used for generating the data.
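Because the model is fitted on log(v), each curve reads back as v = exp(b0) * exp(b_a * u). A minimal sketch of pulling those numbers out with coef(), assuming the same simulated data as above; set.seed(1) is added here only so the check is repeatable, so the values will not match the summary output exactly.

```r
library(dplyr)

# Same simulation as above; the seed is an addition for repeatability
set.seed(1)
tb1 <- tibble(
  u = rep(runif(100, 0, 5), 3),
  a = c(rep(-.5, 100), rep(-1, 100), rep(-2, 100)),
  v = exp(a * u + rnorm(3 * 100, 0, .1))
) %>% mutate(a = as.factor(a))

lm1 <- lm(log(v) ~ a:u, data = tb1)
b <- coef(lm1)

# log(v) = b0 + b_a * u  is equivalent to  v = exp(b0) * exp(b_a * u)
scale_factor <- exp(b[["(Intercept)"]])  # shared multiplicative constant
rates <- b[-1]                            # one exponential rate per level of a
print(scale_factor)
print(rates)
```

Note that a:u gives every group the same intercept; a formula such as log(v) ~ a * u would let each group have its own intercept as well.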
To plot the fitting curves, you can use the predicted values produced from your linear model by predict(), exponentiated back to the original scale:
ggplot(tb1, aes(u, v, colour=a)) +
geom_point() +
geom_line(data=tb1 %>% mutate(v = exp(predict(lm1))))
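The geom_line() call above draws through the fitted values at the observed u only, which works here because each group has 100 points. A sketch of an alternative, assuming the same simulated data: predict on an evenly spaced grid restricted to each group's observed range, which is a hand-made equivalent of fullrange = FALSE from the question. The name curve_df is a hypothetical helper, not from the original answer.

```r
library(ggplot2)
library(dplyr)

# Same simulation as above; the seed is an addition for repeatability
set.seed(1)
tb1 <- tibble(
  u = rep(runif(100, 0, 5), 3),
  a = c(rep(-.5, 100), rep(-1, 100), rep(-2, 100)),
  v = exp(a * u + rnorm(3 * 100, 0, .1))
) %>% mutate(a = as.factor(a))
lm1 <- lm(log(v) ~ a:u, data = tb1)

# One evenly spaced u grid per group, limited to that group's own range
# (curve_df is a hypothetical helper name)
curve_df <- do.call(rbind, lapply(split(tb1, tb1$a), function(d) {
  data.frame(a = d$a[1], u = seq(min(d$u), max(d$u), length.out = 100))
}))
# Predict on the log scale, then back-transform to the original scale
curve_df$v <- exp(predict(lm1, newdata = curve_df))

p <- ggplot(tb1, aes(u, v, colour = a)) +
  geom_point() +
  geom_line(data = curve_df)
print(p)
```

With a grid, the curve stays smooth even when a participant has only a handful of observations.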
If you want to have the standard error ribbons, it's a little more work, but still possible:
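p1 = predict(lm1, se.fit = TRUE)   # fitted values and standard errors on the log scale
tb2 = tibble(
  u = tb1$u,
  a = tb1$a,
  v = exp(p1$fit),
  vmin = exp(p1$fit - 1.96*p1$se.fit),   # approximate 95% band, computed on the log scale
  vmax = exp(p1$fit + 1.96*p1$se.fit)    # and then exponentiated
)
ggplot(tb2, aes(u, v, colour=a)) +
  geom_ribbon(aes(fill=a, ymin=vmin, ymax=vmax), colour=NA, alpha=.25) +
  geom_line(linewidth=.5) +   # use size=.5 on ggplot2 versions before 3.4.0
  geom_point(data=tb1)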
produces:
[Figure: data points with one fitted exponential curve and a shaded 95% band for each level of a]
Source: https://stackoverflow.com/questions/58873060/using-ggplot2-in-r-creating-multiple-smoothed-fitted-lines