geom_smooth on a subset of data

半阙折子戏 2020-12-20 13:34

Here is some data and a plot:

library(ggplot2)
set.seed(18)
data = data.frame(y = c(rep(0:1, 3), rnorm(18, mean = 0.5, sd = 0.1)),
                  colour = rep(1:2, 12), x = rep(1:4, each = 6))

ggplot(data,aes         


        
2 Answers
  • 2020-12-20 13:42

    It's as simple as geom_smooth(data=subset(data, x >= 2), ...). That's fine if the plot is only for yourself, but be aware that a plot like this would mislead others unless you mention how the regression was performed. I'd also recommend making the excluded points more transparent:

    library(ggplot2)
    ggplot(data, aes(x = x, y = y, colour = factor(colour))) +
      geom_point(data = subset(data, x >= 2)) +              # points used in the fit
      geom_point(data = subset(data, x < 2), alpha = 0.2) +  # excluded points, faded
      geom_smooth(data = subset(data, x >= 2), method = 'lm', formula = y ~ x, se = FALSE)
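    To follow the advice about stating how the regression was performed, it can help to fit the same models outside ggplot2 and report their coefficients. The sketch below assumes that geom_smooth(method = 'lm') fits one separate y ~ x model per colour group (which is how it treats grouped data) and reproduces that with lm on the x >= 2 subset; the names fit_data, fits and coefs are only illustrative.

```r
set.seed(18)
data <- data.frame(y = c(rep(0:1, 3), rnorm(18, mean = 0.5, sd = 0.1)),
                   colour = rep(1:2, 12), x = rep(1:4, each = 6))

# Same filter as the geom_smooth() layer: only rows with x >= 2 enter the fit
fit_data <- subset(data, x >= 2)

# One lm per colour group, mirroring the per-group lines drawn by geom_smooth()
fits <- lapply(split(fit_data, fit_data$colour),
               function(d) lm(y ~ x, data = d))

# Rows = colour groups, columns = intercept and slope (e.g. to quote in a caption)
coefs <- t(sapply(fits, coef))
print(coefs)
```

    Each fitted line on the plot should then be intercept + slope * x for its colour group.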
    

  • 2020-12-20 13:47

    The regular lm function has a weights argument, which you can use to assign a weight to each observation. This lets you control how much influence each observation has on the fit, and I think it is a more general way of dealing with the problem than subsetting the data. Of course, assigning weights ad hoc does not bode well for the statistical soundness of the analysis. It is always best to have a rationale behind the weights, e.g. giving low weight to observations that have higher uncertainty.
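    As a concrete illustration of the weights argument, the sketch below gives the x < 2 observations a weight of 0 and all others a weight of 1, then checks that lm returns the same coefficients as fitting on the x >= 2 subset. This 0/1 rule and the helper name fit_both are hypothetical; weights between 0 and 1 would shrink, rather than remove, the influence of those points.

```r
set.seed(18)
data <- data.frame(y = c(rep(0:1, 3), rnorm(18, mean = 0.5, sd = 0.1)),
                   colour = rep(1:2, 12), x = rep(1:4, each = 6))

# Hypothetical weighting rule: drop x < 2 entirely (weight 0), keep the rest at 1
data$w <- ifelse(data$x >= 2, 1, 0)

# Hypothetical helper: fit one colour group both ways and return both coefficient vectors
fit_both <- function(d) {
  weighted  <- lm(y ~ x, data = d, weights = w)      # zero-weight rows contribute nothing
  subsetted <- lm(y ~ x, data = subset(d, x >= 2))   # the same rows removed explicitly
  list(weighted = coef(weighted), subsetted = coef(subsetted))
}

results <- lapply(split(data, data$colour), fit_both)

# One TRUE per colour group: zero weights and subsetting give the same line
checks <- sapply(results, function(r) isTRUE(all.equal(r$weighted, r$subsetted)))
print(checks)
```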

    With method = 'lm', stat_smooth() fits the line with lm under the hood, and it accepts a weight aesthetic that is passed on as lm's weights argument. You can add the weights through the aesthetic (aes), assuming they are stored in a vector with one entry per row of the data:

    ggplot(data, aes(x = x, y = y, colour = factor(colour))) +
      geom_point() +
      stat_smooth(aes(weight = runif(nrow(data))), method = 'lm', formula = y ~ x)  # random weights, illustration only
    

    You could also put the weights in a column of the dataset:

    ggplot(data,aes(x=x,y=y,colour=factor(colour))) + 
        geom_point()+ stat_smooth(aes(weight = weight), method='lm')
    

    where the column is called weight.
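    Putting the column approach together, a self-contained sketch might look like the one below. The weighting rule (0.2 for x < 2, 1 otherwise) is purely hypothetical and only illustrates down-weighting instead of dropping points. ggplot2's layer_data() is used to pull out the computed smooth so it can be compared with a weighted lm fit for one colour group.

```r
library(ggplot2)

set.seed(18)
data <- data.frame(y = c(rep(0:1, 3), rnorm(18, mean = 0.5, sd = 0.1)),
                   colour = rep(1:2, 12), x = rep(1:4, each = 6))

# Hypothetical rule: points with x < 2 count one fifth as much in the fit
data$weight <- ifelse(data$x < 2, 0.2, 1)

p <- ggplot(data, aes(x = x, y = y, colour = factor(colour))) +
  geom_point() +
  stat_smooth(aes(weight = weight), method = "lm", formula = y ~ x, se = FALSE)

# Computed data for layer 2 (the smooth): fitted y along a grid of x, per group
smooth_df <- layer_data(p, 2)

# Weighted lm for colour group 1, fitted outside ggplot2 for comparison
fit1 <- lm(y ~ x, data = subset(data, colour == 1), weights = weight)
g1 <- smooth_df[smooth_df$group == 1, ]

# Largest gap between the plotted line and the weighted lm (floating-point noise only)
max_diff <- max(abs(g1$y - predict(fit1, newdata = data.frame(x = g1$x))))
print(max_diff)
```

    Printing p draws the plot; the comparison only confirms that the smooth uses the weighted fit.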
