Is it possible to jitter two ggplot geoms in the same way?

野的像风 2020-12-15 20:42

Using position_jitter creates random jitter to prevent overplotting of data points.

In the example below I use baseball statistics to illustrate my problem.

3 answers
  • 2020-12-15 20:48

    I think so, by setting the seed to be the same in the two instances:

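    library(ggplot2)
    data(baseball, package = "plyr")  # the baseball data ships with plyr
    # "mean_cl_normal" needs the Hmisc package to be installed

    myseed <- 2010
    p <- ggplot(baseball, aes(x = round(year, -1), y = sb, color = factor(lg)))
    # Reset the RNG before each jittered layer so both draw the same offsets
    set.seed(myseed)
    p <- p + stat_summary(fun.data = "mean_cl_normal",
                          position = position_jitter(width = 3, height = 0)) +
      coord_cartesian(ylim = c(0, 40))
    set.seed(myseed)
    # `fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0
    p + stat_summary(fun = mean, geom = "line",
                     position = position_jitter(width = 3, height = 0))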
    

    This resets the random number generator to the same starting state that was used for the first layer, so both layers receive identical jitter. However, I don't know how you could extract the random increments added to the values.
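
    The reseeding idea can be checked outside ggplot2. The sketch below uses base R only and assumes the jitter is drawn with `runif()` over ±3, mirroring `width=3`; it is not ggplot2's internal code, and the names `jitter_a`, `jitter_b`, `years`, and `x_jittered` are purely illustrative. It also shows one possible way to get at the increments: draw them yourself and keep them in a variable.

    ```r
    # Illustrative only: replaying a seed reproduces the same random offsets.
    myseed <- 2010

    set.seed(myseed)
    jitter_a <- runif(5, min = -3, max = 3)  # offsets for a first layer

    set.seed(myseed)
    jitter_b <- runif(5, min = -3, max = 3)  # offsets for a second layer

    # Both draws start from the same RNG state, so they should match exactly.
    print(identical(jitter_a, jitter_b))

    # Keeping the offsets in a variable makes the increments available later.
    years <- c(1970, 1980, 1990, 2000, 2010)  # hypothetical x positions
    x_jittered <- years + jitter_a
    ```

    Recent ggplot2 releases also give `position_jitter()` a `seed` argument, so passing the same value to both layers, for example `position_jitter(width = 3, height = 0, seed = 2010)`, should give matching jitter without calling `set.seed()` at all.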

  • 2020-12-15 21:09

    This is a weakness in the current ggplot2 syntax - there's no way to work around it except to add the jitter yourself.

    Or you could do something like this:

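    # Shift each league by a fixed amount instead of random jitter
    # (assumes ggplot2 is loaded and `baseball` comes from the plyr package)
    ggplot(baseball, aes(round(year, -1) + as.numeric(factor(lg)), sb, color = factor(lg))) +
      stat_summary(fun.data = "mean_cl_normal") +
      stat_summary(fun = mean, geom = "line") +  # `fun` replaces the deprecated `fun.y`
      coord_cartesian(ylim = c(0, 40))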
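
    The x expression above moves each league by a fixed whole number of years, which is why every layer ends up at the same positions. A minimal base R sketch of that arithmetic follows. The data frame `df` is made up to stand in for `baseball`, and centring the offsets on zero is a variation that the answer itself does not use.

    ```r
    # Hypothetical stand-in for the year and lg columns of the baseball data.
    df <- data.frame(
      year = c(1871, 1884, 1902, 1902, 1956, 1956),
      lg   = c("NL", "AA", "AL", "NL", "AL", "NL")
    )

    decade   <- round(df$year, -1)              # round to the nearest decade
    lg_index <- as.numeric(factor(df$lg))       # 1, 2, 3, ... one per league
    offset   <- lg_index - mean(unique(lg_index))  # centre the shifts on zero

    # Deterministic "jitter": the same league always gets the same shift.
    df$x_shifted <- decade + offset
    print(df)
    ```

    Because the shift depends only on the league, any number of layers or stats can reuse `x_shifted` and stay aligned.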
    
  • 2020-12-15 21:11

    I ended up generating a uniform distribution to solve this problem.

    I had to address the same underlying problem today. I create one plot, jittering the points, and then I create a second plot that essentially zooms in on a subsection of the first. It's dissonant and distracting if the points move around.

    The following is a demo of the problem and my solution. I don't use ggplot for this plot, but the same concept applies. I draw one value from a uniform distribution for each value I need to jitter, and store these offsets in the source data frame, so that whenever I take a subset, each jitter value stays attached to the same original data value.

    data(airquality)
    someDataset <- airquality
    someDataset$color <- "black"
    someDataset$color[someDataset$Month == 8 & someDataset$Wind == 9.7] <- "red"

    ## jitter() gives different results each time it is run
    for (fZoom in c(TRUE, FALSE)) {
        if (fZoom) {
            myAirQuality <- someDataset[someDataset$Wind > 7.5 & someDataset$Wind < 11.5, ]
        } else {
            myAirQuality <- someDataset[someDataset$Wind > 8.5 & someDataset$Wind < 10.5, ]
        }
        dev.new(title = "Using jitter")  # the original called quartz(), which is macOS-only
        plot(myAirQuality$Wind ~ jitter(myAirQuality$Month), col = myAirQuality$color)
    }

    ## Store one uniform offset per row so that every subset reuses the same jitter
    someDataset$MonthJit <- runif(nrow(someDataset), min = -0.2, max = 0.2)
    for (fZoom in c(TRUE, FALSE)) {
        if (fZoom) {
            myAirQuality <- someDataset[someDataset$Wind > 7.5 & someDataset$Wind < 11.5, ]
        } else {
            myAirQuality <- someDataset[someDataset$Wind > 8.5 & someDataset$Wind < 10.5, ]
        }
        dev.new(title = "Using runif")
        plot(myAirQuality$Wind ~ c(myAirQuality$Month + myAirQuality$MonthJit), col = myAirQuality$color)
    }
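
    The claim that stored offsets survive subsetting can be checked without plotting. The following is a small sketch using the same `airquality` data; the names `aq`, `MonthJittered`, `wide`, `narrow`, and `shared` are illustrative and do not appear in the demo above.

    ```r
    data(airquality)
    aq <- airquality

    # One uniform offset per row, stored next to the data it belongs to.
    aq$MonthJit <- runif(nrow(aq), min = -0.2, max = 0.2)
    aq$MonthJittered <- aq$Month + aq$MonthJit

    # The same two Wind ranges as in the demo.
    wide   <- aq[aq$Wind > 7.5 & aq$Wind < 11.5, ]
    narrow <- aq[aq$Wind > 8.5 & aq$Wind < 10.5, ]

    # Subsetting keeps the row names, so shared rows can be matched by name.
    shared <- rownames(narrow)
    print(all(shared %in% rownames(wide)))

    # The jittered x position of each shared row is the same in both subsets.
    print(identical(wide[shared, "MonthJittered"], narrow$MonthJittered))
    ```

    Calling `jitter()` inside each plot would fail this comparison in general, because it draws fresh random offsets on every call.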
    