Jitter dots without overlap

Submitted by 前提是你 on 2020-01-14 04:01:07

Question


My data:

a <- sample(1:5, 100, replace = TRUE)
b <- sample(1:5, 100, replace = TRUE)
c <- sample(1:10, 100, replace = TRUE)
d <- sample(1:40, 100, replace = TRUE)
df <- data.frame(a, b, c, d)

Using ggplot2, I have created a scatterplot of x = a against y = b, with two further dimensions mapped to colour = c and size = d. Note that x and y intentionally take only the values 1:5.

Because x and y take only five values each, points of different sizes and colours land on the same coordinates and overlap, so I tried jitter to avoid the overlap:

ggplot(df, aes(a, b, colour = c, size = d)) + 
  geom_point(position = position_jitter())

Now I would like the dots to cluster closer together, so I tried several combinations of height and width for the jitter function, such as

ggplot(df, aes(a, b, colour = c, size = d)) + 
  geom_point(position = position_jitter(width = 0.2, height = 0.2))

With jitter the dots still overlap, and they are scattered randomly over the given area.

Is there a way to keep the dots from overlapping at all while packing them as close together as possible, perhaps even touching, but not lined up side by side or stacked? In effect, each cluster would form a kind of bubble made of smaller dots.

Thanks!


Answer 1:


Following @Tjebo's suggestions, I have arranged the dots in "heaps".

set.seed(1234)
n <- 100
a <- sample(1:5,n,replace=TRUE)
b <- sample(1:5,n,replace=TRUE)
c <- sample(1:10,n,replace=TRUE)
d <- sample(1:40,n,replace=TRUE)
df0 <- data.frame(a,b,c,d)

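# Smallest and largest circle diameter in data units (the plot uses r = d/2);
# these parameters need careful tuning
minr <- 0.05
maxr <- 0.2
# If TRUE, sort the circles in each cell by size before placing them
ord <- FALSE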

df1 <- df0
df1$d <- minr+(maxr-minr)*(df1$d-min(df1$d))/(max(df1$d)-min(df1$d))
avals <- unique(df1$a)
bvals <- unique(df1$b)

for (k1 in seq_along(avals)) {
  for (k2 in seq_along(bvals)) {
    print(paste(k1,k2))
    subk <- (df1$a==avals[k1] & df1$b==bvals[k2])
    if (sum(subk)>1) {
      subdfk <- df1[subk,]
      if (ord) {
        idx <- order(subdfk$d)
        subdfk <- subdfk[idx,]
      }
      subdfk.mod <- subdfk
      posmx <- which.max(subdfk$d)   
      subdfk1 <- subdfk[posmx,]
      subdfk2  <- subdfk[-posmx,]
      angsk <- seq(0,2*pi,length.out=nrow(subdfk2)+1)
      subdfk2$a <- subdfk2$a+cos(angsk[-length(angsk)])*(subdfk1$d+subdfk2$d)/2
      subdfk2$b <- subdfk2$b+sin(angsk[-length(angsk)])*(subdfk1$d+subdfk2$d)/2
      subdfk.mod[posmx,] <- subdfk1
      subdfk.mod[-posmx,] <- subdfk2
      df1[subk,] <- subdfk.mod
    }
  }
}

library(ggplot2)
library(ggforce)
ggplot(df1, aes()) + 
  geom_circle(aes(x0=a, y0=b, r=d/2, fill=c), alpha=0.7)+ coord_fixed()
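
The ring placement above makes every smaller circle touch the largest one in its cell, but neighbouring circles on the ring can still overlap each other when a cell holds many points, which is presumably why minr and maxr need tuning. As a rough aid for that tuning, here is a minimal sketch that counts overlapping pairs. count_overlaps is a hypothetical helper, not part of ggplot2 or ggforce, and it assumes centres and diameters laid out like the columns a, b and d of df1.

```r
# Hypothetical helper: count pairs of circles that overlap.
# Assumes x, y are circle centres and diam the diameters (as in df1 above).
count_overlaps <- function(x, y, diam, tol = 1e-9) {
  n <- length(x)
  if (n < 2) return(0L)
  centre_dist <- as.matrix(dist(cbind(x, y)))    # pairwise centre distances
  radius_sum  <- outer(diam, diam, "+") / 2      # r_i + r_j for every pair
  overlapping <- centre_dist < radius_sum - tol  # tangent circles do not count
  sum(overlapping[upper.tri(overlapping)])       # each pair counted once
}

# Two tangent circles of diameter 1, plus a third one that overlaps both
count_overlaps(x = c(0, 1, 0.5), y = c(0, 0, 0), diam = c(1, 1, 1))
```

After running the loop, count_overlaps(df1$a, df1$b, df1$d) gives the number of overlapping pairs left for a given choice of minr and maxr; a result of 0 means no two circles intersect.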




Answer 2:


An interesting visualization tool is the beeswarm plot.
In R the beeswarm and the ggbeeswarm packages implement this kind of plot.

Here is an example with ggbeeswarm:

set.seed(1234)
a <- sample(1:5,100,replace=TRUE)
b <- sample(1:5,100,replace=TRUE)
c <- sample(1:10,100,replace=TRUE)
d <- sample(1:40,100,replace=TRUE)
df <- data.frame(a,b,c,d)
library(ggbeeswarm)
ggplot(aes(x=a, y=b, col=c, size=d),  data = df)+
  geom_beeswarm(priority='random',cex=3.5, groupOnX=TRUE)+coord_flip()

I hope this helps.




Answer 3:


Here is another possible solution to @Tjebo's jittering problem.
The parameter dst, the radius of the ring on which coinciding points are placed, needs some tuning.

set.seed(1234)
a <- sample(1:5,100,replace=TRUE)
b <- sample(1:5,100,replace=TRUE)
c <- sample(1:10,100,replace=TRUE)
d <- sample(1:40,100,replace=TRUE)
df <- data.frame(a,b,c,d)

dst <- .2

df.mod <- df
avals <- unique(df$a)
bvals <- unique(df$b)
for (k1 in seq_along(avals)) {
  for (k2 in seq_along(bvals)) {
    subk <- (df$a==avals[k1] & df$b==bvals[k2])
    if (sum(subk)>1) {
      subdf <- df[subk,]
      angsk <- seq(0,2*pi,length.out=nrow(subdf)+1)
      ak <- subdf$a+cos(angsk[-1])*dst
      bk <- subdf$b+sin(angsk[-1])*dst
      df.mod[subk,c("a","b")] <- cbind(ak,bk)
    }
  }
}

library(ggplot2)
ggplot(df.mod, aes(a, b, colour = c, size = d)) + geom_point()
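
The same ring layout can also be written without explicit loops. The sketch below is an alternative formulation, not the answerer's code: ring_offsets is a hypothetical helper that uses base R's ave() to number the points within each (a, b) cell and spreads them evenly on a circle of radius dst. Cells holding a single point are left in place, as in the loop above.

```r
# Hypothetical helper: move points sharing the same (a, b) cell onto a
# circle of radius dst around that cell; single points stay where they are.
ring_offsets <- function(df, dst = 0.2) {
  cell <- interaction(df$a, df$b, drop = TRUE)
  idx  <- ave(seq_along(cell), cell, FUN = seq_along)  # rank within the cell
  size <- ave(seq_along(cell), cell, FUN = length)     # points in the cell
  ang  <- 2 * pi * idx / size                          # evenly spaced angles
  r    <- ifelse(size > 1, dst, 0)                     # lone points do not move
  df$a <- df$a + r * cos(ang)
  df$b <- df$b + r * sin(ang)
  df
}

set.seed(1234)
df <- data.frame(a = sample(1:5, 100, replace = TRUE),
                 b = sample(1:5, 100, replace = TRUE),
                 c = sample(1:10, 100, replace = TRUE),
                 d = sample(1:40, 100, replace = TRUE))
df.ring <- ring_offsets(df, dst = 0.2)
```

df.ring can then be plotted exactly like df.mod, for example with ggplot(df.ring, aes(a, b, colour = c, size = d)) + geom_point(). Because dst does not take the point sizes into account, some overlap remains possible in crowded cells.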



Source: https://stackoverflow.com/questions/43698290/jitter-dots-without-overlap
