Adding text annotation to a clustering scatter plot (tSNE)

Submitted by 北城余情 on 2019-12-21 18:56:10

Question


I have XY data (a 2D tSNE embedding of high-dimensional data) which I'd like to show as a scatter plot. The points are assigned to several clusters, so I'd like to color-code the points by cluster and then add a single label per cluster that uses the same color as its cluster and is placed, as far as possible, outside the cluster's points.

Any idea how to do this in R, using either ggplot2 with ggrepel, or plotly?

Here's the example data (the XY coordinates and cluster assignments are in df and the labels in label.df) and the ggplot2 part of it:

library(dplyr)
library(ggplot2)
set.seed(1)

# Five clusters of 50 points each, centered at (i, i) for i = 1, 5, 9, 13, 17
df <- do.call(rbind, lapply(seq(1, 20, 4), function(i) {
  data.frame(x = rnorm(50, mean = i, sd = 1), y = rnorm(50, mean = i, sd = 1), cluster = i)
}))
df$cluster <- factor(df$cluster)

# One text label per cluster
label.df <- data.frame(cluster = levels(df$cluster), label = paste0("cluster: ", levels(df$cluster)))

ggplot(df, aes(x = x, y = y, color = cluster)) +
  geom_point() +
  theme_minimal() +
  theme(legend.position = "none")


Answer 1:


The geom_label_repel() function in the ggrepel package makes it easy to add labels to a plot while trying to keep the labels from overlapping one another and the points they annotate. It takes only a small addition to your existing code: summarize the data to get the coordinates where each label should go (here I chose the upper-left region of each cluster, i.e. the min of x and the max of y), and merge the result with your existing data frame containing the cluster labels. Then pass this data frame to geom_label_repel() through its data argument and map the label column to the label aesthetic in aes().

library(dplyr)
library(ggplot2)
library(ggrepel)

set.seed(1)
df <- do.call(rbind, lapply(seq(1, 20, 4), function(i) {
  data.frame(x = rnorm(50, mean = i, sd = 1), y = rnorm(50, mean = i, sd = 1), cluster = i)
}))
df$cluster <- factor(df$cluster)

label.df <- data.frame(cluster = levels(df$cluster), label = paste0("cluster: ", levels(df$cluster)))

# Label anchor per cluster: the upper-left corner of the cluster's bounding box
label.df_2 <- df %>%
  group_by(cluster) %>%
  summarize(x = min(x), y = max(y)) %>%
  left_join(label.df, by = "cluster")  # explicit key avoids the "Joining, by" message

# x, y and color are inherited from the main aes(); only label is added here
ggplot(df, aes(x = x, y = y, color = cluster)) +
  geom_point() +
  theme_minimal() +
  theme(legend.position = "none") +
  ggrepel::geom_label_repel(data = label.df_2, aes(label = label))
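In the code above, geom_label_repel() only sees the five anchor rows, so the labels are not pushed away from the remaining points. A possible variant, sketched here and not part of the original answer, anchors each label at its cluster centroid and also passes every data point with an empty label, which ggrepel treats as an unlabeled obstacle. The names centroids and repel.df are illustrative, and the max.overlaps argument assumes a reasonably recent ggrepel release.

```r
library(dplyr)
library(ggplot2)
library(ggrepel)

set.seed(1)
df <- do.call(rbind, lapply(seq(1, 20, 4), function(i) {
  data.frame(x = rnorm(50, mean = i, sd = 1), y = rnorm(50, mean = i, sd = 1), cluster = i)
}))
df$cluster <- factor(df$cluster)

# One labeled row per cluster, anchored at the cluster centroid
# (centroid anchoring is an assumption; the answer uses min(x), max(y))
centroids <- df %>%
  group_by(cluster) %>%
  summarize(x = mean(x), y = mean(y), .groups = "drop") %>%
  mutate(label = paste0("cluster: ", cluster))

# All points with an empty label, followed by the five labeled centroids.
# Rows with label "" draw nothing but still repel the real labels.
repel.df <- bind_rows(mutate(df, label = ""), centroids)

p <- ggplot(df, aes(x = x, y = y, color = cluster)) +
  geom_point() +
  geom_label_repel(
    data = repel.df,
    aes(label = label),
    max.overlaps = Inf,      # do not drop labels in crowded regions
    min.segment.length = 0,  # always draw a segment back to the centroid
    seed = 1                 # make the label layout reproducible
  ) +
  theme_minimal() +
  theme(legend.position = "none")

print(p)
```

Because the labels start in the densest part of each cluster, the connecting segment shows which cluster a displaced label belongs to; tuning box.padding or force may be needed for real tSNE embeddings with irregular cluster shapes.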



Source: https://stackoverflow.com/questions/51120412/adding-text-annotation-to-a-clustering-scatter-plot-tsne
