ggplot2 Color Scale Over Affected by Outliers

夕颜 2021-02-13 23:22

I'm having difficulty with a few outliers making the color scale useless.

My data has a Length variable that is based in a range, but will usually have a few much large

3 answers
  •  逝去的感伤
    2021-02-13 23:48

    Here's one slightly tricky option:

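    library(ggplot2)

    # Create a new variable that labels the unusual values (NA for everything else)
    x$Length1 <- "> 1500"
    x$Length1[x$Length <= 1500] <- NA
    
    # Main plot: in-range points are mapped to color, outliers to fill.
    # Using fill on points is the tricky part!
    # ggtitle() replaces opts(title = ...), which was removed from ggplot2.
    g <- ggplot() +
      geom_point(data = subset(x, Length <= 1500),
                 aes(x = date, y = factor(stateabbr), color = Length), size = 4) +
      geom_point(data = subset(x, Length > 1500),
                 aes(x = date, y = factor(stateabbr), fill = Length1), size = 4) +
      ggtitle("Date and State") + xlab("Date") + ylab("State")
    
    # Add the gradient color scale for the in-range values
    g + scale_color_gradient2("Length", midpoint = median(x$Length))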
    


    So the tricky part here is using fill on points, in order to convince ggplot to make another legend. You can obviously customize this with different labels and colors for the fill scale.
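
    As a rough sketch of that customization, the example below sets a legend title, label, and color on the fill scale. The toy data frame, the "Outliers" title, the label text, and the red color are all made up for illustration. It also assumes shape 21, one of the point shapes that has a fill, so the outlying points actually display the fill color:

    ```r
    library(ggplot2)

    # Toy data (hypothetical): mostly moderate lengths plus two large outliers
    set.seed(1)
    x <- data.frame(
      date = rep(seq(as.Date("2012-01-01"), by = "day", length.out = 10), 2),
      stateabbr = rep(c("CA", "NY"), each = 10),
      Length = c(round(runif(18, 100, 1500)), 5000, 9000)
    )

    # Label the outliers; everything else is NA
    x$Length1 <- ifelse(x$Length > 1500, "> 1500", NA)

    g <- ggplot() +
      geom_point(data = subset(x, Length <= 1500),
                 aes(x = date, y = factor(stateabbr), color = Length), size = 4) +
      # shape 21 has both an outline and a fill, so the fill color is visible
      geom_point(data = subset(x, Length > 1500),
                 aes(x = date, y = factor(stateabbr), fill = Length1),
                 shape = 21, size = 4) +
      scale_color_gradient2("Length", midpoint = median(x$Length)) +
      # Customize the second legend: title, color, and label are illustrative
      scale_fill_manual(name = "Outliers",
                        values = c("> 1500" = "red"),
                        labels = "Length > 1500") +
      labs(title = "Date and State", x = "Date", y = "State")
    ```

    Printing `g` should draw the in-range points on the gradient and the two outliers in red, each with its own legend.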

    One more thing, reading Brandon's answer. You could in principle combine both approaches by taking the outlying values, using cut to create a separate categorical variable for them, and then use my trick with the fill scale. That way you could indicate multiple outlying groups of points.
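
    A minimal sketch of that combination, under assumptions: the toy data, the break points (1500 and 5000), the `LengthGroup` column name, and the colors are hypothetical, and shape 21 is used so the fill is visible. `cut` bins only the values above 1500, leaving in-range values as NA:

    ```r
    library(ggplot2)

    # Toy data (hypothetical)
    x <- data.frame(
      date = seq(as.Date("2012-01-01"), by = "day", length.out = 8),
      stateabbr = rep(c("CA", "NY"), 4),
      Length = c(200, 450, 800, 1200, 1400, 2500, 6000, 12000)
    )

    # Bin only the outlying values; anything <= 1500 falls outside the breaks and becomes NA
    x$LengthGroup <- cut(x$Length,
                         breaks = c(1500, 5000, Inf),
                         labels = c("1500 to 5000", "> 5000"))

    g <- ggplot() +
      # In-range points: continuous color gradient
      geom_point(data = subset(x, is.na(LengthGroup)),
                 aes(x = date, y = factor(stateabbr), color = Length), size = 4) +
      # Outlying points: one fill color per outlier group
      geom_point(data = subset(x, !is.na(LengthGroup)),
                 aes(x = date, y = factor(stateabbr), fill = LengthGroup),
                 shape = 21, size = 4) +
      scale_color_gradient2("Length",
                            midpoint = median(x$Length[is.na(x$LengthGroup)])) +
      scale_fill_manual("Outliers", values = c("orange", "red")) +
      labs(title = "Date and State", x = "Date", y = "State")
    ```

    Here the gradient midpoint is taken from the in-range values only, which is a design choice of this sketch rather than part of the original answer.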
