Conditionally colour data points outside of confidence bands in R

梦毁少年i 2021-02-03 14:13

I need to colour data points that fall outside the confidence bands on the plot below differently from those within the bands. Should I add a separate column to my dataset?

3 Answers

挽巷 (original poster)

2021-02-03 14:38
The easiest way is probably to calculate a vector of TRUE/FALSE values indicating whether each data point is inside the confidence interval. I'm going to reshuffle your example a little so that all of the calculations are completed before the plotting commands are executed. This provides a clean separation in the program logic that could be exploited if you were to package some of this into a function.

The first part is pretty much the same, except that I replaced the additional call to lm() inside predict() with the severity.lm variable. There is no need to spend computing resources refitting the linear model when we already have it stored:
```
## Dataset from 
#  apsnet.org/education/advancedplantpath/topics/
#    RModules/doc1/04_Linear_regression.html

## Disease severity as a function of temperature

# Response variable, disease severity
diseasesev<-c(1.9,3.1,3.3,4.8,5.3,6.1,6.4,7.6,9.8,12.4)

# Predictor variable, temperature (Centigrade)
temperature<-c(2,1,5,5,20,20,23,10,30,25)

## For convenience, the data may be collected into a data frame
severity <- data.frame(diseasesev, temperature)

## Fit a linear model for the data with lm()
severity.lm <- lm(diseasesev~temperature,data=severity)

## Get datapoints predicted by best fit line and confidence bands
## at every 0.01 interval
xRange <- data.frame(temperature=seq(min(temperature),max(temperature),0.01))
pred4plot <- predict(
  severity.lm,
  xRange,
  level=0.95,
  interval="confidence"
)
```
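As a quick sanity check, it can help to look at what predict() returns here. With interval="confidence" the result is a matrix with one row per row of the new data and the columns fit, lwr and upr; the lwr and upr names are what the interval test relies on. The sketch below rebuilds the same data and model so that it runs on its own:

```r
# Same data and model as above, rebuilt so the snippet is self-contained
severity <- data.frame(
  diseasesev  = c(1.9, 3.1, 3.3, 4.8, 5.3, 6.1, 6.4, 7.6, 9.8, 12.4),
  temperature = c(2, 1, 5, 5, 20, 20, 23, 10, 30, 25)
)
severity.lm <- lm(diseasesev ~ temperature, data = severity)

xRange <- data.frame(
  temperature = seq(min(severity$temperature), max(severity$temperature), 0.01)
)
pred4plot <- predict(severity.lm, xRange, level = 0.95, interval = "confidence")

# Column names of the prediction matrix: fitted value, lower and upper limit
colnames(pred4plot)

# One row of predictions per row of the new data
nrow(pred4plot) == nrow(xRange)

# First few rows, to see the layout
head(pred4plot, 3)
```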
Now we'll calculate the confidence intervals for the original data points and test whether each point is inside its interval:
```
modelConfInt <- predict(
  severity.lm,
  level = 0.95,
  interval = "confidence"
)

insideInterval <- modelConfInt[,'lwr'] < severity[['diseasesev']] &
  severity[['diseasesev']] < modelConfInt[,'upr']
```
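On the "separate column" part of the question: the same TRUE/FALSE vector could also be stored in the data frame and turned into one colour per row, so that a single plot() call draws both groups. This is only a sketch of that alternative, not what the rest of this answer does; the column name inside, the variable pointCol and the colours are illustrative choices.

```r
# Same data and model as above, rebuilt so the snippet is self-contained
severity <- data.frame(
  diseasesev  = c(1.9, 3.1, 3.3, 4.8, 5.3, 6.1, 6.4, 7.6, 9.8, 12.4),
  temperature = c(2, 1, 5, 5, 20, 20, 23, 10, 30, 25)
)
severity.lm <- lm(diseasesev ~ temperature, data = severity)

# Confidence limits at the observed temperatures
ci <- predict(severity.lm, level = 0.95, interval = "confidence")

# Illustrative column name: TRUE when the observation lies within the band
severity$inside <- ci[, "lwr"] < severity$diseasesev &
  severity$diseasesev < ci[, "upr"]

# One colour per row: black inside the band, red outside
pointCol <- ifelse(severity$inside, "black", "red")

# A single call then draws every point in its own colour
plot(
  severity$temperature,
  severity$diseasesev,
  col  = pointCol,
  pch  = 16,
  xlab = "Temperature",
  ylab = "% Disease Severity"
)
```

Keeping the flag in the data frame makes it easy to reuse later, for example to list the outlying rows with severity[!severity$inside, ].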
Then we'll do the plot. First comes the high-level plotting function plot(), as you used it in your example, but we will only plot the points inside the interval. We then follow up with the low-level function points(), which plots all the points outside the interval in a different color. Finally, matplot() is used to draw the regression line and confidence bands, as you used it. However, instead of calling par(new=TRUE), I prefer to pass the argument add=TRUE to high-level functions to make them act like low-level functions.

Using par(new=TRUE) is like playing a dirty trick on a plotting function, and it can have unforeseen consequences. The add argument is provided by many functions to make them add information to a plot rather than redraw it. I would recommend exploiting this argument whenever possible and falling back on par() manipulations only as a last resort.
```
# Take a look at the data- those points inside the interval
plot(
  diseasesev~temperature,
  data=severity[ insideInterval,],
  xlab="Temperature",
  ylab="% Disease Severity",
  pch=16,
  pty="s",
  xlim=c(0,30),
  ylim=c(0,30)
)
title(main="Graph of % Disease Severity vs Temperature")

# Add points outside the interval, color differently
points(
  diseasesev~temperature,
  pch = 16,
  col = 'red',
  data = severity[ !insideInterval,]
)

# Add regression line and confidence intervals
matplot(
  xRange,
  pred4plot,
  lty=c(1,2,2),   #vector of line types, one per column of y
  type="l",       #type of plot for each column of y
  add = TRUE
)
```
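Because the calculations are kept apart from the plotting commands, the steps above could be packaged into a function. The following is a rough sketch under stated assumptions rather than a finished implementation: the name plotWithBand and all of its arguments are hypothetical, it assumes a model with a single numeric predictor, and it uses a 200-point grid instead of the 0.01 step used above.

```r
# Hypothetical helper: plots the data, the fitted line and its confidence
# band, and colours observations that fall outside the band
plotWithBand <- function(model, data, xName, yName,
                         level = 0.95, outCol = "red") {
  # Confidence limits at the observed predictor values
  ci <- predict(model, newdata = data, level = level, interval = "confidence")
  inside <- ci[, "lwr"] < data[[yName]] & data[[yName]] < ci[, "upr"]

  # Grid over the predictor range, used to draw the line and band
  grid <- setNames(
    data.frame(seq(min(data[[xName]]), max(data[[xName]]), length.out = 200)),
    xName
  )
  band <- predict(model, newdata = grid, level = level, interval = "confidence")

  # All calculations are done; now plot
  plot(data[[xName]], data[[yName]],
       col = ifelse(inside, "black", outCol), pch = 16,
       xlab = xName, ylab = yName)
  matplot(grid[[xName]], band, lty = c(1, 2, 2), col = "black",
          type = "l", add = TRUE)

  # Return the flags invisibly so the caller can reuse them
  invisible(inside)
}

# Usage with the data from this answer
severity <- data.frame(
  diseasesev  = c(1.9, 3.1, 3.3, 4.8, 5.3, 6.1, 6.4, 7.6, 9.8, 12.4),
  temperature = c(2, 1, 5, 5, 20, 20, 23, 10, 30, 25)
)
severity.lm <- lm(diseasesev ~ temperature, data = severity)
insideFlags <- plotWithBand(severity.lm, severity, "temperature", "diseasesev")
```

Returning the logical vector invisibly keeps the function usable purely for its plot while still letting the caller subset the data afterwards.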