We have the diameter of trees as the predictor and tree height as the dependent variable. A number of different equations exist for this kind of data and we try to model som
You've got a couple problems. (1) You're missing parentheses for the denominator of form2
(and R has no way to know that you want to add a constant a
in the denominator, or where to put any of the parameters, really), and much more problematic: (2) your 2nd model isn't linear, so lm
won't work.
Fixing (1) is easy:
form2 <- h ~ 1.3 + I(dbh^2) / (a + b * dbh + c * I(dbh^2))
Fixing (2), though there are many ways to estimate parameters for a nonlinear model, the nls
(nonlinear least squares) is a good place to start:
m2 <- nls(form2, data = df, start = list(a = 1, b = 1, c = 1))
You need to provide starting guesses for the parameters in nls
. I just picked 1's, but you should use better guesses that ballpark what the parameters might be.
edit: fixed, no longer incorrectly using offset ...
An answer that complements @shujaa's:
You can transform your problem from
H = 1.3 + D^2/(a+b*D+c*D^2)
to
1/(H-1.3) = a/D^2+b/D+c
This would normally mess up the assumptions of the model (i.e., if H
were normally distributed with constant variance, then 1/(H-1.3)
wouldn't be. However, let's try it anyway:
data(trees)
df <- transform(trees,
h=Height * 0.3048, #transform to metric system
dbh=Girth * 0.3048 / pi #transform tree girth to diameter
)
lm(1/(h-1.3) ~ poly(I(1/dbh),2,raw=TRUE),data=df)
## Coefficients:
## (Intercept) poly(I(1/dbh), 2, raw = TRUE)1
## 0.043502 -0.006136
## poly(I(1/dbh), 2, raw = TRUE)2
## 0.010792
These results would normally be good enough to get good starting values for the nls
fit. However, you can do better than that via glm
, which uses a link function to allow for some forms of non-linearity. Specifically,
(fit2 <- glm(h-1.3 ~ poly(I(1/dbh),2,raw=TRUE),
family=gaussian(link="inverse"),data=df))
## Coefficients:
## (Intercept) poly(I(1/dbh), 2, raw = TRUE)1
## 0.041795 -0.002119
## poly(I(1/dbh), 2, raw = TRUE)2
## 0.008175
##
## Degrees of Freedom: 30 Total (i.e. Null); 28 Residual
## Null Deviance: 113.2
## Residual Deviance: 80.05 AIC: 125.4
##
You can see that the results are approximately the same as the linear fit, but not quite.
pframe <- data.frame(dbh=seq(0.8,2,length=51))
We use predict
, but need to correct the prediction to account for the fact that we subtracted a constant from the LHS:
pframe$h <- predict(fit2,newdata=pframe,type="response")+1.3
p2 <- predict(fit2,newdata=pframe,se.fit=TRUE) ## predict on link scale
pframe$h_lwr <- with(p2,1/(fit+1.96*se.fit))+1.3
pframe$h_upr <- with(p2,1/(fit-1.96*se.fit))+1.3
png("dbh_tmp1.png",height=4,width=6,units="in",res=150)
par(las=1,bty="l")
plot(h~dbh,data=df)
with(pframe,lines(dbh,h,col=2))
with(pframe,polygon(c(dbh,rev(dbh)),c(h_lwr,rev(h_upr)),
border=NA,col=adjustcolor("black",alpha=0.3)))
dev.off()
Because we have used the constant on the LHS (this almost, but doesn't quite, fit into the framework of using an offset -- we could only use an offset if our formula were 1/H - 1.3 = a/D^2 + ...
, i.e. if the constant adjustment were on the link (inverse) scale rather than the original scale), this doesn't fit perfectly into ggplot
's geom_smooth
framework
library("ggplot2")
ggplot(df,aes(dbh,h))+geom_point()+theme_bw()+
geom_line(data=pframe,colour="red")+
geom_ribbon(data=pframe,colour=NA,alpha=0.3,
aes(ymin=h_lwr,ymax=h_upr))
ggsave("dbh_tmp2.png",height=4,width=6)
Assuming you are using nls
the R formula can use an ordinary R function, H(a, b, c, D)
, so the formula can be just h ~ H(a, b, c, dbh)
and this works:
# use lm to get startingf values
lm1 <- lm(1/(h - 1.3) ~ I(1/dbh) + I(1/dbh^2), df)
start <- rev(setNames(coef(lm1), c("c", "b", "a")))
# run nls
H <- function(a, b, c, D) 1.3 + D^2 / (a + b * D + c * D^2)
nls1 <- nls(h ~ H(a, b, c, dbh), df, start = start)
nls1 # display result
Graphing the output:
plot(h ~ dbh, df)
lines(fitted(nls1) ~ dbh, df)