I have a few datapoints (x and y) that seem to have a logarithmic relationship.
> mydata
x y
1 0 123
2 2 116
3 4 113
4 15 100
5 48 87
6 75 84
7 122 77
> qplot(x, y, data=mydata, geom="line")
Now I would like to find an underlying function that fits the graph and allows me to infer other datapoints (i.e. 3
or 82
). I read about lm
and nls
but I'm not getting anywhere really.
At first, I created a function of which I thought it resembled the plot the most:
f <- function(x, a, b) {
a * exp(b *-x)
}
x <- seq(0:100)
y <- f(seq(0:100), 1,1)
qplot(x,y, geom="line")
Afterwards, I tried to generate a fitting model using nls
:
> fit <- nls(y ~ f(x, a, b), data=mydata, start=list(a=1, b=1))
Error in numericDeriv(form[[3]], names(ind), env) :
Missing value or an Infinity produced when evaluating the model
Can someone point me in the right direction on what to do from here?
Follow up
After reading your comments and googling around a bit further I adjusted the starting parameters for a
, b
and c
and then suddenly the model converged.
fit <- nls(y~f(x,a,b,c), data=data.frame(mydata), start=list(a=1, b=30, c=-0.3))
x <- seq(0,120)
fitted.data <- data.frame(x=x, y=predict(fit, list(x=x))
ggplot(mydata, aes(x, y)) + geom_point(color="red", alpha=.5) + geom_line(alpha=.5) + geom_line(data=fitted.data)
Maybe using a cubic specification for your model and estimating via lm
would give you a good fit.
# Importing your data
dataset <- read.table(text='
x y
1 0 123
2 2 116
3 4 113
4 15 100
5 48 87
6 75 84
7 122 77', header=T)
# I think one possible specification would be a cubic linear model
y.hat <- predict(lm(y~x+I(x^2)+I(x^3), data=dataset)) # estimating the model and obtaining the fitted values from the model
qplot(x, y, data=dataset, geom="line") # your plot black lines
last_plot() + geom_line(aes(x=x, y=y.hat), col=2) # the fitted values red lines
# It fits good.
Try taking the log of your response variable and then using lm
to fit a linear model:
fit <- lm(log(y) ~ x, data=mydata)
The adjusted R-squared is 0.8486, which at face value isn't bad. You can look at the fit using plot, for example:
plot(fit, which=2)
But perhaps, it's not such a good fit after all:
last_plot() + geom_line(aes(x=x, y=exp(fit$fitted.values)))
Check this document out: http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf
In brief, first you need to decide on the model to fit onto your data (e.g., exponential) and then estimate its parameters.
Here are some widely used distributions: http://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm
来源:https://stackoverflow.com/questions/11844522/fitting-a-function-in-r