问题
I am trying to estimate the constants for Heaps law.
I have the following dataset novels_colection
:
Number of novels DistinctWords WordOccurrences
1 1 13575 117795
2 1 34224 947652
3 1 40353 1146953
4 1 55392 1661664
5 1 60656 1968274
Then I build the next function:
# Function for Heaps law
heaps <- function(K, n, B){
K*n^B
}
heaps(2,117795,.7) #Just to test it works
So n = Word Occurrences
, and K
and B
are values that should be constants in order to find my prediction of Distinct Words.
I tried this but it gives me an error:
fitHeaps <- nls(DistinctWords ~ heaps(K,WordOccurrences,B),
data = novels_collection[,2:3],
start = list(K = .1, B = .1), trace = T)
Error = Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
Any idea in how could I fix this or a method to fit the function and get the values for K
and B
?
回答1:
If you take log transform on both sides of y = K * n ^ B
, you get log(y) = log(K) + B * log(n)
. This is a linear relationship between log(y)
and log(n)
, hence you can fit a linear regression model to find log(K)
and B
.
logy <- log(DistinctWords)
logn <- log(WordOccurrences)
fit <- lm(logy ~ logn)
para <- coef(fit) ## log(K) and B
para[1] <- exp(para[1]) ## K and B
回答2:
With minpack.lm we can fit a non-linear model but I guess it will be prone to overfitting more than a linear model on the log-transformed variables will do (as done by Zheyuan), but we may compare the residuals of linear / non-linear model on some held-out dataset to get the empirical results, which will be interesting to see.
library(minpack.lm)
fitHeaps = nlsLM(DistinctWords ~ heaps(K, WordOccurrences, B),
data = novels_collection[,2:3],
start = list(K = .01, B = .01))
coef(fitHeaps)
# K B
# 5.0452566 0.6472176
plot(novels_collection$WordOccurrences, novels_collection$DistinctWords, pch=19)
lines(novels_collection$WordOccurrences, predict(fitHeaps, newdata = novels_collection[,2:3]), col='red')
来源:https://stackoverflow.com/questions/39853655/nls-fails-to-estimate-parameters-of-my-model