`nls` fails to estimate parameters of my model

Asked by 臣服心动 on 2021-01-24 00:03

I am trying to estimate the constants of Heaps' law. I have the following dataset, `novels_collection`:

    Number of novels  DistinctWords  WordOccurrences

2 answers
  • 2021-01-24 00:51

    If you take the log of both sides of y = K * n^B, you get log(y) = log(K) + B * log(n). This is a linear relationship between log(y) and log(n), so you can fit a linear regression of log(y) on log(n): the intercept estimates log(K) and the slope estimates B.

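    ## regress log(DistinctWords) on log(WordOccurrences)
    fit <- lm(log(DistinctWords) ~ log(WordOccurrences), data = novels_collection)
    
    para <- coef(fit)          ## intercept = log(K), slope = B
    para[1] <- exp(para[1])    ## back-transform the intercept: K and B
    names(para) <- c("K", "B")
    para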
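
    To sanity-check the log-log approach, here is a minimal sketch on synthetic data. The true values K = 5 and B = 0.65 and the multiplicative noise are assumptions chosen for illustration, not the question's data: it simulates counts from Heaps' law, fits the log-log regression, and back-transforms the intercept.

```r
## Synthetic data only (assumed K = 5, B = 0.65); not the question's novels_collection.
set.seed(1)
K_true <- 5
B_true <- 0.65
n <- round(10^seq(3, 6, length.out = 40))           # word occurrences (tokens)
y <- K_true * n^B_true * exp(rnorm(40, sd = 0.05))  # distinct words, multiplicative noise

## linear regression on the log-log scale
fit <- lm(log(y) ~ log(n))
para <- coef(fit)            # intercept = log(K), slope = B
para[1] <- exp(para[1])      # back-transform the intercept to K
names(para) <- c("K", "B")
print(para)
```

    With multiplicative noise the log transform makes the errors additive, which is the setting where ordinary least squares on log(y) is a natural fit; the estimates should land close to the assumed K and B.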
    
  • 2021-01-24 00:53

    With minpack.lm we can fit the non-linear model directly. I suspect it is more prone to overfitting than a linear model on the log-transformed variables (as in Zheyuan's answer), but we could compare the residuals of the linear and non-linear models on a held-out dataset to get empirical results, which would be interesting to see.

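    library(minpack.lm)
    
    ## Heaps' law, V = K * n^B (assumed to match the heaps() defined in the question)
    heaps <- function(K, n, B) K * n^B
    
    fitHeaps <- nlsLM(DistinctWords ~ heaps(K, WordOccurrences, B),
                      data = novels_collection[, 2:3],   # DistinctWords, WordOccurrences
                      start = list(K = .01, B = .01))
    coef(fitHeaps)
    #        K         B 
    # 5.0452566 0.6472176 
    
    ## data plus fitted curve; sort by x so lines() draws a smooth curve
    ord <- order(novels_collection$WordOccurrences)
    plot(novels_collection$WordOccurrences, novels_collection$DistinctWords, pch = 19)
    lines(novels_collection$WordOccurrences[ord],
          predict(fitHeaps, newdata = novels_collection[ord, 2:3]), col = 'red')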
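
    The held-out comparison suggested above could look roughly like this. It is a sketch under assumptions: the data are synthetic (K = 5, B = 0.65, multiplicative noise), the 40/20 split and the RMSE metric are arbitrary choices, and base R's `nls()` stands in for `nlsLM()` so the example needs no extra package. Starting `nls()` from the log-log estimates, rather than from values like 0.01, is one common way to keep it from failing to converge.

```r
## Hypothetical held-out comparison on synthetic data (not novels_collection).
set.seed(42)
n <- round(10^runif(60, 3, 6))                       # word occurrences
dat <- data.frame(WordOccurrences = n,
                  DistinctWords = 5 * n^0.65 * exp(rnorm(60, sd = 0.05)))
train_idx <- sample(nrow(dat), 40)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

## 1) linear model on the log-log scale, predictions back-transformed
fit_lin <- lm(log(DistinctWords) ~ log(WordOccurrences), data = train)
pred_lin <- exp(predict(fit_lin, newdata = test))

## 2) non-linear least squares, started from the log-log estimates
start_vals <- list(K = exp(unname(coef(fit_lin)[1])),
                   B = unname(coef(fit_lin)[2]))
fit_nls <- nls(DistinctWords ~ K * WordOccurrences^B,
               data = train, start = start_vals)
pred_nls <- predict(fit_nls, newdata = test)

## root-mean-square error on the held-out rows
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
errs <- c(lin = rmse(test$DistinctWords, pred_lin),
          nls = rmse(test$DistinctWords, pred_nls))
print(errs)
```

    On real data, repeating the split several times (or using cross-validation) would give a less noisy comparison than a single train/test split.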
    
