cost function in cv.glm of boot library in R


忘掉有多难 2021-02-06 10:27

I am trying to use the cross-validation function cv.glm from the boot library in R to determine the number of misclassifications when a glm logistic regression is applied.

4 answers

一向 (original poster)

2021-02-06 11:10
I will try to explain the cost function in simple words. Let's take the cv.glm(data, glmfit, cost, K) arguments step by step:
1. data: The data frame holding the observations. Think of it as a series of numbers.
2. glmfit: The generalized linear model that was fitted to the data above. But there is a catch: cv.glm splits the data into K parts. Each part in turn is held out as the test set, the model is refitted on the remaining parts (the training set), and the refitted model predicts the held-out part. The predictions have the same number of elements as the held-out part.
3. cost: The cost function. It takes two arguments: first the observed responses of the test set, and second the model's predictions for that test set. The default is the average squared error: the difference between each observed data point and its predicted data point is squared, and the squares are averaged, so both arguments must have the same number of elements.
4. K: The number of parts into which the data is split. The default is the number of observations, which gives leave-one-out cross-validation.
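The four steps above can be sketched by hand in base R. This is only a simplified illustration of the idea, not the actual implementation inside boot (cv.glm also reports an adjusted estimate that is left out here); the names manual_cv, mse_cost and toy are made up for the example, and the data is simulated.

```r
# Hand-rolled K-fold cross-validation mirroring data, glmfit, cost and K.
# manual_cv, mse_cost and toy are illustrative names, not part of boot.
mse_cost <- function(observed, predicted) mean((observed - predicted)^2)

manual_cv <- function(data, formula, family, cost = mse_cost, K = nrow(data)) {
  n <- nrow(data)
  fold <- sample(rep(seq_len(K), length.out = n))  # random fold label per row
  total <- 0
  for (k in seq_len(K)) {
    test <- fold == k
    # refit on the training folds, predict the held-out fold
    fit <- glm(formula, family = family, data = data[!test, , drop = FALSE])
    pred <- predict(fit, newdata = data[test, , drop = FALSE], type = "response")
    observed <- model.response(model.frame(formula, data[test, , drop = FALSE]))
    # weight each fold's cost by its share of the observations
    total <- total + cost(observed, pred) * sum(test) / n
  }
  total
}

# Simulated data, for illustration only.
set.seed(1)
toy <- data.frame(x = rnorm(60))
toy$y <- rbinom(60, 1, plogis(1.5 * toy$x))
cv_error <- manual_cv(toy, y ~ x, binomial(), K = 5)
```

Leaving K at its default here refits the model once per observation, which is the leave-one-out case described in step 4.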
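Judging from your description of the cost function, the predictions are probabilities between 0 and 1 (0 to 0.5 means 'no', 0.5 to 1 means 'yes') and the observations are 'yes' or 'no', which glm stores as 0 and 1. cv.glm passes the observations as the first argument and the predictions as the second, so the mean squared error between them would be:
```
# observed:  responses as stored by glm (0 = 'no', 1 = 'yes')
# predicted: fitted probabilities of 'yes', between 0 and 1
cost <- function(observed, predicted) {
  e <- 0
  for (i in seq_along(observed)) {
    predicted_yes <- predicted[i] > 0.5
    actual_yes <- observed[i] == 1
    # a correct classification adds no error;
    # a wrong one adds its distance from the 0.5 threshold
    d <- if (predicted_yes == actual_yes) 0 else abs(predicted[i] - 0.5)
    e <- e + d * d  # accumulate the squared error
  }
  e / length(observed)  # mean squared error
}
```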
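If the goal is simply the misclassification count asked about in the question, a vectorized cost may be easier to reason about. The following is a minimal sketch, assuming cv.glm hands the cost function the observed 0/1 responses first and the predicted probabilities second, as the example in the boot documentation for cv.glm does. The names misclass_cost and toy, and the simulated data, are invented for illustration.

```r
library(boot)  # provides cv.glm

# Fraction of held-out observations on the wrong side of the 0.5 threshold.
# observed is 0/1, predicted is a probability in [0, 1].
misclass_cost <- function(observed, predicted) {
  mean(abs(observed - predicted) > 0.5)
}

# Simulated data, for illustration only.
set.seed(42)
toy <- data.frame(x = rnorm(100))
toy$y <- rbinom(100, 1, plogis(2 * toy$x))

fit <- glm(y ~ x, family = binomial, data = toy)
cv <- cv.glm(toy, fit, cost = misclass_cost, K = 10)

cv$delta[1]              # cross-validated misclassification rate
cv$delta[1] * nrow(toy)  # rough number of misclassified observations
```

The first element of delta is the raw cross-validation estimate and the second is a bias-adjusted version; multiplying the rate by the number of rows turns it back into an approximate count.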
Sources : http://www.cs.cmu.edu/~schneide/tut5/node42.html