Question
I am using the gbm package in R with the 'bernoulli' distribution option to build a classifier, and I get unusual results of 'nan' and am unable to predict any classification results. I do not encounter the same errors when I use 'adaboost'. Below is the sample code; I replicated the same errors with the iris dataset.
## using the iris data for gbm
library(caret)
library(gbm)
data(iris)
Data <- iris[1:100,-5]
Label <- as.factor(c(rep(0,50), rep(1,50)))
# Split the data into training and testing
inTraining <- createDataPartition(Label, p=0.7, list=FALSE)
training <- Data[inTraining, ]
trainLab <- droplevels(Label[inTraining])
testing <- Data[-inTraining, ]
testLab <- droplevels(Label[-inTraining])
# Model
model_gbm <- gbm.fit(x=training, y= trainLab,
distribution = "bernoulli",
n.trees = 20, interaction.depth = 1,
n.minobsinnode = 10, shrinkage = 0.001,
bag.fraction = 0.5, keep.data = TRUE, verbose = TRUE)
## output on the console
Iter TrainDeviance ValidDeviance StepSize Improve
1 -nan -nan 0.0010 -nan
2 nan -nan 0.0010 nan
3 -nan -nan 0.0010 -nan
4 nan -nan 0.0010 nan
5 -nan -nan 0.0010 -nan
6 nan -nan 0.0010 nan
7 -nan -nan 0.0010 -nan
8 nan -nan 0.0010 nan
9 -nan -nan 0.0010 -nan
10 nan -nan 0.0010 nan
20 nan -nan 0.0010 nan
Please let me know if there is a workaround to get this working. The reason I am using this is to experiment with Additive Logistic Regression; please suggest any other alternatives in R for doing this.
Thanks.
Answer 1:
Is there a reason you are using gbm.fit() instead of gbm()?
Based on the package documentation, the y variable in gbm.fit() needs to be a vector.
I tried making sure the vector was forced using
trainLab <- as.vector(droplevels(Label[inTraining])) #vector of chars
This gave the following output on the console. Unfortunately I'm not sure why the validation deviance is still -nan.
Iter TrainDeviance ValidDeviance StepSize Improve
1 1.3843 -nan 0.0010 0.0010
2 1.3823 -nan 0.0010 0.0010
3 1.3803 -nan 0.0010 0.0010
4 1.3783 -nan 0.0010 0.0010
5 1.3763 -nan 0.0010 0.0010
6 1.3744 -nan 0.0010 0.0010
7 1.3724 -nan 0.0010 0.0010
8 1.3704 -nan 0.0010 0.0010
9 1.3684 -nan 0.0010 0.0010
10 1.3665 -nan 0.0010 0.0010
20 1.3471 -nan 0.0010 0.0010
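For completeness, here is a minimal sketch of one way to get finite deviances and predictions, assuming that the bernoulli distribution expects a numeric 0/1 response (the as.numeric(as.character(...)) conversion and the 0.5 cutoff below are illustrative choices, not part of the original answer):
# Sketch: refit with a numeric 0/1 response and predict on the held-out set.
# Assumes the training/testing/trainLab/testLab objects from the question exist.
trainLab_num <- as.numeric(as.character(trainLab))  # factor "0"/"1" -> numeric 0/1
testLab_num  <- as.numeric(as.character(testLab))
model_gbm <- gbm.fit(x = training, y = trainLab_num,
                     distribution = "bernoulli",
                     n.trees = 20, interaction.depth = 1,
                     n.minobsinnode = 10, shrinkage = 0.001,
                     bag.fraction = 0.5, keep.data = TRUE, verbose = TRUE)
# Predicted probabilities of class "1"; 0.5 is an arbitrary cutoff for illustration.
probs <- predict(model_gbm, newdata = testing, n.trees = 20, type = "response")
preds <- ifelse(probs > 0.5, 1, 0)
table(predicted = preds, actual = testLab_num)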
Answer 2:
train.fraction should be < 1 to get a ValidDeviance value, because that is how a validation dataset is created.
Thanks!
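As a minimal sketch of that suggestion, assuming the gbm() formula interface with a numeric 0/1 response (the train_df object name and the 0.8 fraction are illustrative):
# gbm() uses the first train.fraction rows for training and the rest for validation,
# so shuffle the rows first if they are ordered by class.
train_df <- data.frame(training, y = as.numeric(as.character(trainLab)))
train_df <- train_df[sample(nrow(train_df)), ]
model_gbm2 <- gbm(y ~ ., data = train_df,
                  distribution = "bernoulli",
                  n.trees = 20, interaction.depth = 1,
                  n.minobsinnode = 10, shrinkage = 0.001,
                  bag.fraction = 0.5, train.fraction = 0.8,
                  verbose = TRUE)
# ValidDeviance is now computed on the held-out 20% instead of being printed as -nan.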
Source: https://stackoverflow.com/questions/23530165/gradient-boosting-using-gbm-in-r-with-distribution-bernoulli