Difference between using train_test_split and cross_val_score in sklearn.cross_validation

抹茶落季 2021-02-04 10:40

I have a matrix with 20 columns. The last column contains the 0/1 labels.

The link to the data is here.

I am trying to run random forest on the dataset, using cross valid

2 answers
  •  心在旅途
    2021-02-04 11:11

    When using cross_val_score, you'll frequently want to pass a KFold or StratifiedKFold iterator as the cv argument:

    http://scikit-learn.org/0.10/modules/cross_validation.html#computing-cross-validated-metrics

    http://scikit-learn.org/0.10/modules/generated/sklearn.cross_validation.KFold.html#sklearn.cross_validation.KFold

    By default, cross_val_score does not shuffle your data, which can produce odd results like this if your data isn't in random order to begin with (for example, if the rows are sorted by label).
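To make the point above concrete, here is a minimal standard-library sketch of what consecutive, unshuffled folds look like when the rows happen to be sorted by label. The helper name `contiguous_folds` and the toy label vector are hypothetical, made up for illustration only; this is not scikit-learn's implementation, only a picture of the same idea.

```python
import random


def contiguous_folds(indices, n_folds):
    """Cut a list of indices into n_folds consecutive blocks, in the order given."""
    n = len(indices)
    # Spread the remainder over the first folds so sizes differ by at most one
    sizes = [n // n_folds + (1 if i < n % n_folds else 0) for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds


# Hypothetical labels sorted by class: 12 zeros followed by 8 ones
labels = [0] * 12 + [1] * 8
indices = list(range(len(labels)))

# Without shuffling, most test folds contain a single class
unshuffled = contiguous_folds(indices, 4)
fold_classes = [sorted({labels[i] for i in fold}) for fold in unshuffled]
print(fold_classes)

# Shuffling the indices first (with a fixed seed) mixes the classes across folds
shuffled_indices = indices[:]
random.Random(0).shuffle(shuffled_indices)
shuffled = contiguous_folds(shuffled_indices, 4)
print([[labels[i] for i in fold] for fold in shuffled])
```

When a test fold holds only one class, the model is scored on rows that look nothing like the mix it was trained on, which is one way scores can swing wildly from fold to fold.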

    The KFold iterator has a random_state parameter, which is used when shuffling is enabled:

    http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html

    So does train_test_split, which does randomize by default:

    http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html

    Patterns like the one you described are usually the result of a lack of randomness in the train/test split.
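As a rough sketch of the two approaches side by side, the snippet below runs a random forest with a shuffled KFold passed to cross_val_score, and then with a single train_test_split. It makes several assumptions: the data is synthetic (19 feature columns plus a 0/1 label, sorted by label to mimic non-random row order) because the original dataset is not available here, and the imports come from sklearn.model_selection, which replaced the sklearn.cross_validation module in later scikit-learn releases. The parameter values (n_estimators, n_splits, test_size, the seeds) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for the asker's matrix: 19 features and a 0/1 label
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 19))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Sort rows by label to mimic data that is not in random order
order = np.argsort(y, kind="stable")
X, y = X[order], y[order]

clf = RandomForestClassifier(n_estimators=50, random_state=0)

# Cross-validation with explicit shuffling and a fixed seed
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(clf, X, y, cv=cv)
print("shuffled 5-fold accuracy:", cv_scores.mean())

# Single hold-out split; train_test_split shuffles by default
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
holdout_score = clf.fit(X_train, y_train).score(X_test, y_test)
print("hold-out accuracy:", holdout_score)
```

With shuffling enabled in both cases, the cross-validated mean and the hold-out score should be broadly comparable; a large gap that appears only without shuffling points to row order as the likely cause.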
