Random_state's contribution to accuracy

清酒与你 2021-01-28 04:47

Okay, this is interesting. I executed the same code a couple of times and each time I got a different accuracy_score. I figured that I was not using any rand

1 answer
  • 2021-01-28 05:28

    Essentially, random_state makes sure your code produces the same results on every run by fixing the seed of the random number generator, so the data is split exactly the same way each time. This is mostly helpful for your initial train/test split, and for writing code that others can replicate exactly.

    Splitting the data the same vs. differently

    The first thing to understand is that if you don't set random_state, the data will be split differently each time, which means that your training and test sets will be different on every run. This might not make a huge difference, but it will result in slight variations in your model parameters, accuracy, and so on. If you do set random_state to the same value each time, like random_state=0, then the data will be split the same way each time.
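    A minimal sketch of that behaviour, assuming scikit-learn and NumPy are installed. The toy arrays and variable names are made up for illustration; only train_test_split and its random_state and test_size parameters come from the library.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 20 samples whose single feature is their own index,
# which makes it easy to see which rows ended up in the test set.
X = np.arange(20).reshape(-1, 1)
y = np.arange(20)

# Two calls with the same seed produce the same test set.
_, X_test_a, _, _ = train_test_split(X, y, test_size=0.25, random_state=0)
_, X_test_b, _, _ = train_test_split(X, y, test_size=0.25, random_state=0)
same_split = np.array_equal(X_test_a, X_test_b)

# With no seed, the shuffle is not pinned down, so the test rows will
# usually change from one run of the script to the next.
_, X_test_c, _, _ = train_test_split(X, y, test_size=0.25)

print("seeded splits identical:", same_split)
print("unseeded test rows:", sorted(X_test_c.ravel().tolist()))
```

    Running the script several times should print the same answer for the seeded comparison, while the unseeded test rows will typically differ between runs.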

    Each random_state results in a different split

    The second thing to understand is that each random_state value will result in different splits and different behavior. So you need to keep random_state as the same value if you want to be able to replicate results.
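    To illustrate, here is a small sketch comparing two seeds. The helper held_out_indices is a hypothetical name introduced for this example, and the data is again a toy array of row indices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): each row's feature value is its own index.
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

def held_out_indices(seed):
    """Return the sorted row indices that land in the test set for a given seed."""
    _, X_test, _, _ = train_test_split(X, y, test_size=0.2, random_state=seed)
    return sorted(X_test.ravel().tolist())

split_0 = held_out_indices(0)
split_0_again = held_out_indices(0)
split_42 = held_out_indices(42)

print("seed 0 repeated gives the same split:", split_0 == split_0_again)
print("seed 0 and seed 42 give the same split:", split_0 == split_42)
```

    Neither seed is better than the other; the point is only that whichever value you pick has to stay fixed for the results to be repeatable.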

    Your model can have multiple random_state pieces

    The third thing to understand is that multiple pieces of your model might have randomness in them. For example, your train_test_split can accept random_state, but so can RandomForestClassifier. So in order to get the exact same results each time, you'll need to set random_state for each piece of your model that has randomness in it.
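    As a rough sketch of what that looks like in practice, the example below seeds every random step in a small pipeline. The synthetic dataset from make_classification, the SEED constant, and the run_once helper are assumptions made for illustration and are not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

SEED = 0  # hypothetical constant shared by every random step

def run_once():
    """Build the data, split it, fit a forest, and return the test accuracy."""
    # Synthetic data stands in for a real dataset.
    X, y = make_classification(n_samples=300, n_features=10, random_state=SEED)
    # Randomness source 1: the train/test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=SEED
    )
    # Randomness source 2: bootstrap sampling and feature selection in the forest.
    model = RandomForestClassifier(n_estimators=50, random_state=SEED)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))

first = run_once()
second = run_once()
print("accuracies match:", first == second)
```

    Dropping random_state from either the split or the classifier would reintroduce run-to-run variation, even though the other step is still seeded.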

    Conclusions

    If you're using random_state to do your initial train/test split, you're going to want to set it once and use that split going forward to avoid overfitting to your test set.

    Generally speaking, you can use cross-validation to assess the accuracy of your model and not worry too much about the random_state.
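    A minimal sketch of that approach, assuming a synthetic dataset and a random forest as stand-ins for the real data and model. cross_val_score with cv=5 fits the model on five different train/validation partitions and returns one score per fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (hypothetical) in place of a real dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Five folds: every sample is used for validation exactly once.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print("per-fold accuracy:", scores)
print(f"mean={scores.mean():.3f} std={scores.std():.3f}")
```

    Reporting the mean together with the spread across folds gives a steadier picture than the accuracy from any single split.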

    A very important note is that you should not tune random_state to try to improve the accuracy of your model. Picking the seed that happens to score best means you are selecting for a lucky split rather than a better model, so the result is overfit to that particular test set and will not generalize as well to unseen data.
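    One way to see why is to look at how much the score moves when only the split seed changes. The sketch below is an illustration under assumed conditions (synthetic data, a logistic regression model, 20 arbitrary seeds), not a benchmark.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical) in place of a real dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

scores = []
for seed in range(20):
    # Only the split changes between iterations; the model setup is identical.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

scores = np.array(scores)
print(f"min={scores.min():.3f} mean={scores.mean():.3f} max={scores.max():.3f}")
```

    The gap between the minimum and maximum comes from the luck of the split alone, so reporting the seed with the highest score would overstate how well the model is likely to do on new data.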
