Okay, this is interesting. I executed the same code a couple of times, and each time I got a different `accuracy_score`. I figured it was because I was not setting `random_state` anywhere.
Essentially, `random_state` makes sure your code outputs the same results each time by performing the exact same data splits each time. This is mostly helpful for your initial train/test split, and for creating code that others can replicate exactly.
The first thing to understand is that if you don't use `random_state`, then the data will be split differently each time, which means that your training set and test set will be different. This might not make a huge difference, but it will result in slight variations in your model parameters, accuracy, etc. If you do set `random_state` to the same value each time, like `random_state=0`, then the data will be split the same way each time.
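A minimal sketch of this behavior, using a toy array so the splits are easy to compare:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # toy feature matrix
y = np.arange(10)                 # toy labels

# Two splits with the same random_state produce identical partitions
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.3, random_state=0)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.3, random_state=0)
print(np.array_equal(X_te1, X_te2))  # True
```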
The second thing to understand is that each `random_state` value will result in different splits and different behavior. So you need to keep `random_state` at the same value if you want to be able to replicate results.
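For example (same toy data as above), two different seed values will almost certainly shuffle the data differently:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Different random_state values give different test sets
_, te_a, _, _ = train_test_split(X, y, test_size=0.3, random_state=0)
_, te_b, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
print(np.array_equal(te_a, te_b))
```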
The third thing to understand is that multiple pieces of your model might have randomness in them. For example, your `train_test_split` can accept `random_state`, but so can `RandomForestClassifier`. So in order to get the exact same results each time, you'll need to set `random_state` for each piece of your model that has randomness in it.
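A sketch of seeding every randomized component (the dataset, estimator, and parameter choices here are just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)

# Seed both the split AND the forest
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
acc1 = accuracy_score(y_te, clf.predict(X_te))

# Re-running the whole pipeline reproduces the same score
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
acc2 = accuracy_score(y_te, clf.predict(X_te))
print(acc1 == acc2)  # True
```

Leave out either `random_state` and the two runs can diverge.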
If you're using `random_state` to do your initial train/test split, you're going to want to set it once and use that split going forward, to avoid overfitting to your test set.
Generally speaking, you can use cross-validation to assess the accuracy of your model and not worry too much about the `random_state`.
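A quick sketch of that approach with `cross_val_score` (toy dataset and estimator chosen only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold CV averages performance over several different splits,
# so no single lucky (or unlucky) split dominates the estimate
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```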
A very important note is that you should not tune `random_state` to try to improve the accuracy of your model. This is by definition going to result in your model overfitting your data, and not generalizing as well to unseen data.