How to get a non-shuffled train_test_split in sklearn

If I want a random train/test split, I use the sklearn helper function:

In [1]: from sklearn.model_selection import train_test_split
   ...: train_test_split


                      
3 answers

面向向阳花, 2021-01-04 00:51

Use numpy.split:

import numpy as np
data = np.array([1,2,3,4,5,6])

np.split(data, [4])           # modify the index here to specify where to split the array
# [array([1, 2, 3, 4]), array([5, 6])]


To split by a percentage instead, compute the split index from the length of data:

data = np.array([1,2,3,4,5,6])
p = 0.6                               # fraction of the data that goes into the first part

idx = int(p * data.shape[0]) + 1      # p * n is usually fractional, so truncate and add 1 to round up;
                                      # when p * n is a whole number this adds one extra element,
                                      # so adjust as needed (it rarely matters if data is large)
np.split(data, [idx])
# [array([1, 2, 3, 4]), array([5, 6])]
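
If the `+ 1` adjustment is not what you want, one option is to round the product to the nearest integer instead. This is only a minimal sketch: `split_index` is a hypothetical helper name (not part of numpy), and Python's built-in `round` sends exact halves to the nearest even integer, which may or may not suit your data.

```python
import numpy as np

def split_index(n_samples, train_fraction):
    """Hypothetical helper: index of the first test sample, rounded to the nearest integer."""
    idx = int(round(train_fraction * n_samples))
    # Clamp so the result is always a valid split position.
    return max(0, min(n_samples, idx))

data = np.array([1, 2, 3, 4, 5, 6])

# 0.6 * 6 is fractional, so it is rounded to 4: a 4 / 2 split.
train, test = np.split(data, [split_index(len(data), 0.6)])

# 0.5 * 6 is a whole number, so no extra element is added: a 3 / 3 split.
half_a, half_b = np.split(data, [split_index(len(data), 0.5)])
```

The order of the elements is untouched in both cases; only the position of the cut changes.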

                                                                        
                                                        
            
            
              
                
囚心锁ツ, 2021-01-04 01:07

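I'm not adding much to Psidom's answer (the numpy.split approach) except an easy-to-copy-paste function:

import numpy as np

def non_shuffling_train_test_split(X, y, test_size=0.2):
    # Keep the original order: the first i rows go to train, the rest to test.
    # The + 1 rounds the train size up, and adds one extra row when
    # (1 - test_size) * n is already whole (10 rows, test_size=0.2 gives 9/1).
    i = int((1 - test_size) * X.shape[0]) + 1
    X_train, X_test = np.split(X, [i])
    y_train, y_test = np.split(y, [i])
    return X_train, X_test, y_train, y_test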


Update: train_test_split now supports this directly through its shuffle parameter, so you can do:

from sklearn.model_selection import train_test_split
train_test_split(X, y, test_size=0.2, shuffle=False)
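
To see what the built-in call returns, here is a small sketch with made-up data. The arrays `X` and `y` below are assumptions for illustration only (10 ordered samples with 2 features each); substitute your own.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 10 samples in a meaningful order (e.g. time), 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# shuffle=False keeps the original order: the first rows become the
# training set and the last rows become the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(y_train)  # the first 8 labels, in order
print(y_test)   # the last 2 labels, in order
```

With 10 samples and test_size=0.2 this should give an 8/2 split, whereas the hand-written function with the same arguments gives 9/1 because of its `+ 1`.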

                                                                        
                                                        
            
            
              
                
情书的邮戳, 2021-01-04 01:09

All you need to do is set the shuffle parameter to False and leave the stratify parameter as None (its default):

    In [49]: train_test_split([1,2,3,4,5,6], shuffle=False, stratify=None)
    Out[49]: [[1, 2, 3, 4], [5, 6]]
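
The reason stratify has to stay None is that, as far as I know, scikit-learn refuses to combine stratification with shuffle=False and raises a ValueError. A minimal sketch of both cases follows; the `labels` list and the helper `try_ordered_stratified_split` are hypothetical names made up for this example.

```python
from sklearn.model_selection import train_test_split

data = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]  # hypothetical class labels, for illustration only

def try_ordered_stratified_split(data, labels):
    """Return the error message if an ordered, stratified split is rejected."""
    try:
        train_test_split(data, shuffle=False, stratify=labels)
    except ValueError as exc:
        return str(exc)
    return None

# Stratifying without shuffling is expected to be rejected.
message = try_ordered_stratified_split(data, labels)

# Leaving stratify at its default (None) gives the ordered split.
train, test = train_test_split(data, shuffle=False)
```

Since stratify defaults to None, passing shuffle=False alone is enough in practice.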

                                                                        
                                                        
            
            
              
                