How to get a non-shuffled train_test_split in sklearn

野性不改 2021-01-04 00:27

If I want a random train/test split, I use the sklearn helper function:

In [1]: from sklearn.model_selection import train_test_split
   ...: train_test_split         


        
3 Answers
  • 2021-01-04 00:51

    Use numpy.split:

    import numpy as np
    data = np.array([1,2,3,4,5,6])
    
    np.split(data, [4])           # modify the index here to specify where to split the array
    # [array([1, 2, 3, 4]), array([5, 6])]
    

    In case you want to split by a percentage, you can calculate the split index from the shape of data:

    data = np.array([1,2,3,4,5,6])
    p = 0.6
    
    idx = int(p * data.shape[0]) + 1      # p * n is usually fractional: int() truncates it,
                                          # then + 1 moves the split one element later.
                                          # Adjust as needed; it matters little for large data.
    np.split(data, [idx])
    # [array([1, 2, 3, 4]), array([5, 6])]
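
Note that the `+ 1` in the index calculation pushes the split one element later even when `p * n` is already a whole number (with `p = 0.8` and 10 rows, `idx` comes out as 9). If that matters, one alternative is to round the test count up and derive the index from it. The following is a minimal sketch under that assumption. `ordered_split` is a hypothetical helper name, not part of NumPy or scikit-learn, and it relies only on `len()` and slicing, so it should work for lists as well as for NumPy arrays along the first axis.

```python
import math


def ordered_split(seq, test_size=0.25):
    """Split seq into (train, test) without shuffling.

    Hypothetical helper for illustration; not a NumPy or scikit-learn API.
    """
    n = len(seq)
    n_test = math.ceil(test_size * n)  # round the test count up
    idx = n - n_test                   # everything before idx is train
    return seq[:idx], seq[idx:]


train, test = ordered_split([1, 2, 3, 4, 5, 6], test_size=0.25)
print(train, test)  # [1, 2, 3, 4] [5, 6]
```

Because the product is a float, sizes that are not exactly representable can still round up by one, so check the resulting lengths if the exact count is important.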
    
  • 2021-01-04 01:07

    I'm not adding much to Psidom's answer, just an easy-to-copy-paste function:

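    import numpy as np

    def non_shuffling_train_test_split(X, y, test_size=0.2):
        # Index of the first test row; the + 1 follows the answer above.
        i = int((1 - test_size) * X.shape[0]) + 1
        # Cut X and y at the same index so rows and labels stay aligned.
        X_train, X_test = np.split(X, [i])
        y_train, y_test = np.split(y, [i])
        return X_train, X_test, y_train, y_test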
    

    Update: This is now built in via the shuffle parameter, so you can do:

    from sklearn.model_selection import train_test_split
    train_test_split(X, y, test_size=0.2, shuffle=False)
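
To make the built-in call concrete, here is a small sketch on toy data. The arrays `X` and `y` are made up for illustration (10 ordered rows with 2 features each); the only scikit-learn call used is `train_test_split` with the `test_size` and `shuffle` arguments shown in the answer.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 ordered samples, 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(y_train)  # the first 8 labels, in their original order
print(y_test)   # the last 2 labels
```

With `shuffle=False` the test set is simply the tail of the data, which is usually what you want for time-ordered samples.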
    
  • 2021-01-04 01:09

    All you need to do is set the shuffle parameter to False and the stratify parameter to None:

        In [49]: train_test_split([1, 2, 3, 4, 5, 6], shuffle=False, stratify=None)
        Out[49]: [[1, 2, 3, 4], [5, 6]]
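
The reason stratify has to stay at None is that a stratified split is not available without shuffling. The sketch below illustrates this, assuming a reasonably recent scikit-learn; the `labels` list is made up for the example, and the exact error message may differ between versions.

```python
from sklearn.model_selection import train_test_split

labels = [0, 0, 0, 1, 1, 1]  # hypothetical class labels

try:
    # Asking for stratification together with shuffle=False is rejected.
    train_test_split(labels, shuffle=False, stratify=labels)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```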
    