How to split data into 3 sets (train, validation and test)?

无人及你 2020-11-22 15:03

I have a pandas dataframe and I wish to divide it into 3 separate sets. I know that using train_test_split from sklearn.cross_validation, one can divide the data

7 answers
  •  北海茫月
    2020-11-22 15:35

    It is convenient to use train_test_split, because it needs no reindexing after the split and no additional code. However, the top answer above does not mention that calling train_test_split twice without adjusting the partition sizes won't give the originally intended partition. The first call separates the training set from everything else:

    x_train, x_remain = train_test_split(x, test_size=(val_size + test_size))
    

    The proportions of the validation and test sets are now relative to x_remain rather than to the full data, so they have to be rescaled:

    new_test_size = np.around(test_size / (val_size + test_size), 2)
    # To preserve (new_test_size + new_val_size) = 1.0 
    new_val_size = 1.0 - new_test_size
    
    x_val, x_test = train_test_split(x_remain, test_size=new_test_size)
    

    This way the originally intended partition sizes are all preserved.
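To make the rescaling concrete, here is a small sketch of the arithmetic using only plain Python. The helper name `two_stage_sizes`, the sample count `n = 1000`, and the 70/20/10 split are assumptions chosen for illustration. Sample counts are approximated with `round()`, so the exact counts produced by train_test_split may differ by a sample.

```python
def two_stage_sizes(val_size, test_size):
    """Return the test_size values for two chained splits.

    Hypothetical helper, not part of scikit-learn. Both arguments are
    fractions of the full dataset.
    """
    if val_size <= 0 or test_size <= 0 or val_size + test_size >= 1:
        raise ValueError("val_size and test_size must be positive and sum to less than 1")
    # First split: everything that is not training data.
    first_test_size = val_size + test_size
    # Second split: the test share measured relative to x_remain.
    second_test_size = test_size / first_test_size
    return first_test_size, second_test_size


n = 1000                        # assumed number of rows
val_size, test_size = 0.2, 0.1  # intended split: 70% / 20% / 10%

first, second = two_stage_sizes(val_size, test_size)

n_remain = round(n * first)        # rows left after the first split
n_test = round(n_remain * second)  # rows in the test set
n_val = n_remain - n_test          # rows in the validation set
n_train = n - n_remain             # rows in the training set

# Reusing the original test_size in the second split shrinks the test set.
naive_n_test = round(n_remain * test_size)

print(n_train, n_val, n_test)  # rescaled sizes
print(naive_n_test)            # size without rescaling
```

With these numbers the rescaled split yields 700, 200 and 100 rows, while reusing test_size=0.1 on x_remain would leave only 30 rows for testing. Note that the sketch does not round the rescaled fraction: rounding 1/3 to 0.33, as np.around(..., 2) would, can shift a row or so between the validation and test sets.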
