How to split data into 3 sets (train, validation and test)?

无人及你 2020-11-22 15:03

I have a pandas dataframe and I wish to divide it into 3 separate sets. I know that using train_test_split from sklearn.cross_validation (sklearn.model_selection in current scikit-learn releases), one can divide the data

7 answers

北海茫月 (OP)

2020-11-22 15:35
Using train_test_split is convenient: it needs no reindexing after the data is divided into several sets and no additional code. However, the best answer above does not mention that calling train_test_split twice without adjusting the partition sizes won't give the originally intended partition. The first call sets aside the training set:
```
from sklearn.model_selection import train_test_split
x_train, x_remain = train_test_split(x, test_size=(val_size + test_size))
```
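To see why the second split needs a different fraction, it can help to work through the row counts. The sketch below uses plain arithmetic rather than scikit-learn itself, and the numbers (1000 rows, a 60/20/20 target) are hypothetical values chosen only for illustration:

```python
# Hypothetical numbers, chosen only to illustrate the arithmetic.
n_rows = 1000
val_size, test_size = 0.2, 0.2  # intended fractions of the FULL dataset

# After the first split, x_remain holds the validation + test share.
n_remain = round(n_rows * (val_size + test_size))  # 400 rows

# Naive second split: reusing test_size directly on x_remain.
naive_test = round(n_remain * test_size)  # 80 rows, only 8% of the data
naive_val = n_remain - naive_test         # 320 rows, 32% of the data

# Rescaled second split: the test fraction expressed relative to x_remain.
rescaled_fraction = test_size / (val_size + test_size)  # 0.5
rescaled_test = round(n_remain * rescaled_fraction)     # 200 rows, 20%
rescaled_val = n_remain - rescaled_test                 # 200 rows, 20%

print(naive_val, naive_test)
print(rescaled_val, rescaled_test)
```

With the naive fraction, the test set ends up far smaller than intended; with the rescaled fraction, both held-out sets match the 20% target.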
The proportions of the validation and test sets within x_remain are now different from their proportions in the full data, so they have to be recomputed relative to x_remain:
```
import numpy as np

# Rescale the test fraction so it is relative to x_remain, not the full dataset.
new_test_size = np.around(test_size / (val_size + test_size), 2)
# To preserve (new_test_size + new_val_size) = 1.0
new_val_size = 1.0 - new_test_size

x_val, x_test = train_test_split(x_remain, test_size=new_test_size)
```
This way, all of the originally intended partition sizes are preserved.
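Putting the two calls together, a minimal end-to-end sketch for a pandas dataframe might look like the following. The helper name `train_val_test_split`, the toy dataframe, and the `random_state` value are assumptions made for this example, not part of scikit-learn. The sketch also passes the exact ratio to the second call instead of rounding it to two decimals.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def train_val_test_split(df, val_size, test_size, random_state=None):
    """Hypothetical helper: split df into train, validation and test parts.

    val_size and test_size are fractions of the FULL dataset.
    """
    # First split: hold out the combined validation + test share.
    train, remain = train_test_split(
        df, test_size=val_size + test_size, random_state=random_state
    )
    # Second split: rescale the test fraction so it is relative to `remain`.
    rescaled_test_size = test_size / (val_size + test_size)
    val, test = train_test_split(
        remain, test_size=rescaled_test_size, random_state=random_state
    )
    return train, val, test


# Toy data, invented for this example.
df = pd.DataFrame({"feature": range(1000), "label": [i % 2 for i in range(1000)]})
train, val, test = train_val_test_split(
    df, val_size=0.2, test_size=0.2, random_state=0
)

print(len(train), len(val), len(test))
```

For this 1000-row frame the three parts should come out as 600, 200 and 200 rows. Each part keeps the original index labels of its rows, so no reindexing is needed afterwards.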