How training and test data are split - Keras on TensorFlow

青春惊慌失措 2021-02-02 00:40

I am currently training a neural network on my data using the fit function.

# `epochs` is the current name of the argument formerly called `nb_epoch`
history = model.fit(X, encoded_Y, batch_size=50, epochs=500, validation_split=0.2)


        
1 answer
  •  旧时难觅i
    2021-02-02 01:12

    1. The Keras documentation says: "The validation data is selected from the last samples in the x and y data provided, before shuffling." This means the split happens first and the shuffling afterwards, so only the remaining training samples are shuffled. There is also a boolean parameter, shuffle, which defaults to True; if you don't want your training data to be shuffled, set it to False.

    2. Getting good results on your training data but poor results on your evaluation data usually means that your model is overfitting: it has learned the specifics of the training set and cannot achieve good results on new data.

    3. Evaluation means testing your model on new data that it has "never seen before". Usually you divide your data into training and test sets, but sometimes you also want a third set (often called a validation set). If you keep adjusting your model to get better and better results on the test data, that is a form of cheating: you are indirectly telling the model what the evaluation data looks like, which can lead to overfitting to the test set.
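
To illustrate point 1, here is a minimal sketch that mimics the "last samples, before shuffling" rule with plain NumPy. The helper name split_like_validation_split and the toy arrays are made up for illustration, and this is not Keras's actual code, only an approximation of the documented behaviour.

```python
import numpy as np

def split_like_validation_split(x, y, validation_split=0.2):
    """Hypothetical helper: hold out the LAST fraction of samples, without shuffling."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

# Toy data that happens to be sorted by label (assumption for the demo).
x = np.arange(10).reshape(10, 1)
y = np.array([0] * 8 + [1] * 2)

(x_train, y_train), (x_val, y_val) = split_like_validation_split(x, y, 0.2)
print("validation labels:", y_val)   # taken from the end of the arrays
print("training labels:  ", y_train)

# Shuffling x and y together BEFORE the split avoids an ordered hold-out set.
rng = np.random.default_rng(0)
perm = rng.permutation(len(x))
x_shuf, y_shuf = x[perm], y[perm]
```

If your data is sorted by class like this, the validation set ends up containing a single class, so shuffling X and encoded_Y yourself before calling fit is a reasonable precaution.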

    Also, if you want to split your data without using Keras, I recommend scikit-learn's train_test_split() function.

    It's easy to use and looks like this:

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
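
For the third group of data mentioned in point 3, one possible approach is to call train_test_split twice. This is only a sketch under assumptions: the toy X and y, the 60/20/20 proportions, and random_state=42 are arbitrary choices for illustration, not something taken from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption): 100 samples, 2 features, balanced binary labels.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

# Step 1: set aside 20% of all samples as the final test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 2: take 25% of the remaining 80% (20% of the total) as validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

print(len(X_train), len(X_val), len(X_test))
```

The validation set can then be passed to Keras explicitly through fit's validation_data argument, e.g. model.fit(X_train, y_train, validation_data=(X_val, y_val)), leaving X_test untouched until the final evaluation.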
    
