When scaling data, why does the training set use 'fit' and 'transform', while the test set only uses 'transform'?

悲&欢浪女 2021-02-01 03:32

When scaling data, why does the training set use 'fit' and 'transform', while the test set only uses 'transform'?

SAMPLE_COUNT = 5000
TEST_COUNT = 20000
see         


        
7 answers
  •  北荒 (OP)
    2021-02-01 04:29

    fit() and transform() are the two methods shared by preprocessing steps such as scalers and imputers (this is the scikit-learn convention). fit() computes the statistics the step needs from the data: for a scaler, typically the mean and standard deviation of each feature; for an imputer, the mean or median used to fill in missing values. transform() applies those already-computed statistics to the data, for example by subtracting the mean and dividing by the standard deviation, or by filling the empty places with the stored mean or median. fit_transform() performs both tasks in a single call, which is why it is used on the training data.

    For the validation or test set only transform() is required, since you don't want to change the way you scale the data or handle missing values when you move to unseen data. Refitting there would process the test set with different statistics from the ones the model was trained on, and would leak information about the test set into preprocessing, so the model may be taken by surprise and fail to perform as expected.
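
The fit-on-train, transform-on-test pattern can be sketched in plain Python. This is a minimal illustration, not a library implementation: `SimpleScaler` and the sample values are hypothetical, and a real scaler such as scikit-learn's `StandardScaler` exposes the same three methods but works on arrays with many features.

```python
from statistics import mean, pstdev


class SimpleScaler:
    """Hypothetical stand-in for a library scaler; standardizes one feature."""

    def fit(self, values):
        # Learn the statistics from the data passed in (training data only).
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # fall back to 1.0 for constant data
        return self

    def transform(self, values):
        # Apply the statistics learned in fit(); nothing is recomputed here.
        return [(v - self.mean_) / self.std_ for v in values]

    def fit_transform(self, values):
        # Convenience method: fit, then transform the same data.
        return self.fit(values).transform(values)


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [3.0, 10.0]

scaler = SimpleScaler()
train_scaled = scaler.fit_transform(train)  # statistics come from train
test_scaled = scaler.transform(test)        # the same statistics are reused
```

Because the test values are scaled with the training mean and standard deviation, a test value equal to the training mean maps to 0, exactly as it would have during training. Calling fit() again on the test set would silently change that mapping.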
