When scaling data, why does the training set use 'fit' and 'transform', while the test set uses only 'transform'?

悲&欢浪女 2021-02-01 03:32

When scaling data, why does the training set use 'fit' and 'transform', while the test set uses only 'transform'?

SAMPLE_COUNT = 5000
TEST_COUNT = 20000
see         


        
7 answers
  •  被撕碎了的回忆
    2021-02-01 04:08

    There are two possible approaches: (1) fit and transform the training data, then only transform the test data; (2) fit and transform the whole set (train + test).

    Think about how the model will handle scaling once it goes live: when new data arrives, it will behave just like the unseen test data in your backtest.

    In the first case, new data is simply transformed with the already-fitted scaler, and the scaled values your model saw in the backtest remain unchanged.
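To make the first approach concrete, here is a minimal sketch of what "fit" and "transform" do. `SimpleStandardScaler` is a hypothetical stand-in written for illustration, not a library class; it only mimics the fit/transform interface that scikit-learn scalers such as `StandardScaler` expose. The sample values are made up.

```python
import statistics


class SimpleStandardScaler:
    """Hypothetical minimal scaler with a fit/transform interface."""

    def fit(self, values):
        # "fit": learn the scaling parameters from the given data only.
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values)
        return self

    def transform(self, values):
        # "transform": apply the already-learned parameters; nothing is re-estimated.
        return [(v - self.mean_) / self.std_ for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up training values
test = [6.0, 7.0]                   # made-up unseen values

scaler = SimpleStandardScaler().fit(train)  # parameters come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)        # test reuses the train parameters
```

The test values are scaled with the mean and standard deviation of the training data, so they can fall outside the range seen in training; that is expected and is the same treatment live data would get.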

    In the second case, when new data arrives you need to fit and transform the whole dataset again. The scaled backtest values will then no longer be the same, so you need to retrain the model. If that can be done quickly, I guess it is OK, but the first approach requires less work.
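The effect of refitting can be seen in a small sketch. It assumes min-max scaling purely for illustration (the answer does not name a scaler), and `fit_min_max` and `transform_min_max` are hypothetical helper names with made-up data.

```python
def fit_min_max(values):
    # "fit": learn the minimum and maximum from the data.
    return min(values), max(values)


def transform_min_max(values, params):
    # "transform": rescale using previously learned parameters.
    lo, hi = params
    return [(v - lo) / (hi - lo) for v in values]


backtest = [0.0, 5.0, 10.0]   # made-up historical values
new_data = [20.0]             # made-up value arriving after go-live

# Approach 1: fit once on the backtest data and reuse the parameters.
params = fit_min_max(backtest)
before = transform_min_max(backtest, params)
new_scaled = transform_min_max(new_data, params)

# Approach 2: refit on everything once the new data arrives.
refit_params = fit_min_max(backtest + new_data)
after = transform_min_max(backtest, refit_params)

print(before)  # [0.0, 0.5, 1.0]
print(after)   # [0.0, 0.25, 0.5]
```

The same backtest rows get different scaled values after the refit, which is why a model trained on the old values would need retraining under the second approach.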

    Also, if the scaling differs greatly between train and test, the data is probably non-stationary, and ML is probably not a good idea.
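One rough way to look for such a difference is to compare the statistics a scaler would learn from each split. The sketch below is only an illustration: `mean_shift_in_train_stds` is a hypothetical helper, the data is made up, and what counts as a "big" difference is a judgment call that the answer does not specify.

```python
import statistics


def mean_shift_in_train_stds(train, test):
    # Distance of the test mean from the train mean, in train standard deviations.
    train_mean = statistics.fmean(train)
    train_std = statistics.pstdev(train)
    return (statistics.fmean(test) - train_mean) / train_std


train = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up training values
similar_test = [2.0, 3.0, 4.0]      # same range as train
drifted_test = [50.0, 60.0, 70.0]   # far outside the train range

small = mean_shift_in_train_stds(train, similar_test)
large = mean_shift_in_train_stds(train, drifted_test)
```

Here `small` is zero while `large` is dozens of standard deviations, the kind of gap the answer warns about.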
