The learning dataset I'm using is a grayscale image that was flattened so that each pixel represents an individual sample. The second image will be classified pixel by pixel.
TL;DR: make several loops over your data with a small learning rate and a different order of observations each time, and your partial_fit will perform as well as fit.
The problem with partial_fit over many chunks is that, by the time your model completes the last chunk, it may have forgotten the first one: changes in the model weights due to the early batches can be completely overwritten by the late batches.
This problem, however, can be solved easily enough with a combination of:

1. a sufficiently low learning rate, so that no single chunk overwhelms the weights;
2. multiple loops over the data, so that early chunks are revisited;
3. a different (shuffled) order of observations on each loop.
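Here is a minimal sketch of that recipe; the data, batch size, and hidden layer size are placeholder assumptions for illustration, and only the multi-epoch, shuffled partial_fit structure and the 0.001 rate reflect the advice in this answer:

import numpy as np
from sklearn.neural_network import MLPClassifier

# placeholder data: one row per pixel, one label per pixel
# (names and shapes are assumptions, not from the question)
rng = np.random.default_rng(0)
X = rng.random((10_000, 3))
y = (X[:, 0] > 0.5).astype(int)

clf = MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=0.001)
classes = np.unique(y)  # partial_fit must see the full label set on the first call

n_epochs, batch_size = 20, 1_000
for epoch in range(n_epochs):            # several loops over the data...
    order = rng.permutation(len(X))      # ...in a different order each time
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        clf.partial_fit(X[idx], y[idx], classes=classes)

Each partial_fit call advances the same model by one pass over a chunk, so the outer loop plus reshuffling approximates the repeated shuffled epochs that fit performs internally.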
The default learning rate of MLPClassifier is 0.001, but you can change it by multiples of 3 or 10 and see what happens.
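For example, a hypothetical sweep that scales the default by factors of roughly 3 and 10 could look like this (the data and grid values are assumptions for illustration):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# synthetic stand-in data; substitute your own flattened pixel samples
rng = np.random.default_rng(0)
X = rng.random((2_000, 4))
y = (X.sum(axis=1) > 2).astype(int)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# the default 0.001, scaled down and up by factors of roughly 3 and 10
for lr in [0.0001, 0.0003, 0.001, 0.003, 0.01]:
    clf = MLPClassifier(learning_rate_init=lr, max_iter=300, random_state=0)
    clf.fit(X_train, y_train)
    print(f"learning_rate_init={lr}: validation accuracy {clf.score(X_val, y_val):.3f}")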
Rather than providing a rate manually, you can use the adaptive learning rate functionality provided by sklearn:
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="hinge", penalty="l2", alpha=0.0001, max_iter=3000, tol=None, shuffle=True, verbose=0, learning_rate='adaptive', eta0=0.01, early_stopping=False)
This is described in the scikit docs as:
‘adaptive’: eta = eta0, as long as the training keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol or fail to increase validation score by tol if early_stopping is True, the current learning rate is divided by 5.
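With eta0=0.01 as in the snippet above, each time that condition triggers, the learning rate drops from 0.01 to 0.002, then to 0.0004, and so on, with no manual retuning needed.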