Context:
I am using Passive Aggressor from scikit library and confused whether to use warm start or partial fit.
Efforts hitherto
First, let us look at the difference between .fit()
and .partial_fit()
.
.fit()
would let you train from the scratch. Hence, you could think of this as a option that can be used only once for a model. If you call .fit()
again with a new set of data, the model would be build on the new data and will have no influence of previous dataset.
.partial_fit()
would let you update the model with incremental data. Hence, this option can be used more than once for a model. This could be useful, when the whole dataset cannot be loaded into the memory, refer here.
If both .fit()
or .partial_fit()
are going to be used once, then it makes no difference.
warm_start
can be only used in .fit()
, it would let you start the learning from the co-eff of previous fit()
. Now it might sound similar to the purpose to partial_fit()
, but recommended way would be partial_fit()
. May be do the partial_fit()
with same incremental data few number of times, to improve the learning.
I don't know about the Passive Aggressor, but at least when using the SGDRegressor, partial_fit
will only fit for 1 epoch, whereas fit
will fit for multiple epochs (until the loss converges or max_iter
is reached). Therefore, when fitting new data to your model, partial_fit
will only correct the model one step towards the new data, but with fit
and warm_start
it will act as if you would combine your old data and your new data together and fit the model once until convergence.
Example:
from sklearn.linear_model import SGDRegressor
import numpy as np
np.random.seed(0)
X = np.linspace(-1, 1, num=50).reshape(-1, 1)
Y = (X * 1.5 + 2).reshape(50,)
modelFit = SGDRegressor(learning_rate="adaptive", eta0=0.01, random_state=0, verbose=1,
shuffle=True, max_iter=2000, tol=1e-3, warm_start=True)
modelPartialFit = SGDRegressor(learning_rate="adaptive", eta0=0.01, random_state=0, verbose=1,
shuffle=True, max_iter=2000, tol=1e-3, warm_start=False)
# first fit some data
modelFit.fit(X, Y)
modelPartialFit.fit(X, Y)
# for both: Convergence after 50 epochs, Norm: 1.46, NNZs: 1, Bias: 2.000027, T: 2500, Avg. loss: 0.000237
print(modelFit.coef_, modelPartialFit.coef_) # for both: [1.46303288]
# now fit new data (zeros)
newX = X
newY = 0 * Y
# fits only for 1 epoch, Norm: 1.23, NNZs: 1, Bias: 1.208630, T: 50, Avg. loss: 1.595492:
modelPartialFit.partial_fit(newX, newY)
# Convergence after 49 epochs, Norm: 0.04, NNZs: 1, Bias: 0.000077, T: 2450, Avg. loss: 0.000313:
modelFit.fit(newX, newY)
print(modelFit.coef_, modelPartialFit.coef_) # [0.04245779] vs. [1.22919864]
newX = np.reshape([2], (-1, 1))
print(modelFit.predict(newX), modelPartialFit.predict(newX)) # [0.08499296] vs. [3.66702685]
About difference. Warm start it just an attribute of class. Partial fit it is method of this class. It's basically different things.
About same functionalities. Yes, partial fit will use self.coef_ because it still needed to get some values to update on training period. And for empty coef_init we just put zero values to self.coef_ and go to the next step of training.
Description.
For first start: Whatever how (with or without warm start). We will train on zero coefficients but in result we will save average of our coefficients.
N+1 start:
With warm start. We will check via method _allocate_parameter_mem our previous coefficients and take it to train. In result save our average coefficients.
Without warm start. We will put zero coefficients (as first start) and go to training step. In result we will still write average coefficients to memory.
If warm_start = False
, each subsequent call to .fit()
(after an initial call to .fit()
or partial_fit()
) will reset the model's trainable parameters for the initialisation. If warm_start = True
, each subsequent call to .fit()
(after an initial call to .fit()
or partial_fit()
) will retain the values of the model's trainable parameters from the previous run, and use those initially.
Regardless of the value of warm_start
, each call to partial_fit()
will retain the previous run's model parameters and use those initially.
Example using MLPRegressor
:
import sklearn.neural_network
import numpy as np
np.random.seed(0)
x = np.linspace(-1, 1, num=50).reshape(-1, 1)
y = (x * 1.5 + 2).reshape(50,)
cold_model = sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(), warm_start=False, max_iter=1)
warm_model = sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(), warm_start=True, max_iter=1)
cold_model.fit(x,y)
print cold_model.coefs_, cold_model.intercepts_
#[array([[0.17009494]])] [array([0.74643783])]
cold_model.fit(x,y)
print cold_model.coefs_, cold_model.intercepts_
#[array([[-0.60819342]])] [array([-1.21256186])]
#after second run of .fit(), values are completely different
#because they were re-initialised before doing the second run for the cold model
warm_model.fit(x,y)
print warm_model.coefs_, warm_model.intercepts_
#[array([[-1.39815616]])] [array([1.651504])]
warm_model.fit(x,y)
print warm_model.coefs_, warm_model.intercepts_
#[array([[-1.39715616]])] [array([1.652504])]
#this time with the warm model, params change relatively little, as params were
#not re-initialised during second call to .fit()
cold_model.partial_fit(x,y)
print cold_model.coefs_, cold_model.intercepts_
#[array([[-0.60719343]])] [array([-1.21156187])]
cold_model.partial_fit(x,y)
print cold_model.coefs_, cold_model.intercepts_
#[array([[-0.60619347]])] [array([-1.21056189])]
#with partial_fit(), params barely change even for cold model,
#as no re-initialisation occurs
warm_model.partial_fit(x,y)
print warm_model.coefs_, warm_model.intercepts_
#[array([[-1.39615617]])] [array([1.65350392])]
warm_model.partial_fit(x,y)
print warm_model.coefs_, warm_model.intercepts_
#[array([[-1.39515619]])] [array([1.65450372])]
#and of course the same goes for the warm model