How to predict several independent time series at the same time using a scikit-learn linear regression model

日久生厌 2021-01-03 07:14

I am trying to predict multiple independent time series simultaneously using the sklearn linear regression model, but I can't seem to get it right.

My data is organised

2 answers
  • 2021-01-03 07:31

    @ali_m I don't think this is a duplicate question, although the two are partly related. It is certainly possible to fit and predict several time series simultaneously with a linear regression model similar to sklearn's.

    I created a new class LinearRegression_Multi:

    import numpy as np

    class LinearRegression_Multi:
        def stacked_lstsq(self, L, b, rcond=1e-10):
            """
            Solve L x = b via SVD least squares, cutting off small singular values.
            L is an array of shape (..., M, N) and b of shape (..., M).
            Returns x of shape (..., N).
            """
            u, s, v = np.linalg.svd(L, full_matrices=False)
            s_max = s.max(axis=-1, keepdims=True)
            s_min = rcond*s_max
            inv_s = np.zeros_like(s)
            inv_s[s >= s_min] = 1/s[s>=s_min]
            x = np.einsum('...ji,...j->...i', v,
                          inv_s * np.einsum('...ji,...j->...i', u, b.conj()))
            return np.conj(x, x)    
    
        def center_data(self, X, y):
            """Center X and y to zero mean along the sample axis (axis 1) of each series.
            """
            # center X        
            X_mean = np.average(X,axis=1)
            X_std = np.ones(X.shape[0::2])
            X = X - X_mean[:,None,:] 
            # center y
            y_mean = np.average(y,axis=1)
            y = y - y_mean[:,None]
            return X, y, X_mean, y_mean, X_std
    
        def set_intercept(self, X_mean, y_mean, X_std):
            """ Calculate the intercept_
            """
            self.coef_ = self.coef_ / X_std # not really necessary
            self.intercept_ = y_mean - np.einsum('ij,ij->i',X_mean,self.coef_)
    
        def scores(self, y_pred, y_true ):
            """
            The coefficient R^2 is defined as (1 - u/v), where u is the residual
            sum of squares ((y_true - y_pred) ** 2).sum() and v is the total
            sum of squares ((y_true - y_true.mean()) ** 2).sum(), computed
            separately for each series.
            """
            u = ((y_true - y_pred) ** 2).sum(axis=-1)
            v = ((y_true - y_true.mean(axis=-1)[None].T) ** 2).sum(axis=-1)
            r_2 = 1 - u/v
            return r_2
    
        def fit(self,X, y):
            """ Fit linear model.        
            """        
            # get coefficients by applying linear regression on stack
            X_, y, X_mean, y_mean, X_std = self.center_data(X, y)
            self.coef_ = self.stacked_lstsq(X_, y)
            self.set_intercept(X_mean, y_mean, X_std)
    
        def predict(self, X):
            """Predict using the linear model
            """
            return np.einsum('ijx,ix->ij',X,self.coef_) + self.intercept_[None].T
    

    It can be applied as follows, using the same variables as declared in the question:

    LR_Multi = LinearRegression_Multi()
    LR_Multi.fit(X_stack[:,:half], y_stack[:,:half])
    y_stack_pred = LR_Multi.predict(X_stack[:,half:])
    R2 = LR_Multi.scores(y_stack_pred, y_stack[:,half:])
    

    The resulting R^2 values for the two time series are:

    array([ 0.91262442,  0.67247516])
    

    These match the scores obtained by fitting the standard sklearn linear regression to each series separately:

    from sklearn.linear_model import LinearRegression
    
    LR = LinearRegression()
    LR.fit(X1[:half], y1[:half])
    R2_1 = LR.score(X1[half:],y1[half:])
    
    LR.fit(X2[:half], y2[:half])
    R2_2 = LR.score(X2[half:],y2[half:])
    print(R2_1, R2_2)
    0.912624422097 0.67247516054
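
The variables used above come from the question, whose data description is cut off. As a rough sketch of the same idea under assumed shapes, the block below builds synthetic stacked arrays of shape (n_series, n_steps, n_features) and (n_series, n_steps), fits every series in one batched call, and checks the result against a per-series `np.linalg.lstsq` fit. The data, the helper names `fit_stacked` and `fit_single`, and the use of `np.linalg.pinv` in place of the explicit SVD routine from the answer are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 2 independent series, 100 time steps, 3 features each.
n_series, n_steps, n_features = 2, 100, 3
X_stack = rng.normal(size=(n_series, n_steps, n_features))
true_coef = rng.normal(size=(n_series, n_features))
true_intercept = rng.normal(size=n_series)
y_stack = (np.einsum('ijx,ix->ij', X_stack, true_coef)
           + true_intercept[:, None]
           + 0.1 * rng.normal(size=(n_series, n_steps)))


def fit_stacked(X, y):
    """Fit one ordinary least-squares model per series in a single batched call."""
    X_mean = X.mean(axis=1)                  # (n_series, n_features)
    y_mean = y.mean(axis=1)                  # (n_series,)
    Xc = X - X_mean[:, None, :]              # centre each series separately
    yc = y - y_mean[:, None]
    # np.linalg.pinv works on stacked matrices: result is (n_series, n_features, n_steps)
    coef = np.einsum('ixj,ij->ix', np.linalg.pinv(Xc), yc)
    intercept = y_mean - np.einsum('ix,ix->i', X_mean, coef)
    return coef, intercept


def fit_single(X, y):
    """Reference fit for one series, using an explicit column of ones for the intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[:-1], beta[-1]


coef, intercept = fit_stacked(X_stack, y_stack)
for i in range(n_series):
    coef_i, intercept_i = fit_single(X_stack[i], y_stack[i])
    assert np.allclose(coef[i], coef_i)
    assert np.isclose(intercept[i], intercept_i)
```

Centering each series before the solve is what lets the intercept be recovered afterwards from the means, which is the same trick the class above uses in `center_data` and `set_intercept`.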
    
  • 2021-01-03 07:49

    If you need to build separate models, you cannot use the power of numpy to gain performance from the fact that you have many different tasks. The only thing you can do is run them simultaneously in different threads (using the multiple cores of your CPU) or even split the calculations across different computers.
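
A minimal sketch of the threaded approach, assuming each task is a plain `(X, y)` pair and using `np.linalg.lstsq` as a stand-in for whatever model is fitted per task. The task data and the helper name `fit_one` are hypothetical, and whether threads give a real speed-up depends on how much of the time is spent inside NumPy's compiled routines.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fit_one(task):
    """Ordinary least squares with an intercept for a single (X, y) pair."""
    X, y = task
    A = np.column_stack([X, np.ones(len(X))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[:-1], beta[-1]   # (coefficients, intercept)


# Hypothetical tasks: three unrelated series with different lengths.
rng = np.random.default_rng(1)
tasks = []
for n_steps in (50, 80, 120):
    X = rng.normal(size=(n_steps, 2))
    y = X @ np.array([1.5, -0.5]) + 2.0 + 0.01 * rng.normal(size=n_steps)
    tasks.append((X, y))

# Fit every task in its own worker thread.
with ThreadPoolExecutor(max_workers=3) as pool:
    models = list(pool.map(fit_one, tasks))   # results keep the order of `tasks`
```

Because the series here have different lengths, they could not be stacked into one rectangular array, which is the situation where running separate fits in parallel is the natural fallback.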

    If you believe all the data fit the same model, then the obvious solution is to merge all the Xn and yn and train on the combined data. This will definitely be faster than calculating separate models.
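
As a sketch of the merged approach, assuming two series that really do share one set of coefficients: the arrays are concatenated along the sample axis and a single model is fitted. The synthetic `X1`, `y1`, `X2`, `y2` and the names `shared_coef`, `pooled_coef` and `pooled_intercept` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
shared_coef = np.array([0.7, -1.2])   # assumed common to every series
shared_intercept = 0.3

# Hypothetical series generated from the same underlying model.
X1 = rng.normal(size=(60, 2))
X2 = rng.normal(size=(90, 2))
y1 = X1 @ shared_coef + shared_intercept + 0.01 * rng.normal(size=60)
y2 = X2 @ shared_coef + shared_intercept + 0.01 * rng.normal(size=90)

# Merge along the sample axis and fit a single model with an intercept column.
X_all = np.concatenate([X1, X2], axis=0)
y_all = np.concatenate([y1, y2], axis=0)
A = np.column_stack([X_all, np.ones(len(X_all))])
beta, *_ = np.linalg.lstsq(A, y_all, rcond=None)
pooled_coef, pooled_intercept = beta[:-1], beta[-1]
```

This only makes sense when the series share parameters; if they do not, the pooled coefficients are a compromise that may fit none of them well.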

    In fact, the question is less about computational performance than about the result you want to get. If you need different models, you have no option but to calculate them separately. If you need one model, just merge the data. Otherwise, if you calculate separate models, you are left with the problem of how to derive the final parameters from all those models.
