How to find the best degree of polynomials?

天涯浪人 2021-02-01 11:45

I'm new to machine learning and am currently stuck on this. First I used linear regression to fit the training set, but got a very large RMSE. Then I tried using polynomial regr

3 Answers
  •  -上瘾入骨i
    2021-02-01 12:10

    In my opinion, the best way to find the optimal polynomial degree, or more generally to choose a fitting model, is to use the GridSearchCV class from the scikit-learn library. It fits every candidate in a parameter grid and scores each one with cross-validation.

    Here is an example of how to use it.

    First, let us define a function that samples random data:

    def make_data(N, err=1.0, rseed=1):
        # Sample N points from y = 1 / (x + 0.3), plus Gaussian noise scaled by err.
        rng = np.random.RandomState(rseed)
        X = rng.rand(N, 1) ** 2
        y = 1. / (X.ravel() + 0.3)
        if err > 0:
            y += err * rng.randn(N)
        return X, y
    

    Build a pipeline:

    def PolynomialRegression(degree=2, **kwargs):
        return make_pipeline(PolynomialFeatures(degree), LinearRegression(**kwargs))
    

    Create the data and a vector of evenly spaced points (X_test) for prediction and visualisation:

    X, y = make_data(200)
    X_test = np.linspace(-0.1, 1.1, 200)[:, None]
    

    Define the GridSearchCV parameters:

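    param_grid = {'polynomialfeatures__degree': np.arange(20),
                  'linearregression__fit_intercept': [True, False],
                  # `normalize` was removed from LinearRegression in scikit-learn 1.2;
                  # drop this entry on newer versions.
                  'linearregression__normalize': [True, False]}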
    grid = GridSearchCV(PolynomialRegression(), param_grid, cv=7)
    grid.fit(X, y)
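
Since the question is about RMSE, it may help to know that GridSearchCV scores regressors by R² unless told otherwise. The sketch below asks for RMSE instead through the `scoring` argument. The degree range of 1 to 10 and the variable names are my own choices for illustration, and `normalize` is left out so that the code also runs on recent scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data as make_data(200): y = 1 / (x + 0.3) plus unit Gaussian noise.
rng = np.random.RandomState(1)
X = rng.rand(200, 1) ** 2
y = 1.0 / (X.ravel() + 0.3) + rng.randn(200)

pipeline = make_pipeline(PolynomialFeatures(), LinearRegression())

# Illustrative grid: only the polynomial degree is searched here.
param_grid = {"polynomialfeatures__degree": np.arange(1, 11)}

# scikit-learn maximises scores, so RMSE is exposed in negated form.
rmse_grid = GridSearchCV(
    pipeline, param_grid, cv=7, scoring="neg_root_mean_squared_error"
)
rmse_grid.fit(X, y)

best_degree = rmse_grid.best_params_["polynomialfeatures__degree"]
best_rmse = -rmse_grid.best_score_  # flip the sign back to a plain RMSE
print(f"best degree: {best_degree}, cross-validated RMSE: {best_rmse:.3f}")
```

The reported RMSE is the mean over the 7 validation folds for the winning degree, so it is an estimate of out-of-sample error rather than the training error.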
    

    Get the best estimator found by the search (its repr shows the selected parameters):

    model = grid.best_estimator_
    model
    
    Pipeline(memory=None,
         steps=[('polynomialfeatures', PolynomialFeatures(degree=4, include_bias=True, interaction_only=False)), ('linearregression', LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False))])
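
If you only want the winning parameter values rather than the whole pipeline, `best_params_` and `cv_results_` are probably easier to read. The following is a minimal sketch under my own assumptions: a smaller grid (degrees 1 to 10, no `normalize`) and a top-5 listing, neither of which comes from the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data as make_data(200).
rng = np.random.RandomState(1)
X = rng.rand(200, 1) ** 2
y = 1.0 / (X.ravel() + 0.3) + rng.randn(200)

pipeline = make_pipeline(PolynomialFeatures(), LinearRegression())
param_grid = {
    "polynomialfeatures__degree": np.arange(1, 11),
    "linearregression__fit_intercept": [True, False],
}
grid = GridSearchCV(pipeline, param_grid, cv=7)
grid.fit(X, y)

# Dictionary with the best value found for each searched parameter.
print(grid.best_params_)

# cv_results_ holds one entry per candidate; list the five best by rank.
results = grid.cv_results_
order = np.argsort(results["rank_test_score"])
for i in order[:5]:
    mean = results["mean_test_score"][i]
    std = results["std_test_score"][i]
    print(f"{results['params'][i]}  mean R^2 = {mean:.3f} (+/- {std:.3f})")
```

Looking at the runners-up is worthwhile: when several degrees score within one standard deviation of each other, the lowest of them is usually the safer pick.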
    

    Fit the model on X and y, then predict the values at the X_test points:

    y_test = model.fit(X, y).predict(X_test)
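
A side note on the refit: with the default `refit=True`, GridSearchCV already retrains the best estimator on the full data, so calling `fit` again is harmless but not required. The sketch below compares both routes; the reduced grid and the variable names are assumptions of mine.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data and test points as in the answer.
rng = np.random.RandomState(1)
X = rng.rand(200, 1) ** 2
y = 1.0 / (X.ravel() + 0.3) + rng.randn(200)
X_test = np.linspace(-0.1, 1.1, 200)[:, None]

pipeline = make_pipeline(PolynomialFeatures(), LinearRegression())
grid = GridSearchCV(pipeline, {"polynomialfeatures__degree": np.arange(1, 11)}, cv=7)
grid.fit(X, y)

# Route 1: predict straight from the search object (uses best_estimator_).
y_from_search = grid.predict(X_test)

# Route 2: refit the best estimator by hand, as in the answer.
y_from_refit = grid.best_estimator_.fit(X, y).predict(X_test)

print("identical predictions:", np.allclose(y_from_search, y_from_refit))
```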
    

    Visualize the result:

    plt.scatter(X, y)
    plt.plot(X_test.ravel(), y_test, 'r')
    

    [Figure: the best-fit result]

    The full code snippet:

    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import GridSearchCV
    
    def make_data(N, err=1.0, rseed=1):
        # Sample N points from y = 1 / (x + 0.3), plus Gaussian noise scaled by err.
        rng = np.random.RandomState(rseed)
        X = rng.rand(N, 1) ** 2
        y = 1. / (X.ravel() + 0.3)
        if err > 0:
            y += err * rng.randn(N)
        return X, y
    
    def PolynomialRegression(degree=2, **kwargs):
        return make_pipeline(PolynomialFeatures(degree), LinearRegression(**kwargs))
    
    
    X, y = make_data(200)
    X_test = np.linspace(-0.1, 1.1, 200)[:, None]
    
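    param_grid = {'polynomialfeatures__degree': np.arange(20),
                  'linearregression__fit_intercept': [True, False],
                  # `normalize` was removed from LinearRegression in scikit-learn 1.2;
                  # drop this entry on newer versions.
                  'linearregression__normalize': [True, False]}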
    grid = GridSearchCV(PolynomialRegression(), param_grid, cv=7)
    grid.fit(X, y)
    
    model = grid.best_estimator_
    
    y_test = model.fit(X, y).predict(X_test)
    
    plt.scatter(X, y)
    plt.plot(X_test.ravel(), y_test, 'r')
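
To see why a particular degree wins, and not only which one, a validation curve can complement the grid search. This is a sketch using scikit-learn's `validation_curve`, which is a different tool from the GridSearchCV approach above; the degree range of 1 to 10 is an assumption of mine.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data as make_data(200).
rng = np.random.RandomState(1)
X = rng.rand(200, 1) ** 2
y = 1.0 / (X.ravel() + 0.3) + rng.randn(200)

pipeline = make_pipeline(PolynomialFeatures(), LinearRegression())
degrees = np.arange(1, 11)

# Both arrays have shape (n_degrees, n_folds); scores are R^2 by default.
train_scores, val_scores = validation_curve(
    pipeline,
    X,
    y,
    param_name="polynomialfeatures__degree",
    param_range=degrees,
    cv=7,
)

mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)
for degree, train, val in zip(degrees, mean_train, mean_val):
    print(f"degree {degree:2d}: train R^2 = {train:.3f}, validation R^2 = {val:.3f}")
```

The training score can only improve as the degree grows, so the useful signal is where the validation score stops improving or starts to fall. Both means can be plotted against `degrees` with matplotlib to make the gap visible.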
    
