Anova test for GLM in python

后端 未结 2 2007
离开以前
离开以前 2021-01-13 20:30

I am trying to get the F-statistic and p-value for each of the covariates in GLM. In Python I am using the stats mode.formula.api to conduct the GLM.

formula         


        
相关标签:
2条回答
  • 2021-01-13 21:06

    There isn't, unfortunately. However, you can roll your own by using the model's hypothesis testing methods on each of the terms. In fact, some of their ANOVA methods do not even use the attribute ssr (which is the model's sum of squared residuals, thus obviously undefined for a binomial GLM). You could probably modify this code to do a GLM ANOVA.

    0 讨论(0)
  • 2021-01-13 21:21

    Here is my attempt to roll your own.

    The F-statistic for nested models is defined as:

    (D_s - D_b ) / (addtl_parameters * phi_b)

    Where:

    • D_s is deviance of small model
    • D_b is deviance of larger ("big)" model
    • addtl_parameters is the difference in degrees of freedom between models.
    • phi_b is the estimate of dispersion parameter for the larger model'

    "Statistical theory says that the F-statistic follows an F distribution, with a numerator degrees of freedom equal to the number of added parameters and a denominator degrees of freedom equal to n - p_b, or the number of records minus the number of parameters in the big model."

    We translate this into code with:

    from scipy import stats
    
    def calculate_nested_f_statistic(small_model, big_model):
        """Given two fitted GLMs, the larger of which contains the parameter space of the smaller, return the F Stat and P value corresponding to the larger model adding explanatory power"""
        addtl_params = big_model.df_model - small_model.df_model
        f_stat = (small_model.deviance - big_model.deviance) / (addtl_params * big_model.scale)
        df_numerator = addtl_params
        # use fitted values to obtain n_obs from model object:
        df_denom = (big_model.fittedvalues.shape[0] - big_model.df_model)
        p_value = stats.f.sf(f_stat, df_numerator, df_denom)
        return (f_stat, p_value)
    

    Here is a reproducible example, following the gamma GLM example in statsmodels (https://www.statsmodels.org/stable/glm.html):

    import numpy as np
    import statsmodels.api as sm
    data2 = sm.datasets.scotland.load()
    data2.exog = sm.add_constant(data2.exog, prepend=False)
    
    big_model = sm.GLM(data2.endog, data2.exog, family=sm.families.Gamma()).fit()
    # Drop one covariate (column):
    smaller_model = sm.GLM(data2.endog, data2.exog[:, 1:], family=sm.families.Gamma()).fit()
    
    # Using function defined in answer:
    calculate_nested_f_statistic(smaller_model, big_model)
    # (9.519052917304652, 0.004914748992474178)
    

    Source: https://www.casact.org/pubs/monographs/papers/05-Goldburd-Khare-Tevet.pdf

    0 讨论(0)
提交回复
热议问题