How to apply OLS from statsmodels to groupby

爱一瞬间的悲伤 2021-01-16 10:31

I am running OLS on products by month. This works fine for a single product, but my dataframe contains many products. If I create a groupby object and pass it to OLS, I get an error.

2 Answers
  • 2021-01-16 10:41

    Use get_group to retrieve each individual group and fit an OLS model on each one:

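    import statsmodels.api as sm

    # linear_regression_grouped is the existing groupby object
    for group in linear_regression_grouped.groups.keys():
        df = linear_regression_grouped.get_group(group)
        X = df['period_num']
        y = df['TOTALS']
        model = sm.OLS(y, X)  # no intercept term in this version
        results = model.fit()
        print(results.summary())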
    

    In a real analysis you usually also want an intercept term, so the model should be defined slightly differently:

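    for group in linear_regression_grouped.groups.keys():
        # copy so that adding a column does not modify a view of the original frame
        df = linear_regression_grouped.get_group(group).copy()
        df['constant'] = 1  # column of ones acts as the intercept
        X = df[['period_num', 'constant']]
        y = df['TOTALS']
        model = sm.OLS(y, X)
        results = model.fit()
        print(results.summary())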
    

    The results with and without an intercept can differ substantially, because a model without an intercept forces the fitted line through the origin.

  • 2021-01-16 10:42

    You could loop over the unique products and fit a model on each subset, something like this:

    import pandas as pd
    import statsmodels.api as sm
    
    # linear_regression_df is the original (ungrouped) dataframe
    for product in linear_regression_df.product_desc.unique():
        tempdf = linear_regression_df[linear_regression_df.product_desc == product]
        X = tempdf['period_num']
        y = tempdf['TOTALS']
    
        model = sm.OLS(y, X)
        results = model.fit()
    
        print(results.params)  # Or whatever summary info you want
    