Python 2.7 - statsmodels - formatting and writing summary output

独厮守ぢ 2021-02-05 20:55

I'm doing logistic regression using pandas 0.11.0 (data handling) and statsmodels 0.4.3 to do the actual regression, on Mac OS X Lion.

I'm goin

5 Answers
  • 2021-02-05 21:28
    write_path = '/my/path/here/output.csv'
    with open(write_path, 'w') as f:
        f.write(result.summary().as_csv())
    
  • 2021-02-05 21:29

    There is actually a built-in method for this, as_csv(), described in the statsmodels documentation:

    f = open('csvfile.csv','w')
    f.write(result.summary().as_csv())
    f.close()
    

    I believe this is a much easier (and cleaner) way to output the summaries to csv files.

  • 2021-02-05 21:45

    There is no premade table of parameters and their result statistics currently available.

    Essentially you need to stack all the results yourself. Whether you use a list, a numpy array or a pandas DataFrame depends on what is more convenient for you.

    For example, to collect the results for each model (llf plus the values from the summary parameter table) in one numpy array per model, I could use

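    import numpy

    res_all = []
    for res in results:
        # conf_int() gives one (lower, upper) pair per parameter; transpose to unpack the columns
        low, upp = numpy.asarray(res.conf_int()).T
        res_all.append(numpy.concatenate(([res.llf], res.params, res.tvalues, res.pvalues,
                                          low, upp)))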
    

    But it might be better to align with pandas, depending on what structure you have across models.

    You could write a helper function that takes all the results from the results instance and concatenates them in a row.

    (I'm not sure what's most convenient for writing to csv row by row.)
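
    The helper idea above could be sketched roughly as follows. This is only an illustration, not part of statsmodels: the names result_to_row and write_results_csv are hypothetical, every model is assumed to share the same parameters, and only the attributes already used in this answer (llf, params, tvalues, pvalues, conf_int()) are read. It is written for Python 3; on Python 2.7 the file would need to be opened in 'wb' mode instead.

```python
import csv


def result_to_row(label, res):
    """Flatten one fitted result into a list: label, llf, then coefficient,
    t-value, p-value, lower and upper bound for each parameter."""
    ci = res.conf_int()
    # A pandas DataFrame exposes its rows through .values; a numpy array
    # or a nested list can be iterated row by row directly.
    bounds = getattr(ci, "values", ci)
    row = [label, float(res.llf)]
    for coef, tval, pval, (low, upp) in zip(res.params, res.tvalues,
                                            res.pvalues, bounds):
        row.extend([float(coef), float(tval), float(pval),
                    float(low), float(upp)])
    return row


def write_results_csv(path, labelled_results, param_names):
    """Write one CSV row per (label, result) pair.

    param_names is assumed to list the parameters in the same order
    as res.params for every model."""
    header = ["model", "llf"]
    for name in param_names:
        header.extend(name + suffix
                      for suffix in ("_coef", "_t", "_p", "_low", "_upp"))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for label, res in labelled_results:
            writer.writerow(result_to_row(label, res))
```

    With a list of fitted results this could be called as write_results_csv('output.csv', [('model_1', res1), ('model_2', res2)], ['const', 'x1']), where the labels and parameter names are placeholders.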

    edit:

    Here is an example that stores the regression results in a DataFrame:

    https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/multilinear.py#L21

    The loop is on line 159.

    summary() and similar code outside of statsmodels (for example http://johnbeieler.org/py_apsrtable/ for combining several results) is oriented towards printing, not towards storing variables.

  • 2021-02-05 21:47
    • results.params : for the coefficients
    • results.pvalues : for the p-values

    BTW, you can use dir(results) to list all the attributes of an object.
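
    Since dir() also returns every private and special name, a small filter makes the output easier to scan. A minimal sketch, where public_attributes is a hypothetical helper name and not part of statsmodels:

```python
def public_attributes(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]
```

    Calling public_attributes(results) on a fitted results instance should then list names such as params and pvalues alongside the other public attributes and methods.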

  • 2021-02-05 21:52

    I found this formulation to be a little more straightforward. You can add or remove columns by following the pattern of the existing ones (pvals, coeff, conf_lower, conf_higher).

    import pandas as pd     #This can be left out if already present...
    
    def results_summary_to_dataframe(results):
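        '''Take a statsmodels results instance and return its parameter table as a DataFrame.'''
        pvals = results.pvalues
        coeff = results.params
        # Assumes the model was fitted on pandas data, so conf_int() returns
        # a DataFrame whose columns 0 and 1 hold the lower and upper bounds.
        conf_lower = results.conf_int()[0]
        conf_higher = results.conf_int()[1]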
    
        results_df = pd.DataFrame({"pvals":pvals,
                                   "coeff":coeff,
                                   "conf_lower":conf_lower,
                                   "conf_higher":conf_higher
                                    })
    
        #Reordering...
        results_df = results_df[["coeff","pvals","conf_lower","conf_higher"]]
        return results_df
    