Statsmodels Categorical Data from Formula (using pandas)

日久生厌 2021-01-16 09:13

I am trying to finish up a homework assignment, and to do so I need to use categorical variables in statsmodels (due to a refusal to conform to using Stata like everyone else).

1 Answer
  • 2021-01-16 09:36

    The problem is that C is both the name of one of the columns in your DataFrame and patsy's way of denoting that you want a categorical variable. The easiest fix is to rename the column, like this:

    data = data.rename(columns={'C': 'C_data'})
    form = "C_data ~ Q1 + Q2 + Q3 + Q4 + Q5 + C(BANK)"

    Then the call to sm.ols will just work.
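    A self-contained sketch of the fix, under some assumptions: the data is a small synthetic stand-in for the homework data (the values, the n = 30 rows and the bank1/bank2/bank3 labels are all hypothetical), and the formula API is imported as smf from statsmodels.formula.api (the answer's sm.ols presumably refers to that same module imported under a different alias). It uses DataFrame.rename(columns=...), the standard pandas way to rename column labels.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the homework data: a response column literally
# named "C", five numeric predictors Q1..Q5, and a categorical BANK column.
rng = np.random.default_rng(0)
n = 30
data = pd.DataFrame({f"Q{i}": rng.normal(size=n) for i in range(1, 6)})
data["BANK"] = np.repeat(["bank1", "bank2", "bank3"], n // 3)
data["C"] = 1.0 + data["Q1"] + 0.5 * data["Q2"] + rng.normal(size=n)

# Rename the column so that "C" in the formula can only mean patsy's C().
data = data.rename(columns={"C": "C_data"})

form = "C_data ~ Q1 + Q2 + Q3 + Q4 + Q5 + C(BANK)"
result = smf.ols(form, data=data).fit()

# Intercept, two BANK dummies (bank1 is the baseline level) and Q1..Q5.
print(result.params)
```

    With the column renamed, C(BANK) is expanded into dummy columns for bank2 and bank3, with bank1 as the baseline level absorbed into the intercept.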

    The error message TypeError: 'Series' object is not callable can be interpreted as follows:

    • patsy resolves the name C to the column of the data frame, which in this case is the Series data['C'].
    • Because that name is immediately followed by parentheses, patsy then tries to call data['C'] as a function with the argument BANK. Series objects don't implement a __call__ method, hence the error message that the 'Series' object is not callable.
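    The failing step can be reproduced by hand in plain pandas. This is a minimal sketch of the mechanism described above, not patsy's actual internal code; the two-row frame is hypothetical, and inside a real formula call the TypeError may be reported wrapped in patsy's own error message, so the exact text can differ.

```python
import pandas as pd

# Hypothetical two-row frame with a column named "C", as in the question.
data = pd.DataFrame({"C": [1.0, 2.0], "BANK": ["bank1", "bank2"]})

# Mimic what happens when the name "C" resolves to the column instead of
# patsy's C() function: the Series itself gets called with BANK as argument.
try:
    data["C"](data["BANK"])
    message = None
except TypeError as err:
    message = str(err)

print(message)  # 'Series' object is not callable

# A pandas Series defines no __call__ method, so it is not callable.
print(callable(data["C"]))  # False
```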

    Good luck!
