How can an interaction design matrix be created from categorical variables?

[愿得一人] 2021-01-20 15:31

I come mainly from working in R for statistical modeling / machine learning and am looking to improve my skills in Python. I am wondering about the best way to create a design m

2 answers

鱼传尺愫 (original poster)

2021-01-20 15:50
I recently faced a similar problem: I wanted an easy way to take specific interactions from a baseline OLS model in the literature and compare them against ML approaches. I came across patsy (http://patsy.readthedocs.io/en/latest/overview.html) and its scikit-learn integration, patsylearn (https://github.com/amueller/patsylearn).

The interaction terms can be passed to the model like this:
```
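from sklearn.linear_model import LinearRegression
from patsylearn import PatsyModel
# Q("Play-Tennis") quotes the column name; a bare hyphen would be parsed as subtraction.
model = PatsyModel(LinearRegression(), 'Q("Play-Tennis") ~ C(Outlook):C(Temperature) + C(Outlook):C(Humidity) + C(Outlook):C(Wind)')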
```
Note that in this formulation you don't need OneHotEncoder(): the C in the formula tells the Patsy interpreter that these are categorical variables, and they are encoded as indicator (dummy) columns for you. You can read more about the coding schemes in the documentation (http://patsy.readthedocs.io/en/latest/categorical-coding.html).
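To see what `C(...)` and `:` actually produce, here is a minimal sketch that calls patsy directly. The toy data frame and its values are made up for illustration, and the formula is a reduced version of the one above (only `Outlook` and `Wind`, with `0 +` to drop the intercept).

```python
import pandas as pd
from patsy import dmatrix

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"],
    "Wind": ["Weak", "Strong", "Weak", "Weak", "Strong", "Strong"],
})

# "0 +" removes the intercept; C(a):C(b) requests only the interaction term.
X = dmatrix("0 + C(Outlook):C(Wind)", df, return_type="dataframe")

# One indicator column per (Outlook, Wind) combination.
print(list(X.columns))
print(X.shape)
```

With three `Outlook` levels and two `Wind` levels this gives six indicator columns, and each row has a single 1 in the column for its own combination. If you add an intercept or main effects to the formula, patsy drops redundant columns, so the layout changes.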

Alternatively, you can use PatsyTransformer instead of PatsyModel, which I prefer because it integrates easily into scikit-learn Pipelines:
```
from patsylearn import PatsyTransformer
transformer = PatsyTransformer("C(Outlook):C(Temperature) + C(Outlook):C(Humidity) + C(Outlook):C(Wind)")
```
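For readers who want to see the Pipeline idea without installing patsylearn, here is a rough sketch of a formula-based transformer built on patsy alone. `FormulaTransformer`, the step names, and the toy data are hypothetical and are not patsylearn's actual implementation. The sketch assumes that the design learned in `fit` should be reused in `transform`, so that new data is encoded with the same columns.

```python
import numpy as np
import pandas as pd
from patsy import build_design_matrices, dmatrix
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class FormulaTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a formula-based transformer."""

    def __init__(self, formula):
        self.formula = formula

    def fit(self, X, y=None):
        # Remember factor levels and column layout from the training data.
        self.design_info_ = dmatrix(self.formula, X).design_info
        return self

    def transform(self, X):
        # Re-apply the stored design so new data gets the same columns.
        (matrix,) = build_design_matrices([self.design_info_], X)
        return np.asarray(matrix)


# Toy data, invented for illustration only.
train = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"],
    "Wind": ["Weak", "Strong", "Weak", "Weak", "Strong", "Strong"],
})
y = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 1.0])

pipe = Pipeline([
    ("design", FormulaTransformer("0 + C(Outlook):C(Wind)")),
    ("ols", LinearRegression()),
])
pipe.fit(train, y)

# A single new observation is encoded with the columns learned during fit.
new_day = pd.DataFrame({"Outlook": ["Rain"], "Wind": ["Weak"]})
print(pipe.predict(new_day))
```

Storing `design_info_` in `fit` is the important design choice here. Calling `dmatrix` afresh on new data would derive levels from that data alone and could produce a different number of columns.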