I\'m coming from mainly working in R for statistical modeling / machine learning and looking to improve my skills in Python. I am wondering the best way to create a design m
Being now faced with a similar problem of wanting an easy way of integrating specific interactions from a baseline OLS model from the literature to compare against ML appraches, I came across patsy (http://patsy.readthedocs.io/en/latest/overview.html) and this scikit-learn integration patsylearn (https://github.com/amueller/patsylearn).
Below, how the interaction variables could be passed to the model:
from patsylearn import PatsyModel
model = PatsyModel(sk.linear_model.LinearRegression(), "Play-Tennis ~ C(Outlook):C(Temperature) + C(Outlook):C(Humidity) + C(Outlook):C(Wind)")
Note, that in this formulation you don't need the OneHotEncoder(), as the C in the formula tells the Patsy interpreter that these are categorical variables and they are one-hot encoded for you! But read more about it in their documentation (http://patsy.readthedocs.io/en/latest/categorical-coding.html).
Or, you could also use the PatsyTransformer, which I prefer, as it allows easy integration into scikit-learn Pipelines:
from patsylearn import PatsyTransformer
transformer = PatsyTransformer("C(Outlook):C(Temperature) + C(Outlook):C(Humidity) + C(Outlook):C(Wind)")