How can an interaction design matrix be created from categorical variables?

[愿得一人] 2021-01-20 15:31

I come mainly from working in R on statistical modeling / machine learning, and I'm looking to improve my skills in Python. I am wondering what the best way is to create a design matrix…

2 Answers
  •  鱼传尺愫
    2021-01-20 15:50

    Facing a similar problem, I wanted an easy way to include the specific interactions from a baseline OLS model in the literature so I could compare it against ML approaches. I came across patsy (http://patsy.readthedocs.io/en/latest/overview.html) and its scikit-learn integration, patsylearn (https://github.com/amueller/patsylearn).

    Below is how the interaction terms could be passed to the model:

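    from sklearn.linear_model import LinearRegression
    from patsylearn import PatsyModel
    # Q("...") quotes the outcome name; a bare "-" in Play-Tennis would be parsed as subtraction
    model = PatsyModel(LinearRegression(), 'Q("Play-Tennis") ~ C(Outlook):C(Temperature) + C(Outlook):C(Humidity) + C(Outlook):C(Wind)')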

    Note that in this formulation you don't need OneHotEncoder(): the C() in the formula tells patsy that these are categorical variables, and it builds the indicator (dummy) columns for you. By default patsy uses treatment coding and picks the coding of each term so that the design matrix has no redundant columns, so read more about it in their documentation (http://patsy.readthedocs.io/en/latest/categorical-coding.html).
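
    To make concrete what an interaction such as C(Outlook):C(Wind) amounts to, here is a minimal stdlib-only sketch that builds one 0/1 indicator column per (Outlook, Wind) level pair. It illustrates full one-hot interaction coding only; it is not patsy's algorithm, which applies contrast coding and removes redundancy depending on the other terms in the formula. The helper `interaction_columns` and the toy rows are hypothetical.

```python
from itertools import product


def interaction_columns(rows, a, b):
    """Hypothetical helper: one indicator column per observed level pair of a and b.

    Full one-hot coding of the a:b interaction, with no contrast coding and
    no redundancy elimination (unlike patsy).
    """
    levels_a = sorted({r[a] for r in rows})
    levels_b = sorted({r[b] for r in rows})
    pairs = list(product(levels_a, levels_b))
    names = [f"{a}[{x}]:{b}[{y}]" for x, y in pairs]
    # Each row gets a 1 in the column of its own (a, b) pair and 0 elsewhere.
    matrix = [[1 if (r[a], r[b]) == pair else 0 for pair in pairs] for r in rows]
    return names, matrix


# Made-up rows in the spirit of the Play-Tennis example.
rows = [
    {"Outlook": "Sunny", "Wind": "Weak"},
    {"Outlook": "Rain", "Wind": "Strong"},
    {"Outlook": "Sunny", "Wind": "Strong"},
]
names, X = interaction_columns(rows, "Outlook", "Wind")
print(names)
for row in X:
    print(row)
```

    With k levels of one factor and m of the other this yields k * m columns, which is why patsy's contrast coding matters once an intercept or main effects are also in the model.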

    Alternatively, you could use PatsyTransformer, which I prefer because it integrates easily into scikit-learn Pipelines:

    from patsylearn import PatsyTransformer
    # Builds only the Outlook interaction columns; chain it with any estimator in a Pipeline
    transformer = PatsyTransformer("C(Outlook):C(Temperature) + C(Outlook):C(Humidity) + C(Outlook):C(Wind)")
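
    One reason a transformer fits well in a Pipeline is that it can learn the category levels during fit() on the training data and reuse them in transform(), so training and test matrices get identical columns. The sketch below shows that idea in plain Python with a hypothetical `InteractionEncoder` class exposing scikit-learn-style fit/transform methods; it is not patsylearn's implementation. It maps level pairs unseen during fit to all-zero rows, which is a choice of this sketch; patsy may handle unseen levels differently.

```python
from itertools import product


class InteractionEncoder:
    """Hypothetical fit/transform encoder for a single categorical interaction a:b.

    Levels are learned in fit() and frozen, so transform() always returns the
    same columns, which is what lets an encoder like this sit in a Pipeline.
    """

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def fit(self, rows, y=None):
        levels_a = sorted({r[self.a] for r in rows})
        levels_b = sorted({r[self.b] for r in rows})
        self.pairs_ = list(product(levels_a, levels_b))
        return self

    def transform(self, rows):
        # Level pairs never seen in fit() produce an all-zero row.
        return [[1 if (r[self.a], r[self.b]) == p else 0 for p in self.pairs_] for r in rows]

    def fit_transform(self, rows, y=None):
        return self.fit(rows, y).transform(rows)


# Made-up training and test rows.
train = [{"Outlook": "Sunny", "Wind": "Weak"}, {"Outlook": "Rain", "Wind": "Strong"}]
test = [{"Outlook": "Overcast", "Wind": "Weak"}, {"Outlook": "Rain", "Wind": "Weak"}]

enc = InteractionEncoder("Outlook", "Wind").fit(train)
print(enc.pairs_)
print(enc.transform(test))
```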
    
