Question
I am trying to use the statsmodels linear regression functions with formulas. My sample data comes from a pandas DataFrame. I am having a slight problem with column names within the formula. Because of downstream processes, I have hyphens in my column names. For example:
+------+------+-------+
| VOLT | B-NN | B-IDW |
+------+------+-------+
One reason for keeping the hyphen is that it lets Python split the string for other analysis, so I have to keep it. As you can see, when I want to regress VOLT on B-NN using VOLT ~ B-NN, I run into a problem because the patsy formula cannot find B.
Is there a way to tell Patsy that B-NN is a variable name and not B minus NN?
Thanks.
BJR
Answer 1:
patsy uses Q for quoting names, e.g. Q('B-IDW'). For the formula in the question, that means writing VOLT ~ Q('B-NN') instead of VOLT ~ B-NN.
http://patsy.readthedocs.io/en/latest/builtins-reference.html#patsy.builtins.Q
my_fit_function("y ~ Q('weight.in.kg')", ...)
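Applied to the column names from the question, this might look like the sketch below. The numeric values are made up purely for illustration (only the column names VOLT, B-NN, and B-IDW come from the question), and it assumes a statsmodels version whose formula interface is backed by patsy.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up sample values; only the column names come from the question.
df = pd.DataFrame({
    "VOLT": [1.0, 2.1, 2.9, 4.2, 5.1],
    "B-NN": [0.5, 1.0, 1.5, 2.0, 2.5],
    "B-IDW": [0.4, 1.1, 1.4, 2.2, 2.4],
})

# Q('B-NN') tells patsy to look up the column literally named "B-NN"
# rather than parsing the text as "B minus NN".
model = smf.ols("VOLT ~ Q('B-NN')", data=df).fit()

# The fitted coefficient is labelled with the quoted term.
print(model.params)
```

The hyphen stays in the DataFrame column names, so any downstream code that splits on it keeps working; only the formula string changes.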
Source: https://stackoverflow.com/questions/50623216/patsy-formula-when-variable-has-a-hypthen