patsy | 易学教程

How do I get the columns that a statsmodels / patsy formula depends on?

Question: Suppose I have a pandas DataFrame:

    import pandas as pd
    df = pd.DataFrame({'x1': [0, 1, 2, 3, 4],
                       'x2': [10, 9, 8, 7, 6],
                       'x3': [.1, .1, .2, 4, 8],
                       'y': [17, 18, 19, 20, 21]})

Now I fit a statsmodels model using a formula (which uses patsy under the hood):

    import statsmodels.formula.api as smf
    fit = smf.ols(formula='y ~ x1:x2', data=df).fit()

What I want is a list of the columns of df that fit depends on, so that I can use fit.predict() on another dataset. If I try list(fit.params.index), for example, I get: [
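One possible way to recover the column names, sketched here as an assumption rather than as the answer from the original thread: parse the formula with patsy's ModelDesc, then scan each factor's Python source for identifiers that match DataFrame columns. The helper name formula_columns and its include_response flag are hypothetical. The sketch will miss columns referenced through patsy's Q("...") quoting, and it treats any identifier that happens to match a column name as a column.

```python
import ast

import pandas as pd
from patsy import ModelDesc


def formula_columns(formula, data, include_response=False):
    """Hypothetical helper: columns of `data` referenced by `formula`."""
    desc = ModelDesc.from_formula(formula)
    terms = list(desc.rhs_termlist)
    if include_response:
        terms += list(desc.lhs_termlist)

    names = set()
    for term in terms:
        for factor in term.factors:
            # factor.code is the factor's Python source, e.g. "x1" or "np.log(x1)".
            tree = ast.parse(factor.code, mode="eval")
            names.update(
                node.id for node in ast.walk(tree) if isinstance(node, ast.Name)
            )

    # Keep only identifiers that are real columns (this drops names such as "np").
    return [col for col in data.columns if col in names]


df = pd.DataFrame({'x1': [0, 1, 2, 3, 4],
                   'x2': [10, 9, 8, 7, 6],
                   'x3': [.1, .1, .2, 4, 8],
                   'y': [17, 18, 19, 20, 21]})

predictors = formula_columns('y ~ x1:x2', df)
print(predictors)
```

The returned list can then be used to select the needed columns from a new dataset before calling fit.predict().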

Namespace issues when calling patsy within a function

Question: I am attempting to write a wrapper for the statsmodels formula API (this is a simplified version; the function does more than this):

    import statsmodels.formula.api as smf

    def wrapper(formula, data, **kwargs):
        return smf.logit(formula, data).fit(**kwargs)

If I give this function to a user, who then attempts to define their own function:

    def square(x):
        return x**2

    model = wrapper('y ~ x + square(x)', data=df)

they will receive a NameError because the patsy module is looking in the namespace
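The namespace problem can be reproduced and worked around at the patsy level. The sketch below is an illustration under assumptions, not the answer from the original thread: it uses patsy.dmatrix instead of statsmodels, and the names build_design and user_code are hypothetical. EvalEnvironment.capture(1) captures the namespace of whoever called the wrapper, so functions defined by the user can be resolved inside the formula.

```python
import numpy as np
import patsy


def build_design(formula, data):
    """Hypothetical wrapper around patsy.dmatrix."""
    # Depth 0 would be this function's own frame; depth 1 is the caller's
    # frame, which is where the user's helper functions live.
    env = patsy.EvalEnvironment.capture(1)
    return patsy.dmatrix(formula, data, eval_env=env)


def user_code():
    # A function that exists only in the user's local namespace.
    def square(x):
        return x ** 2

    data = {"x": np.array([1.0, 2.0, 3.0])}
    return build_design("x + square(x)", data)


design = user_code()
print(design.design_info.column_names)
print(np.asarray(design))
```

The statsmodels formula functions document an eval_env keyword as well; whether the captured environment can be passed straight through depends on the installed version, so treat that as something to verify rather than a given.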

as_formula specifier for sklearn.tree.decisiontreeclassifier in Python?

Question: I was curious if there is an as_formula specifier (like in statsmodels) for sklearn.tree.DecisionTreeClassifier in Python, or some way to hack one in. Currently, I must use

    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, Y)

but I would prefer to have something like

    clf = clf.fit(formula='Y ~ X', data=df)

The reason is that I would like to specify more than one X without having to do a lot of array shaping. Thanks.

Answer 1: It's currently not possible, but it would be great to have a patsy

Creating dummy variable using pandas or statsmodel for interaction of two columns

Question: I have a data frame like this:

    Index   ID  Industry  years_spend       asset
     6646  892         4            4  144.977037
     2347  315        10            8  137.749138
     7342  985         1            5  104.310217
      137   18         5            5  156.593396
     2840  381        11            2  229.538828
     6579  883        11            1  171.380125
     1776  235         4            7  217.734377
     2691  361         1            2  148.865341
      815  110        15            4  233.309491
     2932  393        17            5  187.281724

I want to create dummy variables for Industry x years_spend, which creates len(df.Industry.value_counts()) * len(df.years_spend.value_counts()) variables. For example, d_11_4 = 1 for all rows that have Industry == 11 and years_spend == 4; otherwise d_11_4 = 0. Then I can use these vars for some
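A minimal pandas sketch of the idea, offered as one possible approach rather than the thread's answer: build a combined label per row and pass it to pd.get_dummies. The variable names combo and dummies are illustrative. Note the assumption: this produces one column per combination that actually occurs in the data, not the full Industry x years_spend cross product described in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Industry":    [4, 10, 1, 5, 11, 11, 4, 1, 15, 17],
    "years_spend": [4, 8, 5, 5, 2, 1, 7, 2, 4, 5],
})

# One label per row, e.g. "11_2" for Industry 11 and years_spend 2.
combo = df["Industry"].astype(str) + "_" + df["years_spend"].astype(str)

# One 0/1 column per observed combination, named d_<Industry>_<years_spend>.
dummies = pd.get_dummies(combo, prefix="d", dtype=int)

df = df.join(dummies)
print(df.filter(like="d_").columns.tolist())
```

Combinations that never occur in the data would have to be added separately as all-zero columns if the full set is required.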

Reciprocals in patsy

Question: Patsy's power operator doesn't allow negative integers, so if we have some series data X, patsy.dmatrices('X + X**(-1)', X) raises an error. How would I add the reciprocal of X to such a patsy formula?

Answer 1: The special patsy meaning of operators gets switched off inside embedded function calls. So if you write X + 1 / X, patsy interprets that as the special patsy + and / operators, but if you write something like X + sin(1 / X), then patsy continues to interpret the + as a special patsy operator, but the whole sin(1 / X) expression gets passed to Python to evaluate, and Python will evaluate the
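The point about function calls can be illustrated with patsy's built-in identity function I(), which exists so that an expression is evaluated as ordinary Python arithmetic. This is a small sketch with made-up data; the excerpt above is truncated, so it is not claimed to be the exact code from the original answer.

```python
import numpy as np
import patsy

data = {"X": np.array([1.0, 2.0, 4.0])}

# Inside I(...), "/" is plain Python division rather than a patsy operator.
design = patsy.dmatrix("X + I(1 / X)", data)

print(design.design_info.column_names)
print(np.asarray(design))
```

The last column of the design matrix holds the element-wise reciprocal of X, next to the intercept and X itself.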