Namespace issues when calling patsy within a function

Question

I am attempting to write a wrapper for the statsmodels formula API (this is a simplified version; the real function does more than this):

import statsmodels.formula.api as smf

def wrapper(formula, data, **kwargs):
    return smf.logit(formula, data).fit(**kwargs)

If I give this function to a user, who then defines their own function and uses it in a formula:

def square(x):
    return x**2

model = wrapper('y ~ x + square(x)', data=df)

they will receive a NameError because the patsy module is looking in the namespace of wrapper for the function square. Is there a safe, Pythonic way to handle this situation without knowing a priori what the function names are or how many functions will be needed?

FYI: This is for Python 3.4.3.

Answer 1:

statsmodels uses the patsy package to parse the formulas and create the design matrix. patsy allows user functions as part of formulas and obtains or evaluates the user function in the user namespace or environment.

For reference, see the eval_env keyword in http://patsy.readthedocs.org/en/latest/API-reference.html

from_formula is the model method that implements the formula interface to patsy. It uses eval_env to provide the necessary information to patsy, which by default is the calling environment of the user. The user can override this with the corresponding keyword argument.

The simplest way to define eval_env is as an integer that indicates the stack level patsy should use. from_formula increments it to account for the additional level introduced by the statsmodels methods.
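To make the idea of an integer stack level concrete, here is a minimal sketch that uses only the standard library. It does not call patsy or statsmodels; lookup and library_entry_point are hypothetical names that only mimic how a depth could be resolved to a caller's namespace, including the extra increment for an intermediate frame.

```python
import sys


def lookup(name, depth):
    """Resolve `name` in the frame `depth` levels above the caller of lookup."""
    # +1 skips lookup's own frame, so depth=0 means "whoever called lookup".
    frame = sys._getframe(depth + 1)
    # Locals take precedence over globals, as in normal name resolution.
    if name in frame.f_locals:
        return frame.f_locals[name]
    return frame.f_globals[name]


def library_entry_point(name, depth=0):
    # Hypothetical stand-in for a library method: it adds one to the depth
    # to account for its own frame before delegating.
    return lookup(name, depth + 1)


def user_function():
    def square(x):
        return x ** 2

    def wrapper(name, depth):
        return library_entry_point(name, depth)

    # depth=0 looks in wrapper's namespace, where square is not defined.
    try:
        wrapper("square", 0)
        found_at_zero = True
    except KeyError:
        found_at_zero = False
    # depth=1 looks one level higher, in user_function, where square lives.
    func = wrapper("square", 1)
    return found_at_zero, func(3)


result = user_function()
```

In this sketch the lookup only succeeds once the depth points past the wrapper, which is the same situation the question describes.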

According to the comments, eval_env = 2 will use the next higher level from the level that creates the model, e.g. with model = smf.logit(..., eval_env=2).

This creates the model, which calls patsy and builds the design matrix. model.fit() then estimates the model and returns the results instance.
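Applied to the wrapper from the question, the suggestion might look like the sketch below. It assumes that the installed statsmodels version forwards an integer eval_env to patsy as described in this answer; check that behavior for your version before relying on it. The import is placed inside the function only so that the wrapper can be defined without loading statsmodels first.

```python
def square(x):
    """User-defined transformation referenced by name in the formula."""
    return x ** 2


def wrapper(formula, data, **kwargs):
    # Imported here so that defining the wrapper does not itself require
    # statsmodels to be loaded.
    import statsmodels.formula.api as smf

    # Assumption: eval_env=2 makes patsy resolve names such as square in the
    # namespace of whoever called wrapper, not in wrapper's own namespace.
    model = smf.logit(formula, data, eval_env=2)
    # Remaining keyword arguments are passed to fit(), as in the question.
    return model.fit(**kwargs)
```

A user would then call it as in the question, for example wrapper('y ~ x + square(x)', data=df), where df is the user's own DataFrame with columns y and x.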

Answer 2:

If you are willing to use eval to do the heavy lifting of your function, you can construct a namespace from the arguments to wrapper and the local variables of the outer frame:

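import sys
import statsmodels.formula.api as smf

# Compile the expression once; it is evaluated inside the namespace built below.
wrapper_code = compile("smf.logit(formula, data).fit(**kwargs)",
                       "<WrapperFunction>", "eval")

def wrapper(formula, data, **kwargs):
    outer_frame = sys._getframe(1)  # frame of whoever called wrapper
    namespace = dict(outer_frame.f_locals)  # copy the caller's local variables
    # Add the names the compiled expression needs; these override caller names.
    namespace.update(formula=formula, data=data, kwargs=kwargs, smf=smf)
    return eval(wrapper_code, namespace)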

I don't really see this as a cheat, since logit seems to be evaluating names in some namespace anyway (that is why it raises a NameError). As long as wrapper_code is not modified and there are no name conflicts (such as the caller having a variable called data), this should do what you want.
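The name-conflict caveat can be illustrated with a small standard-library sketch. build_namespace and caller are hypothetical names; the function repeats only the namespace-building step of the wrapper above and leaves out statsmodels entirely.

```python
import sys


def build_namespace(formula, data):
    """Copy the caller's locals, then add the wrapper's own arguments."""
    namespace = dict(sys._getframe(1).f_locals)
    # update() silently overwrites any caller variable with the same name.
    namespace.update(formula=formula, data=data)
    return namespace


def caller():
    data = "caller's own variable named data"
    scale = 10
    return build_namespace("y ~ x", data={"x": [1, 2, 3]})


ns = caller()
# The caller's string called data is no longer visible in the namespace.
shadowed = ns["data"] == {"x": [1, 2, 3]}
# Names that do not collide with the wrapper's arguments survive unchanged.
kept = ns["scale"] == 10
```

So a formula that referred to the caller's own data, formula, kwargs, or smf variable would see the wrapper's values instead.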

Source: https://stackoverflow.com/questions/36798992/namespace-issues-when-calling-patsy-within-a-function

Tags

python

python-3.x

namespaces

statsmodels

patsy