Question

I'm using a linear regression example from Bayesian Methods for Hackers, but I'm having trouble adapting it to my use case.

I have observations on a random variable, an assumed distribution on that random variable, and finally another assumed distribution on that random variable for which I have observations. I have tried to model it with intermediate distributions on a and b, but it fails with the error "Wrong number of dimensions: expected 0, got 1 with shape (788,)".

To describe the actual model: I am predicting the conversion rate after a certain number (n) of cultivating emails. My prior is that the conversion rate (described by a Beta distribution with parameters alpha and beta) is updated by scaling alpha and beta by factors a and b in (0, inf), which start at 1 for n=0 and increase to their maximum value at some threshold.

import numpy as np
import pymc3 as pm

# Generate predictor data, X, and target data, Y
data = [
{'n': 0 , 'trials': 120, 'successes': 1},
{'n': 5 , 'trials': 111, 'successes': 2},
{'n': 10, 'trials': 78 , 'successes': 1},
{'n': 15, 'trials': 144, 'successes': 3},
{'n': 20, 'trials': 280, 'successes': 7},
{'n': 25, 'trials': 55 , 'successes': 1}]

X = np.empty(0)
Y = np.empty(0)
for dat in data:
    X = np.insert(X, 0, np.ones(dat['trials']) * dat['n'])
    target = np.zeros(dat['trials'])
    target[:dat['successes']] = 1
    Y = np.insert(Y, 0, target)

with pm.Model() as model:
    alpha = pm.Uniform("alpha_n", 5, 13)
    beta = pm.Uniform("beta_n", 1000, 1400)
    n_sat = pm.Gamma("n_sat", alpha=20, beta=2, testval=10)
    a_gamma = pm.Gamma("a_gamma", alpha=18, beta=15)
    b_gamma = pm.Gamma("b_gamma", alpha=18, beta=27)
    a_slope = pm.Deterministic('a_slope', 1 + (X/n_sat)*(a_gamma-1))
    b_slope = pm.Deterministic('b_slope', 1 + (X/n_sat)*(b_gamma-1))
    a = pm.math.switch(X >= n_sat, a_gamma, a_slope)
    b = pm.math.switch(X >= n_sat, b_gamma, b_slope)
    p = pm.Beta("p", alpha=alpha*a, beta=beta*b)
    observed = pm.Bernoulli("observed", p, observed=Y)

Is there a way to get this to work?

Answer 1:

Data

First, note that the total likelihood of repeated Bernoulli trials is exactly a binomial likelihood, so there is no need to expand your data into individual trials. I'd also suggest using a Pandas DataFrame to manage your data, since it helps to keep things tidy:

import pandas as pd

df = pd.DataFrame({
    'n': [0, 5, 10, 15, 20, 25],
    'trials': [120, 111, 78, 144, 280, 55],
    'successes': [1, 2, 1, 3, 7, 1]
})
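
To see why aggregating loses nothing, here is a minimal sketch using only the standard library. The helper names and the test values of p are illustrative assumptions, not part of the model. It compares the summed Bernoulli log-likelihood over individual outcomes with the binomial log-likelihood of the count for one row of the data. The two differ only by log C(n, k), which does not depend on p and so does not affect inference about p.

```python
import math

def bernoulli_loglik(p, outcomes):
    """Sum the log-likelihood over individual 0/1 outcomes."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in outcomes)

def binomial_loglik(p, trials, successes):
    """Log-likelihood of the aggregated success count."""
    return (math.log(math.comb(trials, successes))
            + successes * math.log(p)
            + (trials - successes) * math.log(1 - p))

trials, successes = 280, 7  # the n = 20 row of the data
outcomes = [1] * successes + [0] * (trials - successes)

# The gap between the two log-likelihoods is the same for every p
constant = math.log(math.comb(trials, successes))
gaps = [binomial_loglik(p, trials, successes) - bernoulli_loglik(p, outcomes)
        for p in (0.01, 0.02, 0.05)]
print(all(math.isclose(g, constant, rel_tol=1e-9) for g in gaps))
```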

Solution

This will help simplify the model, but the real solution is to add a shape argument to the p random variable so that PyMC3 knows how to interpret the one-dimensional parameters. You do want a different p distribution for each n case you have, so there is nothing conceptually wrong here.
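
To make the dimensionality concrete, here is a plain-Python sketch without PyMC3. The helper scale_factor is hypothetical, and n_sat = 10 and gamma = 1.2 are arbitrary stand-ins for a single draw of n_sat and a_gamma. It evaluates the piecewise scaling from the question at each value of n. The result has one entry per row of the data, so alpha*a is a vector of length 6 and p needs a matching shape.

```python
def scale_factor(n, n_sat, gamma):
    """Linear ramp from 1 at n = 0 to gamma at n = n_sat, constant at gamma afterwards."""
    if n >= n_sat:
        return gamma
    return 1 + (n / n_sat) * (gamma - 1)

n_values = [0, 5, 10, 15, 20, 25]

# Assumed values standing in for one draw of n_sat and a_gamma
a = [scale_factor(n, n_sat=10.0, gamma=1.2) for n in n_values]

print(len(a))  # one scale factor per row of the data
print(a)
```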

import pymc3 as pm

with pm.Model() as model:
    # conversion rate hyperparameters
    alpha = pm.Uniform("alpha_n", 5, 13)
    beta = pm.Uniform("beta_n", 1000, 1400)

    # switchpoint prior
    n_sat = pm.Gamma("n_sat", alpha=20, beta=2, testval=10)

    a_gamma = pm.Gamma("a_gamma", alpha=18, beta=15)
    b_gamma = pm.Gamma("b_gamma", alpha=18, beta=27)

    # NB: I removed pm.Deterministic because (a|b)_slope[0] is constant
    #     and this causes issues when using ArviZ
    a_slope = 1 + (df.n.values/n_sat)*(a_gamma-1)
    b_slope = 1 + (df.n.values/n_sat)*(b_gamma-1)

    a = pm.math.switch(df.n.values >= n_sat, a_gamma, a_slope)
    b = pm.math.switch(df.n.values >= n_sat, b_gamma, b_slope)

    # conversion rates
    p = pm.Beta("p", alpha=alpha*a, beta=beta*b, shape=len(df.n))

    # observations
    pm.Binomial("observed", n=df.trials, p=p, observed=df.successes)

    trace = pm.sample(5000, tune=10000)

This samples nicely and yields reasonable intervals on the conversion rates, but the fact that the posteriors for alpha_n and beta_n go right up to your prior boundaries is a bit concerning.


I think the reason for this is that each condition has only 55-280 trials. If the conditions were independent (the worst case), conjugacy would tell us that your Beta hyperparameters should be in that range. Since you are doing a regression, the best-case scenario for information sharing across conditions would put your hyperparameters in the range of the total number of trials (788), but that's an upper limit. Because you're outside this range, the concern is that you're forcing the model to be more precise in its estimates than the evidence supports. However, one can justify this if the prior is based on strong independent evidence.

Otherwise, I'd suggest expanding the ranges on those priors that affect the final alpha*a and beta*b numbers (the sums of those should be close to your trial counts in the posterior).
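
The pseudo-count argument can be checked with simple arithmetic. In a conjugate Beta-binomial update, a Beta(alpha, beta) prior combined with k successes in n trials gives a Beta(alpha + k, beta + n - k) posterior, so alpha + beta acts like a number of prior trials. The sketch below is only an illustration and ignores the a and b scale factors. It compares that prior weight, taken from the bounds of the Uniform priors in the model, with the number of trials actually observed.

```python
trials = [120, 111, 78, 144, 280, 55]
total_trials = sum(trials)

# Bounds of the Uniform priors on alpha_n and beta_n in the model
alpha_lo, alpha_hi = 5, 13
beta_lo, beta_hi = 1000, 1400

# alpha + beta behaves like a count of prior pseudo-trials
prior_weight_lo = alpha_lo + beta_lo
prior_weight_hi = alpha_hi + beta_hi

print(f"observed trials:     {total_trials}")
print(f"prior pseudo-trials: {prior_weight_lo} to {prior_weight_hi}")
```

Even the smallest prior weight these bounds allow (1005) exceeds all 788 observed trials combined, which is the sense in which the prior is stronger than the data.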

Alternative Model

I'd probably do something along the following lines, which I think has a more transparent parameterization, though it's not completely identical to your model:

with pm.Model() as model_br_sp:
    # regression coefficients
    alpha = pm.Normal("alpha", mu=0, sd=1)
    beta = pm.Normal("beta", mu=0, sd=1)

    # saturation parameters
    saturation_point = pm.Gamma("saturation_point", alpha=20, beta=2)
    max_success_rate = pm.Beta("max_success_rate", 1, 9)

    # probability of conversion
    success_rate = pm.Deterministic("success_rate",
                                    pm.math.switch(df.n.values > saturation_point,
                                                   max_success_rate,
                                                   max_success_rate*pm.math.sigmoid(alpha + beta*df.n.values)))

    # observations
    pm.Binomial("successes", n=df.trials, p=success_rate, observed=df.successes)

    trace_br_sp = pm.sample(draws=5000, tune=10000)
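
To see what the success_rate expression in this model does for fixed parameter values, here is a plain-Python sketch. The function below is a hypothetical stand-in for the PyMC3 expression, and the parameter values are arbitrary assumptions, not posterior estimates.

```python
import math

def success_rate(n, alpha, beta, saturation_point, max_success_rate):
    """Scaled sigmoid up to the saturation point, the maximum rate beyond it."""
    if n > saturation_point:
        return max_success_rate
    return max_success_rate / (1 + math.exp(-(alpha + beta * n)))

n_values = [0, 5, 10, 15, 20, 25]

# Assumed parameter values, chosen only for illustration
rates = [success_rate(n, alpha=0.0, beta=0.1, saturation_point=10.0,
                      max_success_rate=0.03)
         for n in n_values]

for n, r in zip(n_values, rates):
    print(n, round(r, 4))
```

With these values the rate rises with n and then jumps to the maximum once n passes the saturation point, because the sigmoid is still below 1 there.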

Here we map the predictor space to probability space through a sigmoid that maxes out at the maximum success rate. The prior on the saturation point is identical to yours, while that on the maximum success rate is weakly informative (Beta(1, 9), though it runs nearly as well with a flat prior). This also samples well and gives similar intervals, though the switchpoint seems to dominate more.


We can compare the two models and see that there isn't a significant difference in their explanatory power:

import arviz as az

model_compare = az.compare({'Binomial Regression w/ Switchpoint': trace_br_sp,
                            'Original Model': trace})
az.plot_compare(model_compare)
