Estimating a probability given other probabilities from a prior

粉色の甜心 2021-02-10 02:23

I have a bunch of data coming in (calls to an automated call center) about whether or not a person buys a particular product: 1 for buy, 0 for not buy.

I want to use this

4 Answers
  •  不知归路
    2021-02-10 02:51

    Here's the Bayesian computation and one example/test:

    def estimateProbability(priorProbs, buyCount, noBuyCount):
      # first, estimate the prob that the actual buy/no-buy counts would be
      # observed given each of the priors (times a constant that's the same in
      # each case and not worth the effort of computing;-)
      condProbs = [p**buyCount * (1.0-p)**noBuyCount for p in priorProbs]
      # the normalization factor for the above-mentioned neglected constant
      # can most easily be computed just once
      normalize = 1.0 / sum(condProbs)
      # so here's the probability of each prior (starting from a uniform
      # metaprior)
      priorMeta = [normalize * cp for cp in condProbs]
      # and the result is the sum of prior probs weighted by prior metaprobs
      return sum(pm * pp for pm, pp in zip(priorMeta, priorProbs))
    
    def example(numProspects=4):
      # the a priori prob of buying was either 0.3 or 0.7; how does it change
      # depending on how many of 4 prospects bought?
      for bought in range(numProspects + 1):
        result = estimateProbability([0.3, 0.7], bought, numProspects - bought)
        print('b=%d, p=%.2f' % (bought, result))

    example()
    

    output is:

    b=0, p=0.31
    b=1, p=0.36
    b=2, p=0.50
    b=3, p=0.64
    b=4, p=0.69
    
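    Any row of this table can be checked by hand; for instance, here is the b=1 row spelled out (just the arithmetic from estimateProbability, inlined for one case):

```python
# posterior weight of each prior given 1 buy and 3 no-buys
w_03 = 0.3**1 * 0.7**3   # evidence for the p=0.3 prior: 0.1029
w_07 = 0.7**1 * 0.3**3   # evidence for the p=0.7 prior: 0.0189
# normalized weighted average of the priors
estimate = (w_03 * 0.3 + w_07 * 0.7) / (w_03 + w_07)
print('%.2f' % estimate)  # 0.36
```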

    which agrees with my by-hand computation for this simple case. Note that the estimated probability of buying will, by construction, always lie between the lowest and the highest of the prior probabilities; if that's not what you want, you can introduce a little fudge via two "pseudo-products": one that nobody will ever buy (p=0.0) and one that everybody will always buy (p=1.0). This gives more weight to actual observations, scarce as they may be, and less to statistics about past products. If we do that here, we get:

    b=0, p=0.06
    b=1, p=0.36
    b=2, p=0.50
    b=3, p=0.64
    b=4, p=0.94
    
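    For completeness, here is a self-contained restatement that reproduces this second table; the only change from the code above is that the two pseudo-product priors are appended to the prior list (note that Python evaluates 0.0**0 as 1.0, so the b=0 and b=4 rows stay well defined):

```python
def estimate_probability(prior_probs, buy_count, no_buy_count):
    # same Bayesian computation as above, PEP 8 names
    cond = [p**buy_count * (1.0 - p)**no_buy_count for p in prior_probs]
    norm = 1.0 / sum(cond)
    return sum(norm * c * p for c, p in zip(cond, prior_probs))

# priors padded with the two pseudo-products at 0.0 and 1.0
padded = [0.0, 0.3, 0.7, 1.0]
for bought in range(5):
    print('b=%d, p=%.2f' % (bought,
                            estimate_probability(padded, bought, 4 - bought)))
```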

    Intermediate levels of fudging (to account for the unlikely but not impossible chance that this new product may be worse than anything previously sold, or better than all of them) are easy to arrange: give lower weight to the artificial 0.0 and 1.0 probabilities by adding a priorWeights vector to estimateProbability's arguments.
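    A minimal sketch of that weighted variant (the name and the 0.1 weight are illustrative choices, not part of the original answer): each prior's evidence term is simply scaled by its meta-prior weight before normalizing, so uniform weights recover the unweighted result exactly.

```python
def estimate_probability_weighted(prior_probs, prior_weights,
                                  buy_count, no_buy_count):
    # evidence for each prior, scaled by its (unnormalized) meta-prior weight
    cond = [w * p**buy_count * (1.0 - p)**no_buy_count
            for w, p in zip(prior_weights, prior_probs)]
    norm = 1.0 / sum(cond)
    return sum(norm * c * p for c, p in zip(cond, prior_probs))

# uniform weights reproduce the unweighted b=2 row above (0.50)...
print('%.2f' % estimate_probability_weighted([0.3, 0.7], [1, 1], 2, 2))
# ...while down-weighting the artificial 0.0/1.0 priors gives a milder fudge
# for b=0: about 0.22, between the unfudged 0.31 and the fully fudged 0.06
print('%.2f' % estimate_probability_weighted(
    [0.0, 0.3, 0.7, 1.0], [0.1, 1.0, 1.0, 0.1], 0, 4))
```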

    This kind of thing is a substantial part of what I do all day, now that I work developing applications in Business Intelligence, but I just can't get enough of it...!-)
