Estimating a probability given other probabilities from a prior

Asked by 粉色の甜心, 2021-02-10 02:23

I have a bunch of data coming in (calls to an automated call center) about whether or not a person buys a particular product: 1 for buy, 0 for not buy.

I want to use this data to estimate the probability that a caller will buy a given product. The catch is that for a new product there is very little history, so I'd like to fold in prior probabilities drawn from comparable past products.

4 Answers
  • 2021-02-10 02:33

    A really simple way of doing this without any difficult math is to increase buyCount and noBuyCount artificially by adding virtual customers that either bought or didn't buy the product. You can tune how much you believe in each particular prior probability in terms of how many virtual customers you think it is worth.

    In Python:

    def estimateProbability(priorProbs, buyCount, noBuyCount, faithInPrior=None):
        # One entry per product in each list; faithInPrior says how many
        # virtual customers each prior probability is worth.
        if faithInPrior is None:
            faithInPrior = [10 for _ in buyCount]
        adjustedBuyCount = [b + p * f for b, p, f in
                            zip(buyCount, priorProbs, faithInPrior)]
        adjustedNoBuyCount = [n + (1 - p) * f for n, p, f in
                              zip(noBuyCount, priorProbs, faithInPrior)]
        return [b / (b + n) for b, n in
                zip(adjustedBuyCount, adjustedNoBuyCount)]
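
    For instance (a purely illustrative call; these products and counts are made up, not from the question), with priors from three comparable past products and only a few observed calls each:

    priorProbs = [0.3, 0.5, 0.7]   # prior buy rates from past products
    buyCount   = [2, 0, 1]         # observed buys so far (sparse)
    noBuyCount = [3, 1, 0]         # observed non-buys so far
    print(estimateProbability(priorProbs, buyCount, noBuyCount))
    # first product: (2 + 0.3*10) / (2 + 0.3*10 + 3 + 0.7*10) = 5/15 ~ 0.33

    Each estimate starts pinned near its prior and drifts toward the observed rate as real calls accumulate; raising faithInPrior strengthens the pull of the prior.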
    
  • 2021-02-10 02:35

    As I see it, the best you can do is use the uniform distribution, unless you have some clue about the distribution. Or are you talking about relating this product to products previously bought by the same person, in the Amazon fashion: "people who buy this product also buy..."?
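
    One concrete reading of "use the uniform distribution" is Laplace's rule of succession: under a flat Beta(1, 1) prior on the buy rate, the posterior mean is (buys + 1) / (trials + 2). A minimal sketch:

    def laplaceEstimate(buyCount, noBuyCount):
        # posterior mean of the buy rate under a uniform Beta(1, 1) prior
        return (buyCount + 1.0) / (buyCount + noBuyCount + 2.0)

    print(laplaceEstimate(0, 4))  # 1/6 ~ 0.17: no buys seen yet, but not ruled out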

  • 2021-02-10 02:45

    Sounds like what you're trying to do is Association Rule Learning. I don't have time right now to give you any code, but I'll point you toward WEKA, a fantastic open-source data-mining toolkit for Java. You should find plenty of interesting things there that will help you solve your problem.

  • 2021-02-10 02:51

    Here's the Bayesian computation and one example/test:

    def estimateProbability(priorProbs, buyCount, noBuyCount):
        # First, estimate the probability that the actual buy/no-buy counts
        # would be observed given each of the priors (times a constant that's
        # the same in each case and not worth the effort of computing ;-).
        condProbs = [p**buyCount * (1.0 - p)**noBuyCount for p in priorProbs]
        # The normalization factor for the above-mentioned neglected constant
        # can most easily be computed just once.
        normalize = 1.0 / sum(condProbs)
        # So here's the probability of each prior (starting from a uniform
        # metaprior).
        priorMeta = [normalize * cp for cp in condProbs]
        # The result is the sum of the priors weighted by their metaprobs.
        return sum(pm * pp for pm, pp in zip(priorMeta, priorProbs))
    
    def example(numProspects=4):
        # The a priori prob of buying was either 0.3 or 0.7; how does the
        # estimate change depending on how many of 4 prospects bought?
        for bought in range(0, numProspects + 1):
            result = estimateProbability([0.3, 0.7], bought, numProspects - bought)
            print('b=%d, p=%.2f' % (bought, result))

    example()
    

    output is:

    b=0, p=0.31
    b=1, p=0.36
    b=2, p=0.50
    b=3, p=0.64
    b=4, p=0.69
    

    which agrees with my by-hand computation for this simple case. Note that the probability of buying will, by definition, always fall between the lowest and the highest of the prior probabilities; if that's not what you want, you can introduce a little fudge by adding two "pseudo-products": one that nobody will ever buy (p=0.0) and one that everybody will always buy (p=1.0). This gives more weight to actual observations, scarce as they may be, and less to statistics about past products. If we do that here, we get:

    b=0, p=0.06
    b=1, p=0.36
    b=2, p=0.50
    b=3, p=0.64
    b=4, p=0.94
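
    To spell out the b=0 row (retracing the code by hand, with priors [0.0, 0.3, 0.7, 1.0]): the unnormalized posterior weights are 1, 0.7**4 = 0.2401, 0.3**4 = 0.0081, and 0, which sum to 1.2482, so the estimate is (0.3*0.2401 + 0.7*0.0081) / 1.2482 ≈ 0.062, the 0.06 shown above.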
    

    Intermediate levels of fudging (to account for the unlikely but not impossible chance that this new product may be worse than anything previously sold, or better than all of it) are easy to arrange: give lower weight to the artificial 0.0 and 1.0 probabilities by adding a priorWeights vector to estimateProbability's arguments, as sketched below.
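
    A minimal sketch of that weighted variant (the exact priorWeights handling is my own assumption; the weights need not sum to 1 because everything is renormalized):

    def estimateProbabilityWeighted(priorProbs, buyCount, noBuyCount,
                                    priorWeights=None):
        # Like estimateProbability, but the metaprior over the candidate
        # rates starts from the given weights instead of a uniform one.
        if priorWeights is None:
            priorWeights = [1.0] * len(priorProbs)
        condProbs = [w * p**buyCount * (1.0 - p)**noBuyCount
                     for w, p in zip(priorWeights, priorProbs)]
        normalize = 1.0 / sum(condProbs)
        priorMeta = [normalize * cp for cp in condProbs]
        return sum(pm * pp for pm, pp in zip(priorMeta, priorProbs))

    # illustrative: trust the artificial 0.0/1.0 pseudo-products a tenth
    # as much as the priors that come from real past products
    print(estimateProbabilityWeighted([0.0, 0.3, 0.7, 1.0], 0, 4,
                                      priorWeights=[0.1, 1.0, 1.0, 0.1]))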

    This kind of thing is a substantial part of what I do all day, now that I develop Business Intelligence applications, and I just can't get enough of it...!-)
