Estimating a probability given other probabilities from a prior

Asked by 粉色の甜心, 2021-02-10 02:23

I have a bunch of data coming in (calls to an automated call center) about whether or not a person buys a particular product: 1 for buy, 0 for not buy.

I want to use this data to estimate the probability that a caller will buy a given product. The catch is that for a new product there is very little history, so I'd like to fold in prior probabilities drawn from comparable past products.

4 Answers
  • 2021-02-10 02:33

    A really simple way of doing this without any difficult math is to increase buyCount and noBuyCount artificially by adding virtual customers that either bought or didn't buy the product. You can tune how much you believe in each particular prior probability in terms of how many virtual customers you think it is worth.

    In Python:

    def estimateProbability(priorProbs, buyCount, noBuyCount, faithInPrior=None):
        # One entry per product in each list; faithInPrior says how many
        # virtual customers each prior probability is worth.
        if faithInPrior is None:
            faithInPrior = [10 for _ in buyCount]
        adjustedBuyCount = [b + p * f for b, p, f in
                            zip(buyCount, priorProbs, faithInPrior)]
        adjustedNoBuyCount = [n + (1 - p) * f for n, p, f in
                              zip(noBuyCount, priorProbs, faithInPrior)]
        return [b / (b + n) for b, n in
                zip(adjustedBuyCount, adjustedNoBuyCount)]
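
    For instance (a purely illustrative call; these products and counts are made up, not from the question), with priors from three comparable past products and only a few observed calls each:

    priorProbs = [0.3, 0.5, 0.7]   # prior buy rates from past products
    buyCount   = [2, 0, 1]         # observed buys so far (sparse)
    noBuyCount = [3, 1, 0]         # observed non-buys so far
    print(estimateProbability(priorProbs, buyCount, noBuyCount))
    # first product: (2 + 0.3*10) / (2 + 0.3*10 + 3 + 0.7*10) = 5/15 ~ 0.33

    Each estimate starts pinned near its prior and drifts toward the observed rate as real calls accumulate; raising faithInPrior strengthens the pull of the prior.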
    
  • 2021-02-10 02:35

    As I see it, the best you can do is use the uniform distribution, unless you have some clue about the distribution. Or are you talking about relating this product to products previously bought by the same person, in the Amazon fashion: "people who buy this product also buy..."?
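
    One concrete reading of "use the uniform distribution" is Laplace's rule of succession: under a flat Beta(1, 1) prior on the buy rate, the posterior mean is (buys + 1) / (trials + 2). A minimal sketch:

    def laplaceEstimate(buyCount, noBuyCount):
        # posterior mean of the buy rate under a uniform Beta(1, 1) prior
        return (buyCount + 1.0) / (buyCount + noBuyCount + 2.0)

    print(laplaceEstimate(0, 4))  # 1/6 ~ 0.17: no buys seen yet, but not ruled out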

  • 2021-02-10 02:45

    Sounds like what you're trying to do is Association Rule Learning. I don't have time right now to give you any code, but I'll point you toward WEKA, a fantastic open-source data-mining toolkit for Java. You should find plenty of interesting things there that will help you solve your problem.

  • 2021-02-10 02:51

    Here's the Bayesian computation and one example/test:

    def estimateProbability(priorProbs, buyCount, noBuyCount):
        # First, estimate the probability that the actual buy/no-buy counts
        # would be observed given each of the priors (times a constant that's
        # the same in each case and not worth the effort of computing ;-).
        condProbs = [p**buyCount * (1.0 - p)**noBuyCount for p in priorProbs]
        # The normalization factor for the above-mentioned neglected constant
        # can most easily be computed just once.
        normalize = 1.0 / sum(condProbs)
        # So here's the probability of each prior (starting from a uniform
        # metaprior).
        priorMeta = [normalize * cp for cp in condProbs]
        # The result is the sum of the priors weighted by their metaprobs.
        return sum(pm * pp for pm, pp in zip(priorMeta, priorProbs))
    
    def example(numProspects=4):
        # The a priori prob of buying was either 0.3 or 0.7; how does the
        # estimate change depending on how many of 4 prospects bought?
        for bought in range(0, numProspects + 1):
            result = estimateProbability([0.3, 0.7], bought, numProspects - bought)
            print('b=%d, p=%.2f' % (bought, result))

    example()
    

    output is:

    b=0, p=0.31
    b=1, p=0.36
    b=2, p=0.50
    b=3, p=0.64
    b=4, p=0.69
    

    which agrees with my by-hand computation for this simple case. Note that the probability of buying will, by definition, always fall between the lowest and the highest of the prior probabilities; if that's not what you want, you can introduce a little fudge by adding two "pseudo-products": one that nobody will ever buy (p=0.0) and one that everybody will always buy (p=1.0). This gives more weight to actual observations, scarce as they may be, and less to statistics about past products. If we do that here, we get:

    b=0, p=0.06
    b=1, p=0.36
    b=2, p=0.50
    b=3, p=0.64
    b=4, p=0.94
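
    To spell out the b=0 row (retracing the code by hand, with priors [0.0, 0.3, 0.7, 1.0]): the unnormalized posterior weights are 1, 0.7**4 = 0.2401, 0.3**4 = 0.0081, and 0, which sum to 1.2482, so the estimate is (0.3*0.2401 + 0.7*0.0081) / 1.2482 ≈ 0.062, the 0.06 shown above.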
    

    Intermediate levels of fudging (to account for the unlikely but not impossible chance that this new product may be worse than anything previously sold, or better than all of it) are easy to arrange: give lower weight to the artificial 0.0 and 1.0 probabilities by adding a priorWeights vector to estimateProbability's arguments, as sketched below.
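
    A minimal sketch of that weighted variant (the exact priorWeights handling is my own assumption; the weights need not sum to 1 because everything is renormalized):

    def estimateProbabilityWeighted(priorProbs, buyCount, noBuyCount,
                                    priorWeights=None):
        # Like estimateProbability, but the metaprior over the candidate
        # rates starts from the given weights instead of a uniform one.
        if priorWeights is None:
            priorWeights = [1.0] * len(priorProbs)
        condProbs = [w * p**buyCount * (1.0 - p)**noBuyCount
                     for w, p in zip(priorWeights, priorProbs)]
        normalize = 1.0 / sum(condProbs)
        priorMeta = [normalize * cp for cp in condProbs]
        return sum(pm * pp for pm, pp in zip(priorMeta, priorProbs))

    # illustrative: trust the artificial 0.0/1.0 pseudo-products a tenth
    # as much as the priors that come from real past products
    print(estimateProbabilityWeighted([0.0, 0.3, 0.7, 1.0], 0, 4,
                                      priorWeights=[0.1, 1.0, 1.0, 0.1]))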

    This kind of thing is a substantial part of what I do all day, now that I develop Business Intelligence applications, and I just can't get enough of it...!-)
