Problem with Precision floating point operation in C

执笔经年 2021-01-31 21:03

For one of my course projects I started implementing a "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially for spam) using huge

6 answers
  •  佛祖请我去吃肉
    2021-01-31 21:30

    Here's a trick:

    For the sake of readability, let S := p_1 * ... * p_n and H := (1-p_1) * ... * (1-p_n).
    Then we have:
    
      p = S / (S + H)
      p = 1 / ((S + H) / S)
      p = 1 / (1 + H / S)
    
    Let's expand again:
    
      p = 1 / (1 +  ((1-p_1) * ... * (1-p_n)) / (p_1 * ... * p_n))
      p = 1 / (1 + (1-p_1)/p_1 * ... * (1-p_n)/p_n)
    

    So basically, you obtain a product of quite large numbers (each factor lies between 0 and, for p_i = 0.01, 99). The idea is not to multiply tons of small numbers with one another and end up with, well, 0, but to form a quotient of two small numbers at each step. For example, if n = 1000000 and p_i = 0.5 for all i, the original formula S / (S + H) gives you 0/(0+0), which is NaN, whereas the proposed method gives you 1/(1 + 1*...*1), which is 0.5.
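
    The difference between the two forms can be sketched in C as follows. This is only a minimal illustration: the function names combine_naive and combine_ratio are made up for this example, and every p[i] is assumed to lie strictly between 0 and 1.

```c
#include <stddef.h>

/* Naive form: p = S / (S + H).
 * For large n both products underflow to 0, so the result is 0/0 = NaN. */
double combine_naive(const double *p, size_t n)
{
    double s = 1.0; /* S = p_1 * ... * p_n */
    double h = 1.0; /* H = (1-p_1) * ... * (1-p_n) */
    for (size_t i = 0; i < n; i++) {
        s *= p[i];
        h *= 1.0 - p[i];
    }
    return s / (s + h);
}

/* Ratio form: p = 1 / (1 + (1-p_1)/p_1 * ... * (1-p_n)/p_n).
 * Assumption: 0 < p[i] < 1 for all i, so no factor is 0 or infinite. */
double combine_ratio(const double *p, size_t n)
{
    double r = 1.0; /* running product of (1-p_i)/p_i, i.e. H / S */
    for (size_t i = 0; i < n; i++)
        r *= (1.0 - p[i]) / p[i];
    return 1.0 / (1.0 + r);
}
```

    With a few thousand entries all equal to 0.5, combine_naive should already return NaN, while combine_ratio returns 0.5, because every factor (1 - 0.5) / 0.5 is exactly 1. Note that the running product r can still overflow or underflow if most p[i] lean the same way.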

    You can get even better results if you sort the p_i and pair them up in opposite order. Assuming p_1 < ... < p_n, the following formula gives better precision:

      p = 1 / (1 + (1-p_1)/p_n * ... * (1-p_n)/p_1)
    

    That way you divide big numerators (small p_i) by big denominators (big p_(n+1-i)), and small numerators by small denominators.

    Edit: MSalter proposed a useful further optimization in his answer. Using it, the formula reads as follows:

      p = 1 / (1 + (1-p_1)/p_n * (1-p_2)/p_(n-1) * ... * (1-p_(n-1))/p_2 * (1-p_n)/p_1)
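
    A possible C sketch of the sorted, pairwise variant is shown below. The names combine_paired and cmp_double are hypothetical, the input is again assumed to lie strictly between 0 and 1, and returning -1.0 on allocation failure is just a convention chosen for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Ascending comparator for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* p = 1 / (1 + (1-p_1)/p_n * (1-p_2)/p_(n-1) * ... * (1-p_n)/p_1),
 * with p_1 <= ... <= p_n after sorting a private copy of the input.
 * Assumption: 0 < p[i] < 1. Returns -1.0 if memory cannot be allocated. */
double combine_paired(const double *p, size_t n)
{
    if (n == 0)
        return 0.5; /* empty product is 1, so p = 1 / (1 + 1) */

    double *q = malloc(n * sizeof *q);
    if (q == NULL)
        return -1.0;
    memcpy(q, p, n * sizeof *q);
    qsort(q, n, sizeof *q, cmp_double);

    double r = 1.0;
    for (size_t i = 0; i < n; i++)
        r *= (1.0 - q[i]) / q[n - 1 - i]; /* big numerator over big denominator */

    free(q);
    return 1.0 / (1.0 + r);
}
```

    The function sorts a copy so the caller's array stays untouched. The overall product is the same as in the unpaired form, only the factors are regrouped so that each one tends to stay closer to 1.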
    
