For one of my course projects I started implementing a "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially spam) using huge
Here's a trick:
for the sake of readability, let S := p_1 * ... * p_n and H := (1-p_1) * ... * (1-p_n),
then we have:
p = S / (S + H)
p = 1 / ((S + H) / S)
p = 1 / (1 + H / S)
let's expand again:
p = 1 / (1 + ((1-p_1) * ... * (1-p_n)) / (p_1 * ... * p_n))
p = 1 / (1 + (1-p_1)/p_1 * ... * (1-p_n)/p_n)
So basically, you will obtain a product of quite large numbers (each factor lies between 0 and, for p_i = 0.01, 99). The idea is not to multiply tons of small numbers with one another, to obtain, well, 0, but to make a quotient of two small numbers. For example, if n = 1000000 and p_i = 0.5 for all i, the above method will give you 0/(0+0), which is NaN, whereas the proposed method will give you 1/(1 + 1*...*1), which is 0.5.
You can get even better results when all p_i are sorted and you pair them up in opposed order (let's assume p_1 < ... < p_n); then the following formula gives even better precision:
p = 1 / (1 + (1-p_1)/p_n * ... * (1-p_n)/p_1)
That way you divide big numerators (from the small p_i) by big denominators (the big p_(n+1-i)), and small numerators by small denominators.
edit: MSalter proposed a useful further optimization in his answer. Using it, the formula reads as follows:
p = 1 / (1 + (1-p_1)/p_n * (1-p_2)/p_(n-1) * ... * (1-p_(n-1))/p_2 * (1-p_n)/p_1)