floating point operations in go


Question


Here's the sample code in go:

package main

import "fmt"

func mult32(a, b float32) float32 { return a*b }
func mult64(a, b float64) float64 { return a*b }


func main() {
    fmt.Println(3*4.3)                  // A1, 12.9
    fmt.Println(mult32(3, 4.3))         // B1, 12.900001
    fmt.Println(mult64(3, 4.3))         // C1, 12.899999999999999

    fmt.Println(12.9 - 3*4.3)           // A2, 1.8033161362862765e-130
    fmt.Println(12.9 - mult32(3, 4.3))  // B2, -9.536743e-07
    fmt.Println(12.9 - mult64(3, 4.3))  // C2, 1.7763568394002505e-15

    fmt.Println(12.9 - 3*4.3)                               // A4, 1.8033161362862765e-130
    fmt.Println(float32(12.9) - float32(3)*float32(4.3))    // B4, -9.536743e-07
    fmt.Println(float64(12.9) - float64(3)*float64(4.3))    // C4, 1.7763568394002505e-15

}

The differences between the results on lines A1, B1 and C1 are understandable. However, from A2 to C2 the magic begins: neither B2 nor C2 matches the result on line A2. The same pattern holds for the x4 lines (x = A, B or C), and the output of each x2 line is identical to that of the corresponding x4 line.
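In other words, the A lines are whole untyped-constant expressions, while the B and C lines force part of the work into run-time float32/float64 arithmetic. A minimal pair that isolates just that difference (my own reformulation; the exact value printed for the constant case may vary between Go releases):

    package main

    import "fmt"

    func main() {
        // The whole expression is an untyped constant: the compiler evaluates
        // 12.9 - 3*4.3 at high precision and only then rounds the final value
        // to float64 for printing.
        const asConstant = 12.9 - 3*4.3

        // Here 3*4.3 is rounded to a float64 product at run time, so the
        // rounding error of that product survives the subtraction.
        a, b := 3.0, 4.3
        atRuntime := 12.9 - a*b

        fmt.Println(asConstant) // a tiny residual (or exactly 0), depending on the compiler's constant precision
        fmt.Println(atRuntime)  // ordinary float64 rounding error, about 1.8e-15
    }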

Just to be sure, let's print the results in binary form.

    fmt.Printf("%b\n", 3*4.3)                   // A11, 7262054399134925p-49
    fmt.Printf("%b\n", mult32(3, 4.3))          // B11, 13526631p-20
    fmt.Printf("%b\n", mult64(3, 4.3))          // C11, 7262054399134924p-49

    fmt.Printf("%b\n", 12.9 - 3*4.3)            // A12, 4503599627370496p-483
    fmt.Printf("%b\n", 12.9 - mult32(3, 4.3))   // B12, -8388608p-43
    fmt.Printf("%b\n", 12.9 - mult64(3, 4.3))   // C12, 4503599627370496p-101

    fmt.Printf("%b\n", 12.9 - 3*4.3)                                // A14, 4503599627370496p-483
    fmt.Printf("%b\n", float32(12.9) - float32(3)*float32(4.3))     // B14, -8388608p-43
    fmt.Printf("%b\n", float64(12.9) - float64(3)*float64(4.3))     // C14, 4503599627370496p-101

Some facts about the code above (the version in binary form):

  1. There is a difference between lines A11 and C11 (the last digit of the mantissa, just before the exponent).
  2. Lines A12 and C12 are almost the same (except for the exponent!), and the same can be observed between lines A14 and C14 (see the quick check after this list).
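As a quick check on how to read the %b output: it prints a mantissa followed by a binary exponent, so 4503599627370496p-483 means 4503599627370496 × 2^-483, and 4503599627370496 happens to be 2^52. The A12 and C12 values can be reconstructed with math.Ldexp, confirming they differ only in their power of two:

    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        // 4503599627370496p-483 means 2^52 * 2^-483 = 2^-431.
        fmt.Println(math.Ldexp(4503599627370496, -483)) // 1.8033161362862765e-130, the A2/A12 value
        // 4503599627370496p-101 means 2^52 * 2^-101 = 2^-49.
        fmt.Println(math.Ldexp(4503599627370496, -101)) // 1.7763568394002505e-15, the C2/C12 value
    }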

And here come the questions:

  1. How are computations on bare (naked :)) numbers performed? (the computations in every Axx line)
  2. Are they performed by the compiler, or by something else?
  3. If yes, then why are the results different? Optimisation?
  4. Are they computed in some system that differs from IEEE 754?
  5. If yes, why so?
  6. Does achieving more accurate precision justify such an approach?

The code has been tested on 64-bit Linux with both "go run" and "go build" (go1.0.3), and also on http://tour.golang.org/


Answer 1:


  1. Constants:

    • Numeric constants represent values of arbitrary precision and do not overflow.
    • Represent integer constants with at least 256 bits.
    • Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed exponent of at least 32 bits.
  2. Yes, by the compiler for compile time constants.

  3. Yes, they're different: More precision is involved. See 1.

  4. Yes, see 1.

  5. To minimize the accumulation of floating-point error in multi-term floating-point constant expressions (see the sketch after this list).

  6. Of course yes. Can achieving lower precision ever be a goal? It's enough that run-time floating-point operations are intrinsically imperfect; there is no need to add more imprecision from constant expressions.
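A rough way to see the effect of that extra precision is to redo the constant computation with math/big. This is only a sketch: the compiler does not literally use math/big with these settings, but the spec's minimum of a 256-bit mantissa is a reasonable stand-in.

    package main

    import (
        "fmt"
        "math/big"
    )

    // parse reads a decimal literal into a big.Float with a 256-bit mantissa,
    // roughly mimicking the minimum precision the spec requires for untyped
    // floating-point constants.
    func parse(s string) *big.Float {
        f, ok := new(big.Float).SetPrec(256).SetString(s)
        if !ok {
            panic("bad constant: " + s)
        }
        return f
    }

    func main() {
        // Emulate the compile-time evaluation of 12.9 - 3*4.3.
        prod := new(big.Float).SetPrec(256).Mul(parse("3"), parse("4.3"))
        diff := new(big.Float).SetPrec(256).Sub(parse("12.9"), prod)

        f64, _ := diff.Float64()
        fmt.Println(f64) // a tiny residual, many orders of magnitude below any float64 rounding error

        // The run-time float64 version, for comparison.
        var a, b float64 = 3, 4.3
        fmt.Println(12.9 - a*b) // about 1.8e-15
    }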




Answer 2:


Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed exponent of at least 32 bits.

Note that Go 1.8 (in beta in Q4 2016, released in Q1 2017) changes that definition:

The language specification now only requires that implementations support up to 16-bit exponents in floating-point constants.
This does not affect either the “gc” or gccgo compilers, both of which still support 32-bit exponents.

That comes from change 17711:

spec: require 16 bit minimum exponent in constants rather than 32

A 16bit binary exponent permits a constant range covering roughly the range from 7e-9865 to 7e9863 which is more than enough for any practical and hypothetical constant arithmetic.

Furthermore, until recently cmd/compile could not handle very large exponents correctly anyway; i.e., the chance that any real programs (but for tests that explore corner cases) are affected are close to zero.

Finally, restricting the minimum supported range significantly reduces the implementation complexity in an area that hardly matters in reality for new or alternative spec-compliant implementations that don't or cannot rely on pre-existing arbitrary precision arithmetic packages that support a 32bit exponent range.

This is technically a language change but for the reasons mentioned above this is unlikely to affect any real programs, and certainly not programs compiled with the gc or gccgo compilers as they currently support up to 32bit exponents.

See issue 13572, which mentions that:

In Go 1.4 the compiler rejected exponents larger than 10000 (due to knowing the code didn't work for larger exponents) without any complaint from users.

In earlier versions of Go, large exponents were silently mishandled, again without any complaint from users.
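A small illustration (mine, not from the issue) of why the exponent range only matters for intermediate constant values: the constant below is far outside the float64 range, yet the program is valid because only the final value of the constant expression has to be representable in the target type.

    package main

    import "fmt"

    func main() {
        // 1e1000 is far outside the float64 range, but as an untyped constant
        // it does not overflow.
        const huge = 1e1000
        fmt.Println(huge / 1e999) // 10

        // In contrast, the following would be a compile-time error, because
        // the constant 1e1000 itself cannot be represented as a float64:
        //
        //     var x float64 = huge
    }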



Source: https://stackoverflow.com/questions/18056787/floating-point-operations-in-go
