Can a multiprecision signed multiply be performed with the imul instruction?

情书的邮戳 2021-01-24 04:53

I am writing a function library to provide all conventional operators and functions for signed-integer types s0128, s0256, s0512, s1

1 Answer
  • 2021-01-24 05:14

    When you build an extended precision signed multiply out of smaller multiplies, you end up with a mixture of signed and unsigned arithmetic.

    In particular, if you break a signed value in half, you treat the upper half as signed, and the lower half as unsigned. The same is true for extended precision addition, in fact.
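
    To make the split concrete, here is a small sketch that breaks a signed 32-bit value into a signed upper half and an unsigned lower half so that x == hi * 65536 + lo holds exactly. The names split32, split_s32 and join_s32 are hypothetical and used only for illustration. The sketch uses division instead of shifts so that it stays within well-defined C.

```c
#include <stdint.h>

/* Hypothetical pair type: the upper half carries the sign, the lower
   half is plain unsigned bits. Invariant: x == hi * 65536 + lo. */
typedef struct { int16_t hi; uint16_t lo; } split32;

split32 split_s32(int32_t x)
{
    split32 s;
    s.lo = (uint16_t)x;                      /* low 16 bits, read as unsigned   */
    s.hi = (int16_t)((x - s.lo) / 65536);    /* exact division; keeps the sign  */
    return s;
}

int32_t join_s32(split32 s)
{
    /* Signed high half scaled up, unsigned low half added on top. */
    return (int32_t)s.hi * 65536 + s.lo;
}
```

    For example, -1 splits into hi = -1 and lo = 0xFFFF. The low half keeps all of its bits as an unsigned number, and only the high half carries the sign.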

    Consider this arbitrary example, where AH and AL represent the high and low halves of A, and BH and BL represent the high and low halves of B. (Note: these aren't meant to represent x86 register halves, just halves of a multiplicand.) The L terms are unsigned and the H terms are signed.

                  AH : AL
               x  BH : BL
      -------------------
                  AL * BL    unsigned x unsigned => zero extend to full precision
             AH * BL           signed x unsigned => sign extend to full precision
             AL * BH         unsigned x   signed => sign extend to full precision
        AH * BH                signed x   signed
    

    The AL * BL product is unsigned because both AL and BL are unsigned. Therefore, it gets zero extended when you promote it to the full precision of the result.

    The AL * BH and AH * BL products mix signed and unsigned values. The resulting product is signed, and that needs to be sign extended when you promote it to the full precision of the result.
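
    The two promotion rules can be written as a pair of small helpers. These helpers have hypothetical names and are only a sketch. They widen the raw 32-bit pattern of a 16×16 partial product to 64 bits, which matches the sizes used in the C code below.

```c
#include <stdint.h>

/* AL*BL (unsigned x unsigned): the new upper 32 bits are zeros. */
int64_t zero_extend32(uint32_t bits)
{
    return (int64_t)bits;
}

/* AH*BL, AL*BH (mixed) and AH*BH (signed x signed): bit 31 is copied
   into the new upper bits, i.e. a set bit 31 means "subtract 2^32". */
int64_t sign_extend32(uint32_t bits)
{
    return (bits & 0x80000000u) ? (int64_t)bits - 4294967296LL
                                : (int64_t)bits;
}
```

    The same bit pattern can mean different things. 0xFFFF * 0xFFFF (an AL * BL term) yields 0xFFFE0001, which must be zero extended to 4294836225; sign extending it would give -131071. Conversely, a mixed term such as AH * BL = -3 arrives as 0xFFFFFFFD and must be sign extended back to -3.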

    The following C code demonstrates a 32×32 multiply implemented in terms of 16×16 multiplies. The same principle applies when building 128×128 multiplies out of 64×64 multiplies.

    #include <stdint.h>
    #include <stdio.h>
    
    int64_t mul32x32( int32_t x, int32_t y )
    {
        int16_t x_hi = 0xFFFF & (x >> 16);
        int16_t y_hi = 0xFFFF & (y >> 16);
    
        uint16_t x_lo = x & 0xFFFF;
        uint16_t y_lo = y & 0xFFFF;
    
    
        uint32_t lo_lo = (uint32_t)x_lo * y_lo;    // unsigned x unsigned
        int32_t  lo_hi = (x_lo * (int32_t)y_hi);   // unsigned x   signed
        int32_t  hi_lo = ((int32_t)x_hi * y_lo);   //   signed x unsigned
        int32_t  hi_hi = ((int32_t)x_hi * y_hi);   //   signed x   signed
    
    
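        // Scale with multiplies instead of <<, because left-shifting a
        // negative signed value is undefined behavior in C.
        int64_t  prod = (int64_t)lo_lo
                      + ((int64_t)lo_hi + hi_lo) * 65536
                      + (int64_t)hi_hi * 65536 * 65536;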
    
        return prod;
    }
    
    int check(int a, int b)
    {
        int64_t ref = (int64_t)a * (int64_t)b;
        int64_t tst = mul32x32(a, b);
    
        if (ref != tst)
        {
            printf("%.8X x %.8X => %.16llX vs %.16llX\n",
                    (unsigned int)a,         (unsigned int)b, 
                    (unsigned long long)ref, (unsigned long long)tst);
            return 1;
        }
    
        return 0;
    }
    
    
    int main()
    {
        int a = (int)0xABCDEF01;
        int b = (int)0x12345678;
        int c = (int)0x1234EF01;
        int d = (int)0xABCD5678;
    
        int fail = 0;
    
        fail += check(a, a);
        fail += check(a, b);
        fail += check(a, c);
        fail += check(a, d);
    
        fail += check(b, b);
        fail += check(b, c);
        fail += check(b, d);
    
        fail += check(c, c);
        fail += check(c, d);
    
        fail += check(d, d);
    
        printf("%d tests failed\n", fail);
        return 0;
    }
    
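
    The code above can hold the whole product in an int64_t. When the product is wider than any native type, as with the s0128 types in the question, the partial products have to be accumulated limb by limb with explicit carries. The following is a sketch of the same scheme scaled up to a 64×64 -> 128-bit signed multiply built from 32×32 pieces. The result struct s128_bits and the helper names are hypothetical. The sketch assumes that >> on a negative int64_t is an arithmetic shift, which holds on mainstream compilers. The result holds two's-complement bits, and hi is the signed limb.

```c
#include <stdint.h>

/* Hypothetical 128-bit result: two's-complement bits in two limbs. */
typedef struct { uint64_t hi, lo; } s128_bits;

/* acc += (thi:tlo), propagating the carry out of the low limb. */
static void add128(s128_bits *acc, uint64_t thi, uint64_t tlo)
{
    acc->lo += tlo;
    acc->hi += thi + (acc->lo < tlo);
}

/* Sign extend a signed partial product to 128 bits, shift it left
   by 32, and add it into acc. */
static void add_signed_shl32(s128_bits *acc, int64_t p)
{
    uint64_t ext = p < 0 ? UINT64_MAX : 0;   /* the sign-extension limb */
    uint64_t u   = (uint64_t)p;
    add128(acc, (ext << 32) | (u >> 32), u << 32);
}

s128_bits mul64x64(int64_t x, int64_t y)
{
    int32_t  x_hi = (int32_t)(x >> 32), y_hi = (int32_t)(y >> 32); /* signed   */
    uint32_t x_lo = (uint32_t)x,        y_lo = (uint32_t)y;        /* unsigned */

    uint64_t lo_lo = (uint64_t)x_lo * y_lo;          /* unsigned x unsigned */
    int64_t  lo_hi = (int64_t)x_lo * y_hi;           /* unsigned x   signed */
    int64_t  hi_lo = (int64_t)x_hi * (int64_t)y_lo;  /*   signed x unsigned */
    int64_t  hi_hi = (int64_t)x_hi * y_hi;           /*   signed x   signed */

    s128_bits acc = { 0, lo_lo };       /* lo_lo is zero extended          */
    add_signed_shl32(&acc, lo_hi);      /* mixed terms are sign extended   */
    add_signed_shl32(&acc, hi_lo);
    add128(&acc, (uint64_t)hi_hi, 0);   /* << 64 touches only the top limb */
    return acc;
}
```

    For example, mul64x64(3, -5) returns hi = 0xFFFFFFFFFFFFFFFF and lo = 0xFFFFFFFFFFFFFFF1, which is -15 as a 128-bit two's-complement value. The same accumulation pattern applies one level up, with 64×64 pieces feeding a 256-bit result.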

    This pattern extends even if you break the multiplicands into more than two pieces. That is, only the most-significant piece of a signed number gets treated as signed. All of the other pieces are unsigned. Consider this example, which divides each multiplicand into 3 pieces:

                          A2 : A1 : A0
                       x  B2 : B1 : B0
      ---------------------------------
                               A0 * B0    => unsigned x unsigned   => zero extend
                          A1 * B0         => unsigned x unsigned   => zero extend
                     A2 * B0              =>   signed x unsigned   => sign extend
                          A0 * B1         => unsigned x unsigned   => zero extend
                     A1 * B1              => unsigned x unsigned   => zero extend
                A2 * B1                   =>   signed x unsigned   => sign extend
                     A0 * B2              => unsigned x   signed   => sign extend
                A1 * B2                   => unsigned x   signed   => sign extend
           A2 * B2                        =>   signed x   signed
    
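
    Here is a sketch of the three-piece case. It assumes 24-bit signed inputs split into 8-bit pieces, so the whole product fits in an int64_t; split3 and mul24x24 are hypothetical names. Only the top piece of each operand is signed. Holding each partial product in a signed 64-bit integer therefore sign extends the terms that involve A2 or B2. The unsigned × unsigned terms are never negative, so for them this amounts to zero extension.

```c
#include <stdint.h>

/* Split a signed 24-bit value (-2^23 .. 2^23 - 1, held in an int32_t)
   so that a == p[2]*65536 + p[1]*256 + p[0].
   p[0], p[1] are unsigned pieces (0..255); p[2] is signed (-128..127). */
static void split3(int32_t a, int32_t p[3])
{
    uint32_t u = (uint32_t)a;
    p[0] = (int32_t)(u & 0xFF);
    p[1] = (int32_t)((u >> 8) & 0xFF);
    p[2] = (a - p[1] * 256 - p[0]) / 65536;   /* exact division */
}

/* 24x24 -> 48-bit signed multiply from nine 8x8 partial products. */
int64_t mul24x24(int32_t a, int32_t b)
{
    int32_t A[3], B[3];
    split3(a, A);
    split3(b, B);

    int64_t sum = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            /* Term Ai * Bj lands at bit position 8 * (i + j). */
            sum += (int64_t)A[i] * B[j] * ((int64_t)1 << (8 * (i + j)));
    return sum;
}
```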

    Because of all the mixed-signedness and sign extension fun, it's often just easier to implement a signed × signed multiply as an unsigned × unsigned multiply, and conditionally negate at the end if the signs of the multiplicands differ. (And, in fact, when you get to extended precision floating point, as long as you stay in sign-magnitude form like IEEE-754, you won't have to deal with signed multiply.)
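
    Here is a sketch of that unsigned-then-negate approach, reusing the 32×32-from-16×16 setup from the code above. The names umul32x32 and smul32x32 are hypothetical. Because every piece is now unsigned, all partial products are simply zero extended, and the sign is applied once at the end.

```c
#include <stdint.h>

/* Unsigned 32x32 -> 64 multiply from 16-bit pieces. Every piece is
   unsigned, so no partial product needs sign extension. */
static uint64_t umul32x32(uint32_t x, uint32_t y)
{
    uint64_t x_lo = x & 0xFFFF, x_hi = x >> 16;
    uint64_t y_lo = y & 0xFFFF, y_hi = y >> 16;

    return x_lo * y_lo
         + ((x_lo * y_hi + x_hi * y_lo) << 16)
         + ((x_hi * y_hi) << 32);
}

/* Signed multiply via magnitudes: |x| * |y| unsigned, then negate if
   exactly one operand was negative. */
int64_t smul32x32(int32_t x, int32_t y)
{
    int negate = (x < 0) != (y < 0);

    /* 0u - (uint32_t)x yields |x| even for INT32_MIN. */
    uint32_t ux = x < 0 ? 0u - (uint32_t)x : (uint32_t)x;
    uint32_t uy = y < 0 ? 0u - (uint32_t)y : (uint32_t)y;

    uint64_t mag = umul32x32(ux, uy);   /* at most 2^62, fits in int64_t */
    return negate ? -(int64_t)mag : (int64_t)mag;
}
```

    Taking the magnitude as 0u - (uint32_t)x avoids overflow on INT32_MIN. That value's magnitude, 2^31, has no positive int32_t representation.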

    This assembly gem shows how to negate extended precision values efficiently. (The gems page is a little dated, but you may find it interesting / useful.)
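
    For reference, one portable way to express extended precision negation in C is two's-complement negation: invert every limb and add 1, rippling the carry up from the least significant limb. This is not necessarily the technique the gem uses. The sketch assumes 32-bit limbs stored least significant first, and neg_limbs is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Negate an n-limb two's-complement integer in place (x = ~x + 1).
   Assumes little-endian limb order: x[0] is the least significant. */
void neg_limbs(uint32_t *x, size_t n)
{
    uint32_t carry = 1;                        /* the "+ 1" of ~x + 1 */
    for (size_t i = 0; i < n; i++) {
        uint32_t t = (uint32_t)(~x[i] + carry);
        carry = carry & (t == 0);              /* carries on only if ~x[i] was all ones */
        x[i] = t;
    }
}
```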
