问题
I am starring at the original JPEG standard (ITU 81), in particular the figure Figure F.12: extending the sign bit of a decoded value in V:
For reference the SLL
terms means: shift left logical operation
(see page 15 of the PDF)
Now the famous libjpeg implementation decided to implement it this way (quite direct transcription):
/*
* Figure F.12: extend sign bit.
* On some machines, a shift and add will be faster than a table lookup.
*/
#ifdef AVOID_TABLES
#define HUFF_EXTEND(x,s) ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x))
#else
#define HUFF_EXTEND(x,s) ((x) < extend_test[s] ? (x) + extend_offset[s] : (x))
static const int extend_test[16] = /* entry n is 2**(n-1) */
{ 0, 0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020, 0x0040, 0x0080,
0x0100, 0x0200, 0x0400, 0x0800, 0x1000, 0x2000, 0x4000 };
static const int extend_offset[16] = /* entry n is (-1 << n) + 1 */
{ 0, ((-1)<<1) + 1, ((-1)<<2) + 1, ((-1)<<3) + 1, ((-1)<<4) + 1,
((-1)<<5) + 1, ((-1)<<6) + 1, ((-1)<<7) + 1, ((-1)<<8) + 1,
((-1)<<9) + 1, ((-1)<<10) + 1, ((-1)<<11) + 1, ((-1)<<12) + 1,
((-1)<<13) + 1, ((-1)<<14) + 1, ((-1)<<15) + 1 };
#endif /* AVOID_TABLES */
Quite obviously shifting a negative signed value is UB, so I am wondering what the original author of the JPEG standard actually meant here. Is the JPEG standard limited to Two's complement number representation ?
回答1:
Is the JPEG standard limited to Two's complement number representation ?
With the chart using SLL
rather than *2
, the flow chart is relying of 2's complement implementation.
C, OTOH, is not limited to 2's complement nor using shifts in a certain "2's complement" way. Better to code without that assumption. Using unsigned types is a good first step.
// ((-1)<<15) + 1
((-1u)<<15) + 1
Need to see applications of HUFF_EXTEND()
for a deeper answer.
回答2:
Shifting negative values is only undefined behavior in the C language. On assembler level, such shifts are perfectly fine. A logical/arithmetic shift left instruction will move the MSB into the carry bit and that's it. The CPU will not halt and catch fire in undefined ways.
That being said, you can dodge UB in C by converting your signed number to unsigned. Then you will merely have implementation-defined behavior. For example, (int)(-1u<<1)
does actually not invoke UB, even though such code looks quite questionable.
The safe approach is to perform all calculations on unsigned numbers (uint32_t
) and convert to signed only when it is actually needed. Never perform any form of bitwise operations on signed types.
I would completely ignore compatibility with wildly exotic/fictional systems that do not use two's complement. Focus on compatibility with real-world mainstream computers.
回答3:
Quite obviously, shifting a negative quantity has the same effect as multiply/divide by two raised to the power n (for a << n
/a >> n
where a
is signed
) This is never undefined behaviour for a negative left operand, and is well and preciselly defined for C/C++.
Well, despite of the fact that the accepted answer has already been selected and the amount of confussion about the <<
and >>
operators this question has generated, I'll try to clarify what's undefined and what is not.
<<
operator behaves equally onsigned
andunsigned
left operand quantities, as the net effect on an integer is to numerically double it, and the process is the same forsigned
andunsigned
quantities, just insert a0
on the right part of the number, left shifting all the number bits the number indicated as the right operator.>>
operator behaves differently onsigned
andunsigned
left operands, as the net effect is, again, like integer division by two raised to the right operand, this means, for a two's complement representation to extend the most signifiant bit (0
on the left becomes00
and1
on the left becomes11
) to achieve the net effect of a division by two raised to the power of the right operand (-24 >> 3
becomes-3
, as dividing by 2^3, and24 >> 3
becomes3
) When usingunsigned
integers, this behaviour changes (0
becomes00
and1
becomes01
) making again the shifting like a division by two raised to the power of the right operator.
As the standards says, behaviour is undefined if you try to left/right shift a negative amount of bits (right operator must be > 0
) or an amount greater or equal than the left operator's size in bits. But this only affects the right operand, and not the left one.
On trying to get an explanation about JPEG decoding, just try to think that JPEG decoding uses COSIN transformation (which is defined for real vectors) on discrete real approximations encoded as signed quantities and or that, it uses signed integers to evaluate low precision samples, so they never deal with unsigned quantities (the library designers assume that operating on integers is faster than operating on floating point quantities).
NOTE
-24
is binary1111111...1111101000
, and right shifting by3
makes it11111...11111101
, or-3
. If we shift it again, it becomes111...1110
or-2
(-1.5
rounded to minus infinity), and again1111...1111
or-1
.24
is00000000....00011000
and right shifting it becomes000000...0000011
which is3
. If we shift it again, it becomes000...0001
or1
(1.5
rounded to minus infinity), and again we get000...0000
or0
(0.5
rounded to minus infinity).
(Beware that negative shifts made with this approach truncate up to minus infinity, making -1 >> 1
to become -1
, as -0.5
truncates to -1
and not to 0
.)
来源:https://stackoverflow.com/questions/40508958/shifting-a-negative-signed-value-is-undefined