I was reading Java\'s ArrayList
source code and noticed some comparisons in if-statements.
In Java 7, the method grow(int) uses
if (newC
a < b
and a - b < 0
can mean two different things. Consider the following code:
int a = Integer.MAX_VALUE;
int b = Integer.MIN_VALUE;
if (a < b) {
System.out.println("a < b");
}
if (a - b < 0) {
System.out.println("a - b < 0");
}
When run, this will only print a - b < 0
. What happens is that a < b
is clearly false, but a - b
overflows and becomes -1
, which is negative.
Now, having said that, consider that the array has a length that is really close to Integer.MAX_VALUE
. The code in ArrayList
goes like this:
int oldCapacity = elementData.length;
int newCapacity = oldCapacity + (oldCapacity >> 1);
if (newCapacity - minCapacity < 0)
newCapacity = minCapacity;
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity);
oldCapacity
is really close to Integer.MAX_VALUE
so newCapacity
(which is oldCapacity + 0.5 * oldCapacity
) might overflow and become Integer.MIN_VALUE
(i.e. negative). Then, subtracting minCapacity
underflows back into a positive number.
This check ensures that the if
is not executed. If the code were written as if (newCapacity < minCapacity)
, it would be true
in this case (since newCapacity
is negative) so the newCapacity
would be forced to minCapacity
regardless of the oldCapacity
.
This overflow case is handled by the next if. When newCapacity
has overflowed, this will be true
: MAX_ARRAY_SIZE
is defined as Integer.MAX_VALUE - 8
and Integer.MIN_VALUE - (Integer.MAX_VALUE - 8) > 0
is true
. The newCapacity
is therefore rightly handled: hugeCapacity
method returns MAX_ARRAY_SIZE
or Integer.MAX_VALUE
.
NB: this is what the // overflow-conscious code
comment in this method is saying.
I found this explanation:
On Tue, Mar 9, 2010 at 03:02, Kevin L. Stern wrote:
I did a quick search and it appears that Java is indeed two's complement based. Nonetheless, please allow me to point out that, in general, this type of code worries me since I fully expect that at some point someone will come along and do exactly what Dmytro suggested; that is, someone will change:
if (a - b > 0)
to
if (a > b)
and the entire ship will sink. I, personally, like to avoid obscurities such as making integer overflow an essential basis for my algorithm unless there is a good reason to do so. I would, in general, prefer to avoid overflow altogether and to make the overflow scenario more explicit:
if (oldCapacity > RESIZE_OVERFLOW_THRESHOLD) { // Do something } else { // Do something else }
It's a good point.
In
ArrayList
we cannot do this (or at least not compatibly), becauseensureCapacity
is a public API and effectively already accepts negative numbers as requests for a positive capacity that cannot be satisfied.The current API is used like this:
int newcount = count + len; ensureCapacity(newcount);
If you want to avoid overflow, you would need to change to something less natural like
ensureCapacity(count, len); int newcount = count + len;
Anyway, I'm keeping the overflow-conscious code, but adding more warning comments, and "out-lining" huge array creation so that
ArrayList
's code now looks like:/** * Increases the capacity of this <tt>ArrayList</tt> instance, if * necessary, to ensure that it can hold at least the number of elements * specified by the minimum capacity argument. * * @param minCapacity the desired minimum capacity */ public void ensureCapacity(int minCapacity) { modCount++; // Overflow-conscious code if (minCapacity - elementData.length > 0) grow(minCapacity); } /** * The maximum size of array to allocate. * Some VMs reserve some header words in an array. * Attempts to allocate larger arrays may result in * OutOfMemoryError: Requested array size exceeds VM limit */ private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8; /** * Increases the capacity to ensure that it can hold at least the * number of elements specified by the minimum capacity argument. * * @param minCapacity the desired minimum capacity */ private void grow(int minCapacity) { // Overflow-conscious code int oldCapacity = elementData.length; int newCapacity = oldCapacity + (oldCapacity >> 1); if (newCapacity - minCapacity < 0) newCapacity = minCapacity; if (newCapacity - MAX_ARRAY_SIZE > 0) newCapacity = hugeCapacity(minCapacity); // minCapacity is usually close to size, so this is a win: elementData = Arrays.copyOf(elementData, newCapacity); } private int hugeCapacity(int minCapacity) { if (minCapacity < 0) // overflow throw new OutOfMemoryError(); return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE; }
Webrev regenerated.
Martin
In Java 6, if you use the API as:
int newcount = count + len;
ensureCapacity(newcount);
And newCount
overflows (this becomes negative), if (minCapacity > oldCapacity)
will return false and you may mistakenly assume that the ArrayList
was increased by len
.
Looking at the code:
int newCapacity = oldCapacity + (oldCapacity >> 1);
If oldCapacity
is quite large, this will overflow, and newCapacity
will be a negative number. A comparison like newCapacity < oldCapacity
will incorrectly evaluate true
and the ArrayList
will fail to grow.
Instead, the code as written (newCapacity - minCapacity < 0
returns false) will allow the negative value of newCapacity
to be further evaluated in the next line, resulting in recalculating newCapacity
by invoking hugeCapacity
(newCapacity = hugeCapacity(minCapacity);
) to allow for the ArrayList
to grow up to MAX_ARRAY_SIZE
.
This is what the // overflow-conscious code
comment is trying to communicate, though rather obliquely.
So, bottom line, the new comparison protects against allocating an ArrayList
larger than the predefined MAX_ARRAY_SIZE
while allowing it to grow right up to that limit if needed.
The two forms behave exactly the same unless the expression a - b
overflows, in which case they are opposite. If a
is a large negative, and b
is a large positive, then (a < b)
is clearly true, but a - b
will overflow to become positive, so (a - b < 0)
is false.
If you're familiar with x86 assembly code, consider that (a < b)
is implemented by a jge
, which branches around the body of the if statement when SF = OF. On the other hand, (a - b < 0)
will act like a jns
, which branches when SF = 0. Hence, these behave differently precisely when OF = 1.