How to handle the unsigned types (especially u4) of a Java class file in a Java program?

Question

From the Java Virtual Machine specification:

A class file consists of a stream of 8-bit bytes. All 16-bit, 32-bit, and 64-bit quantities are constructed by reading in two, four, and eight consecutive 8-bit bytes, respectively. Multibyte data items are always stored in big-endian order, where the high bytes come first. In the Java platform, this format is supported by interfaces java.io.DataInput and java.io.DataOutput and classes such as java.io.DataInputStream and java.io.DataOutputStream.

This chapter defines its own set of data types representing class file data: The types u1, u2, and u4 represent an unsigned one-, two-, or four-byte quantity, respectively. In the Java platform, these types may be read by methods such as readUnsignedByte, readUnsignedShort, and readInt of the interface java.io.DataInput.

Aside from the confusing mention of "64-bit quantities" (there is no u8; long and double are split into two u4 items), I don't understand how to handle the u4 type.

For u1 and u2 it's clear:

u1: read with readUnsignedByte, store in an int
u2: read with readUnsignedShort, store in an int

The specification advises this:

u4: read with readInt, store in an int (?)

What happens to values greater than Integer.MAX_VALUE? Does this advice silently imply that all values of type u4 are less than or equal to Integer.MAX_VALUE?

I came up with this idea:

u4: read with readUnsignedInt, store in a long

Unfortunately, there is no such method. But that's not a problem, since you can easily write your own:

public long readUnsignedInt() throws IOException {
    return readInt() & 0xFFFFFFFFL;
}
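
To see the helper in context, here is a minimal sketch that reads the first eight bytes of a class file header. The class name UnsignedReads and the sample header bytes are made up for illustration; only the java.io calls are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical holder class, for illustration only.
final class UnsignedReads {
    /** Reads a u4 and widens it to a non-negative long. */
    static long readUnsignedInt(DataInput in) throws IOException {
        return in.readInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        // Sample header: magic (u4), minor_version (u2), major_version (u2).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
        long magic = readUnsignedInt(in);    // u4, would be negative as a plain int
        int minor = in.readUnsignedShort();  // u2
        int major = in.readUnsignedShort();  // u2
        System.out.println(Long.toHexString(magic) + " " + major + "." + minor);
    }
}
```

The magic number 0xCAFEBABE is a handy test value because it has the top bit set: readInt alone returns it as a negative int, while the masked long keeps the unsigned value.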

So, here are two questionable spots:

The Code attribute:

Code_attribute {
...
u4 code_length;
u1 code[code_length];
...
}

Why is code_length not of type u2? Later it says:

The value of the code_length item must be less than 65536.

The SourceDebugExtension attribute:

SourceDebugExtension_attribute {
...
u4 attribute_length;
u1 debug_extension[attribute_length];
}
...
Note that the debug_extension array may denote a string longer than that which can be represented with an instance of class String.

Why? Can u4 values indeed exceed Integer.MAX_VALUE (since I think this is the maximum length of a String instance)?

Answer 1:

To make it easy to lift the 64K code length restriction, should the need arise.
Since nothing in the specification says that u4 values cannot exceed Integer.MAX_VALUE, one must assume that they can. The JVM specification leaves nothing implicit.

Answer 2:

If you want to process class files efficiently, you should not waste too many resources on purely hypothetical cases. As you've noted yourself, the size of the code array is specified as u4, but the values actually supported are restricted to the u2 range. Likewise, all other u4 size values are implicitly restricted, if you consider that the only officially supported ways of getting a class file into a JVM are based on an array or a ByteBuffer, both of which are restricted to a signed int for their total size.

Even if there were a way to get bigger class files into the JVM, there are other parts, like the Instrumentation API, that expect to be able to convert a class back into an ordinary array. Even if a future JVM gains real support for bigger class files, augmenting all APIs with alternatives using a new buffer type, your application, compiled today, is restricted to today's APIs and buffer types.

So if the total size of a class file is intrinsically restricted to the maximum signed int, i.e. 2³¹−1 bytes, there is no need to consider the possibility of a part of the class file, like one attribute, being bigger than that. While you could form a theoretically correct class file with such a humongous attribute, even the JVM itself wouldn't support it. Such scenarios also have no real-life relevance.

So the question isn't how to handle them but how to correctly reject them. If you are going to process a file, it could indeed have more than Integer.MAX_VALUE bytes, which you have to consider anyway if you are going to read the file into a buffer for further processing. In that case, checking the size before doing anything else and throwing an UnsupportedOperationException would be appropriate. On a 32-bit JVM, even throwing an OutOfMemoryError would be appropriate, as any real attempt to buffer the contents of that file would end up this way.
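
A minimal sketch of such an up-front check might look like this. The class ClassFileLoader and its two methods are hypothetical names chosen for illustration; Files.size, Files.readAllBytes and ByteBuffer.wrap are standard API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical loader, for illustration only.
final class ClassFileLoader {
    /** Rejects sizes that cannot fit into a single array or ByteBuffer. */
    static int checkedSize(long fileSize) {
        if (fileSize > Integer.MAX_VALUE)
            throw new UnsupportedOperationException(
                "class file too large: " + fileSize + " bytes");
        return (int) fileSize;
    }

    /** Checks the file size first, then buffers the whole file. */
    static ByteBuffer load(Path file) throws IOException {
        checkedSize(Files.size(file));
        // ByteBuffer.wrap yields a big-endian buffer, matching the class file format.
        return ByteBuffer.wrap(Files.readAllBytes(file));
    }
}
```

Keeping the size check in its own method means it can be exercised without creating a multi-gigabyte file.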

If the file does not exceed Integer.MAX_VALUE bytes, or you are receiving the class file through an API that intrinsically limits its size, e.g. by passing an array or ByteBuffer, you can carry on and treat every u4 size value bigger than Integer.MAX_VALUE as invalid, as it denotes a size bigger than the class file itself. You don't need a special readUnsignedInt method, since int and u4 have the same size. You only have to interpret the value correctly to sort out the values outside the valid positive int range.

With Java 8, this is especially easy:

int codeSize=bytebuffer.getInt();
if(Integer.compareUnsigned(codeSize, 65536)>=0) // must be less than 65536
    throw new IllegalArgumentException(
        "invalid code size "+Integer.toUnsignedString(codeSize));
// carry on using the int value ordinarily

With earlier versions, you can rely on the fact that u4 values greater than Integer.MAX_VALUE appear as negative when interpreted as int:

int codeSize=bytebuffer.getInt();
if(codeSize<0 || codeSize>=65536) // must be less than 65536
    throw new IllegalArgumentException("invalid code size "+(codeSize&0xFFFFFFFFL));
// carry on using the int value ordinarily

Likewise, handle the other u4 size values:

int size=bytebuffer.getInt();
// ByteBuffer can't be bigger than Integer.MAX_VALUE bytes:
if(size<0) throw new IllegalArgumentException(
    "truncated class file (attribute size "+(size&0xFFFFFFFFL)+')');
// carry on using the int value ordinarily
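
Going one step further, the size can also be checked against the bytes actually left in the buffer. The following is only a sketch under that assumption; the class U4Sizes and the method readSize are hypothetical names, not part of any library.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only.
final class U4Sizes {
    /** Reads a u4 length and verifies that it fits into the rest of the buffer. */
    static int readSize(ByteBuffer buf) {
        int size = buf.getInt();
        // A negative int is a u4 above Integer.MAX_VALUE; in both cases the
        // declared size overruns what is left of the buffer.
        if (size < 0 || size > buf.remaining())
            throw new IllegalArgumentException(
                "truncated class file (attribute size " + (size & 0xFFFFFFFFL) + ')');
        return size;
    }
}
```

With this check in place, the returned value is always a non-negative int that can be used directly for slicing or array allocation.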

Source: https://stackoverflow.com/questions/11660967/how-to-handle-the-unsigned-types-especially-u4-of-a-java-class-file-in-a-java

Tags: java, jvm, bytecode, unsigned