Question
From the Java Virtual Machine specification:
A class file consists of a stream of 8-bit bytes. All 16-bit, 32-bit, and 64-bit quantities are constructed by reading in two, four, and eight consecutive 8-bit bytes, respectively. Multibyte data items are always stored in big-endian order, where the high bytes come first. In the Java platform, this format is supported by interfaces java.io.DataInput and java.io.DataOutput and classes such as java.io.DataInputStream and java.io.DataOutputStream.
This chapter defines its own set of data types representing class file data: The types u1, u2, and u4 represent an unsigned one-, two-, or four-byte quantity, respectively. In the Java platform, these types may be read by methods such as readUnsignedByte, readUnsignedShort, and readInt of the interface java.io.DataInput.
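To make the big-endian rule concrete, here is a minimal sketch that assembles a u2 and a u4 by hand from consecutive bytes and checks the results against DataInputStream. The class name BigEndianDemo and the sample bytes (the class file magic 0xCAFEBABE followed by 0x0034) are illustrative choices, not part of the specification text.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class BigEndianDemo {
    // Combine two consecutive bytes, high byte first, into an unsigned 16-bit value.
    static int u2(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    // Combine four consecutive bytes, high byte first, into a 32-bit int.
    // This is the raw bit pattern of a u4; values >= 2^31 come out negative.
    static int u4(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
             | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x34};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int magic = in.readInt();
        int next = in.readUnsignedShort();
        System.out.println(magic == u4(data, 0));     // true
        System.out.println(next == u2(data, 4));      // true
        System.out.printf("%08X %d%n", magic, next);  // CAFEBABE 52
    }
}
```

The masks with 0xFF are needed because Java's byte is signed; without them, a byte like 0xCA would be sign-extended before shifting.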
Aside from the irritating mention of "64-bit quantities" (there is no u8; long and double are split into two u4 items), I don't understand how to handle the u4 type.
For u1 and u2 it's clear:
u1: read with readUnsignedByte, store in an int
u2: read with readUnsignedShort, store in an int
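A small sketch of that mapping, contrasting the unsigned DataInput methods with their signed counterparts on the same all-ones bytes. The class name U1U2Demo and the helper readU1U2 are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class U1U2Demo {
    // Reads a u1 followed by a u2 using the unsigned DataInput methods.
    static int[] readU1U2(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int u1 = in.readUnsignedByte();   // range 0..255, stored in an int
        int u2 = in.readUnsignedShort();  // range 0..65535, stored in an int
        return new int[] {u1, u2};
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        int[] unsigned = readU1U2(data);
        // The signed methods see the same bits as negative numbers.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte s1 = in.readByte();
        short s2 = in.readShort();
        System.out.println(unsigned[0] + " " + unsigned[1]); // 255 65535
        System.out.println(s1 + " " + s2);                   // -1 -1
    }
}
```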
The specification advises this:
u4: read with readInt, store in an int (?)
What happens to values greater than Integer.MAX_VALUE? Does this advice silently imply that all values of type u4 are less than or equal to Integer.MAX_VALUE?
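A quick way to see what readInt does with such values: a sketch (class name U4AsIntDemo is hypothetical) that reads the u4 values 4294967295 and 2147483648. They come back as negative ints, but no bits are lost, so Integer.toUnsignedLong (Java 8+) recovers the unsigned value.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class U4AsIntDemo {
    // Reads one u4 with readInt, as the specification suggests.
    static int readU4AsInt(byte[] data) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(data)).readInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] max = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF}; // u4 value 4294967295
        byte[] half = {(byte) 0x80, 0x00, 0x00, 0x00};                     // u4 value 2147483648
        int a = readU4AsInt(max);
        int b = readU4AsInt(half);
        System.out.println(a + " " + b); // -1 -2147483648
        // The same 32 bits, reinterpreted as unsigned:
        System.out.println(Integer.toUnsignedLong(a) + " " + Integer.toUnsignedLong(b)); // 4294967295 2147483648
    }
}
```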
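I came up with this idea:
u4: read with readUnsignedInt, store in a long
Unfortunately, there is no such method. But that's not the problem, since you can easily write your own:
// assumes this method sits in a java.io.DataInput implementation that provides readInt()
public long readUnsignedInt() throws IOException {
    // the long mask 0xFFFFFFFFL keeps the low 32 bits and clears the sign-extended upper bits
    return readInt() & 0xFFFFFFFFL;
}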
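The same idea as a self-contained sketch, written as a static helper over any java.io.DataInput instead of an instance method. The class name UnsignedReads is hypothetical; Integer.toUnsignedLong (Java 8+) performs the same masking.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

public class UnsignedReads {
    // Reads a u4 and widens it to a long without sign extension.
    // The mask must be the long literal 0xFFFFFFFFL: with the int literal
    // 0xFFFFFFFF the result would still be an int, sign-extended when widened.
    static long readUnsignedInt(DataInput in) throws IOException {
        return in.readInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        DataInput in = new DataInputStream(new ByteArrayInputStream(data));
        long magic = readUnsignedInt(in);
        System.out.println(magic);                                       // 3405691582
        System.out.println(magic == Integer.toUnsignedLong(0xCAFEBABE)); // true
    }
}
```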
So, here are two questionable spots:
The Code attribute:
Code_attribute {
    ...
    u4 code_length;
    u1 code[code_length];
    ...
}
Why is code_length not of type u2? Later it says: The value of the code_length item must be less than 65536.
The SourceDebugExtension attribute:
SourceDebugExtension_attribute {
    ...
    u4 attribute_length;
    u1 debug_extension[attribute_length];
}
...
Note that the debug_extension array may denote a string longer than that which can be represented with an instance of class String.
Why? Can u4 values indeed exceed Integer.MAX_VALUE (since I think this is the maximum length of a String instance)?
Answer 1:
- To make it easy to lift the 64K code length restriction, should such a need arise.
- Since there is no mention that u4 values cannot exceed Integer.MAX_VALUE, one must assume that u4 values can exceed Integer.MAX_VALUE. The JVM spec leaves nothing implicit.
Answer 2:
If you want to process class files efficiently, you should not waste too many resources on purely hypothetical cases. As you’ve noted yourself, the size of the code array is specified as u4, but the actually supported values are restricted to the u2 range. Likewise, all other u4 size values are implicitly restricted, if you consider that the only officially supported ways of getting a class file into a JVM are based on an array or on a ByteBuffer, both of which use a signed int to represent their total size.
Even if there were a way to get bigger class files into the JVM, there are other parts, like the Instrumentation API, that expect to be able to convert a class back into an ordinary array. Even if a future JVM gets real support for bigger class files, augmenting all APIs with alternatives using a new buffer type, your application, compiled today, is restricted to today’s APIs and buffer types.
So if the total size of a class file is intrinsically restricted to the maximum signed int, i.e. 2³¹−1 bytes, there is no need to consider the possibility of a part of the class file, like one attribute, being bigger than that. While you could form a theoretically correct class file with such a humongous attribute, even the JVM itself wouldn’t support it. Such scenarios also have no real-life relevance.
So the question isn’t how to handle them, but how to correctly reject them. If you are going to process a file, it could indeed have more than Integer.MAX_VALUE bytes, which you have to consider anyway if you are going to read the file into a buffer for further processing. In that case, checking the size before doing anything else and throwing an UnsupportedOperationException would be appropriate. On a 32-bit JVM, even throwing an OutOfMemoryError would be appropriate, as any real attempt to buffer the contents of such a file would end that way.
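A minimal sketch of checking the size before buffering, assuming the class file comes from a path on disk. The class name ClassFileLoader and the method readClassFile are hypothetical; Files.size and Files.readAllBytes are standard java.nio.file calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassFileLoader {
    // Hypothetical helper: rejects files that cannot fit into a byte[]
    // before attempting to read them.
    static byte[] readClassFile(Path path) throws IOException {
        long size = Files.size(path);
        if (size > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException(
                "class file too large: " + size + " bytes");
        }
        return Files.readAllBytes(path);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".class");
        Files.write(tmp, new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});
        System.out.println(readClassFile(tmp).length); // 4
        Files.delete(tmp);
    }
}
```

After this check, every array index and buffer position fits into an int, which is what makes the int-based handling of u4 sizes safe.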
If the file is smaller than Integer.MAX_VALUE bytes, or you are receiving the class file through an API that intrinsically limits it to less than that, e.g. by passing an array or ByteBuffer, you can carry on and consider every u4 size value bigger than Integer.MAX_VALUE invalid, as it denotes a size bigger than the class file itself. You don’t need a special readUnsignedInt method: since int and u4 have the same size, you only have to interpret the value correctly to sort out the values outside the valid positive int range.
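One way to apply this interpretation, sketched under the assumption that the class file is in a ByteBuffer: compare the u4 size, read as an int, against the bytes that remain, using an unsigned comparison (Java 8+). A single test then rejects both values that appear negative and values that merely exceed the rest of the file. The class name SizeCheck and the method readSize are hypothetical.

```java
import java.nio.ByteBuffer;

public class SizeCheck {
    // Hypothetical helper: reads a u4 size and accepts it only if it fits into
    // the remaining bytes. Integer.compareUnsigned treats the int as a u4, so
    // values >= 2^31 count as huge rather than negative.
    static int readSize(ByteBuffer buf) {
        int size = buf.getInt(); // ByteBuffer is big-endian by default
        if (Integer.compareUnsigned(size, buf.remaining()) > 0) {
            throw new IllegalArgumentException(
                "size " + Integer.toUnsignedString(size)
                + " exceeds remaining " + buf.remaining() + " bytes");
        }
        return size;
    }

    public static void main(String[] args) {
        ByteBuffer ok = ByteBuffer.wrap(new byte[] {0, 0, 0, 2, 10, 20});
        System.out.println(readSize(ok)); // 2
        ByteBuffer bad = ByteBuffer.wrap(new byte[] {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF});
        try {
            readSize(bad);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // size 4294967295 exceeds remaining 0 bytes
        }
    }
}
```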
With Java 8, this is especially easy:
int codeSize=bytebuffer.getInt();
// code_length must be less than 65536; compareUnsigned treats u4 values >= 2^31 as large, not negative
if(Integer.compareUnsigned(codeSize, 65536)>=0)
    throw new IllegalArgumentException(
        "invalid code size "+Integer.toUnsignedString(codeSize));
// carry on using the int value ordinarily
With earlier versions, you may consider that u4 values greater than Integer.MAX_VALUE will appear negative when interpreted as an int:
int codeSize=bytebuffer.getInt();
// negative means a u4 value >= 2^31; code_length must also be less than 65536
if(codeSize<0 || codeSize>=65536)
    throw new IllegalArgumentException("invalid code size "+(codeSize&0xFFFFFFFFL));
// carry on using the int value ordinarily
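The two code-size checks can be wrapped as methods and compared on edge values to confirm that they agree. This is a sketch; the class and method names are hypothetical. Since code_length must be less than 65536, both variants reject 65536 itself. Neither rejects 0, which the JVMS also forbids for code_length (the code array must not be empty), so a full validator would add that test.

```java
public class CodeLengthCheck {
    // Java 8+ variant: a single unsigned comparison.
    static boolean validJava8(int codeSize) {
        return Integer.compareUnsigned(codeSize, 65536) < 0;
    }

    // Pre-Java-8 variant: u4 values >= 2^31 appear as negative ints,
    // so the first test rejects them.
    static boolean validLegacy(int codeSize) {
        return codeSize >= 0 && codeSize < 65536;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 65535, 65536, Integer.MAX_VALUE, Integer.MIN_VALUE, -1};
        for (int s : samples) {
            // Print each sample as the u4 value it encodes, followed by both verdicts.
            System.out.println(Integer.toUnsignedString(s) + " -> "
                + validJava8(s) + " " + validLegacy(s));
        }
    }
}
```

Both columns agree for every sample; only 0, 1 and 65535 are accepted.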
Likewise, handle the other u4 size values:
int size=bytebuffer.getInt();
// ByteBuffer can't be bigger than Integer.MAX_VALUE bytes:
if(size<0) throw new IllegalArgumentException(
    "truncated class file (attribute size "+(size&0xFFFFFFFFL)+')');
// carry on using the int value ordinarily
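Putting this into a loop, here is a sketch that skips a class file's attribute table in a ByteBuffer, following the attribute_info layout (u2 attribute_name_index, u4 attribute_length, u1 info[attribute_length]) preceded by a u2 attributes_count. The class name AttributeSkipper and the method skipAttributes are hypothetical, and the check also compares against the remaining bytes, which goes slightly beyond the size<0 test above.

```java
import java.nio.ByteBuffer;

public class AttributeSkipper {
    // Hypothetical helper: positioned at an attributes_count item, skips every
    // attribute and returns how many were skipped.
    static int skipAttributes(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;     // u2 attributes_count
        for (int i = 0; i < count; i++) {
            buf.getShort();                      // u2 attribute_name_index (ignored here)
            int size = buf.getInt();             // u4 attribute_length
            // A negative size (u4 >= 2^31) or one beyond the remaining bytes is invalid.
            if (size < 0 || size > buf.remaining()) {
                throw new IllegalArgumentException(
                    "truncated class file (attribute size " + (size & 0xFFFFFFFFL) + ')');
            }
            buf.position(buf.position() + size); // skip info[attribute_length]
        }
        return count;
    }

    public static void main(String[] args) {
        // Two attributes: name index 1 with 2 info bytes, name index 2 with 0 info bytes.
        byte[] data = {0, 2,  0, 1, 0, 0, 0, 2, 42, 43,  0, 2, 0, 0, 0, 0};
        ByteBuffer buf = ByteBuffer.wrap(data);
        System.out.println(skipAttributes(buf) + " " + buf.remaining()); // 2 0
    }
}
```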
Source: https://stackoverflow.com/questions/11660967/how-to-handle-the-unsigned-types-especially-u4-of-a-java-class-file-in-a-java