I just learned that dereferencing null in C and C++ can sometimes produce undefined results. This is very intriguing to me, like all bizarre programming behaviors.
The behavior is defined in 15.12.4.4 Locate Method to Invoke:
Otherwise, an instance method is to be invoked and there is a target reference. If the target reference is null, a NullPointerException is thrown at this point. Otherwise, the target reference is said to refer to a target object and will be used as the value of the keyword this in the invoked method. The other four possibilities for the invocation mode are then considered.
Dereferencing null should throw a NullPointerException.
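As a minimal sketch of the rule quoted above, the following invokes an instance method through a null target reference and checks that a NullPointerException is thrown. The class name NullInvokeDemo and the helper invokeOnNullThrows are hypothetical names chosen for this illustration only.

```java
class NullInvokeDemo {
    // Returns true if invoking an instance method on a null
    // target reference throws NullPointerException.
    static boolean invokeOnNullThrows() {
        String target = null;
        try {
            target.length(); // instance method call with a null target reference
            return false;    // not reached on a conforming JVM
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE thrown: " + invokeOnNullThrows());
    }
}
```

On a conforming JVM the exception is the only possible outcome of the call; the method body of length() is never entered.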
You cannot get undefined behaviour from a null in pure Java (unless there is a serious bug in the JVM!). The JLS specifies that any attempt to explicitly or implicitly dereference a null will result in a NullPointerException. There is no wriggle room that allows for any undefined behaviour that is related to the handling of null.
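To illustrate what "implicitly dereference" can look like, here is a small sketch covering a few cases where no method call is written out: unboxing, reading an array's length, indexing an array, and synchronizing on a reference. The class ImplicitNullDemo and its helper methods are hypothetical names used only for this example, and the list of cases is not exhaustive.

```java
class ImplicitNullDemo {
    // Runs the action and reports whether it threw NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Unboxing a null Integer to int.
    static boolean unboxNull() {
        return throwsNpe(() -> {
            Integer boxed = null;
            int value = boxed;
        });
    }

    // Reading the length field of a null array reference.
    static boolean arrayLengthOfNull() {
        return throwsNpe(() -> {
            int[] array = null;
            int n = array.length;
        });
    }

    // Indexing into a null array reference.
    static boolean arrayIndexOfNull() {
        return throwsNpe(() -> {
            int[] array = null;
            int first = array[0];
        });
    }

    // Synchronizing on a null reference.
    static boolean synchronizedOnNull() {
        return throwsNpe(() -> {
            Object lock = null;
            synchronized (lock) {
                // never entered
            }
        });
    }

    public static void main(String[] args) {
        System.out.println("unboxing:     " + unboxNull());
        System.out.println("array length: " + arrayLengthOfNull());
        System.out.println("array index:  " + arrayIndexOfNull());
        System.out.println("synchronized: " + synchronizedOnNull());
    }
}
```

Each case ends in the same well-defined exception rather than a read from an invalid address.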
However, if your application includes ... or makes use of ... native methods, it is possible for one of those methods to mishandle a null in a way that results in undefined behaviour. You can also get undefined behaviour using the Unsafe class. But both of these scenarios mean you are not using pure Java. (When you step outside of pure Java, the guarantees of the JLS no longer necessarily apply!)
(The one area where unpredictable things can happen is in multi-threading. But even then, the set of possible behaviours is defined. For instance, if you don't synchronize state sharing adequately you may see stale values in fields. But you won't see totally random values ... or bad addresses that result in segmentation violations.)
If it is possible, then it would be possible for a malicious program to do so as well, which would open up an interesting security concern.
A malicious program can do almost anything. But the correct way to deal with this is to execute code that you don't trust (i.e. possibly malicious code) in a sandbox. A typical sandbox would forbid calling Unsafe or loading a native library ... and lots of other things that a malicious program could exploit.
The very concept of a language feature having undefined behaviour is something that the writers of the C and C++ standards use to make it clear that the standard does not require any particular behaviour. This gives the various implementers of C and C++ the freedom to do whatever is most efficient or convenient for the particular hardware or operating system the implementation targets. This is because C has always privileged performance over portability. But Java has the opposite priorities; its early slogan was "write once, run anywhere". So the Java language specification does not talk about undefined behaviour, and strives to define the behaviour of all the language features.
You seem to think that using a null reference could somehow corrupt memory in some circumstances. I think you are confusing C/C++ pointers with Java references. A pointer is essentially a memory address: by casting it to a void * and dereferencing it you have unrestricted ability to corrupt the content of memory. A Java reference is not like a memory address because the garbage collector must be free to move objects to different locations in memory. The translation of a Java reference to a memory address is therefore something that only the JVM can do; it can never be something that a Java program itself can do. As this translation is entirely controlled by the JVM, the JVM can ensure that the translation is always valid, and always points to the object it ought to and nowhere else.
The JLS is not specific about how the null reference is implemented, but it specifies its behavior. In other words, no, there is no unspecified behavior. If you encounter behavior other than what the JLS specifies, it's a bug.
Let me clarify this: you can use native code to trash certain structures and make the JVM crash, but that no longer has anything to do with Java behavior. But on a typical JVM implementation, the implementation of the null behavior is the last thing you can disturb. Not that it matters what you trash if you overwrite arbitrary memory from native code.
“Unspecified behavior” means that the specification itself leaves room for differences in the resulting behavior. This is not the case with Java.