We all come across this particular exception, one of the most common, at some point in our coding/development lives. My question is NOT about W
It doesn't have to (there could be explicit checks), but it works by trapping access violations.
A .NET object will be turned into a native object: its fields become a block of memory laid out in a particular manner, its methods are jitted into native machine code, and a v-table or other virtual method dispatch mechanism is created.
1. Accessing a field means finding the address of the object, adding the offset of the member, and reading or writing the piece of memory referred to.
2. Calling a virtual method means finding the address of the object, finding its method table (at a set offset within the object), finding the method's address (at a set offset within the table) and calling the method at that address, passing the address of the object (the this pointer).
3. Calling a non-virtual method means calling the method with the address of the object passed (the this pointer).
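The three access patterns can be modelled in plain C#. This is only a sketch, not how the CLR lays anything out: FakeObject, DispatchSketch and the slot number are hypothetical names made up for illustration, with an array of delegates standing in for the method table.

```csharp
using System;

// Hypothetical stand-in for a native object: a method table plus one field.
sealed class FakeObject
{
    public Func<FakeObject, string>[] MethodTable; // slot index plays the role of a fixed table offset
    public int Value;                              // a field at a fixed offset
}

static class DispatchSketch
{
    const int DescribeSlot = 0; // assumed slot of the "virtual" Describe method

    static string DescribeImpl(FakeObject self) => "value " + self.Value;

    // The target is known statically; the object is only passed along as "this".
    static string NonVirtualHello(FakeObject self) => "hello";

    public static FakeObject Create(int value) =>
        new FakeObject { MethodTable = new Func<FakeObject, string>[] { DescribeImpl }, Value = value };

    // Field access: object address + member offset.
    public static int ReadField(FakeObject obj) => obj.Value;

    // Virtual call: object -> method table -> slot -> call with "this".
    public static string CallVirtual(FakeObject obj) => obj.MethodTable[DescribeSlot](obj);

    // Non-virtual call: direct call that never looks inside the object.
    public static string CallNonVirtual(FakeObject obj) => NonVirtualHello(obj);

    static void Main()
    {
        FakeObject real = Create(42);
        Console.WriteLine(ReadField(real));      // 42
        Console.WriteLine(CallVirtual(real));    // value 42
        Console.WriteLine(CallNonVirtual(null)); // hello
        try { CallVirtual(null); }
        catch (NullReferenceException) { Console.WriteLine("virtual call on null failed"); }
    }
}
```

In this model the non-virtual call gets away with a null "this" because nothing ever reads through it, while the field read and the virtual call both have to look inside the object first.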
Clearly, if there is not an actual object at the address in question, cases 1 and 2 will go wrong in some way, while case 3 will work (but could in turn lead to case 1 or 2). There are two main ways this can go wrong:
1. It could access an arbitrary bit of memory that is not really an object of our type, leading to all sorts of exciting and really hard to trace bugs (.NET code generally won't result in anything that causes this scenario).
2. It could access an arbitrary bit of memory that is protected, leading to an access violation.
You may know about the second case from C, C++ or ASM coding. If not, you'll probably still have seen a program crash and with its dying breath talk about an access violation at some address. If so, you may have noticed that while the address given could be just about anything, it'll most often be either 0x00000000 or something very low like 0x00000020. Those were caused by code trying to dereference a null pointer whether to access a field or call a virtual method (which is essentially accessing a field and then calling depending on what you get).
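A low address like 0x00000020 is what you get from "null plus the offset of a member". Here is a minimal sketch of that arithmetic, assuming a made-up Header struct; it uses the marshalling layout reported by Marshal.OffsetOf as a stand-in, since the layout of real managed objects is up to the CLR.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical layout, chosen only so that one field lands at offset 0x20.
[StructLayout(LayoutKind.Sequential)]
struct Header
{
    public long A;    // offset 0
    public long B;    // offset 8
    public long C;    // offset 16
    public long D;    // offset 24
    public int Flags; // offset 32 (0x20)
}

static class OffsetSketch
{
    // Address that a field access would touch: base address + field offset.
    public static long FieldAddress(long baseAddress, string fieldName) =>
        baseAddress + Marshal.OffsetOf<Header>(fieldName).ToInt64();

    static void Main()
    {
        // With a null (zero) base, the faulting address is just the field offset.
        Console.WriteLine("0x{0:X8}", FieldAddress(0, "Flags")); // 0x00000020
    }
}
```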
Now, since the first 64k of memory is always protected, dereferencing a null pointer will always result in the second case (access violation) rather than the first case (arbitrary memory being misused and resulting in bizarre "fandango on core" bugs).
This is all exactly the same with .NET (or rather, with the jitted code it produces), but if (A) the access violation happened at an address lower than 0x00010000 and (B) the violation is found to have happened in code that was jitted, then it is turned into a NullReferenceException; otherwise it gets turned into an AccessViolationException.
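That rule can be written down as a tiny function. This is just a restatement of the two conditions above, not the CLR's actual code; FaultClassifier and its parameter names are made up.

```csharp
using System;

static class FaultClassifier
{
    // From the text: faults below this address are treated as null dereferences.
    const ulong NullRegionLimit = 0x00010000;

    // Returns the name of the exception the rule above would pick.
    public static string Classify(ulong faultAddress, bool inJittedCode) =>
        faultAddress < NullRegionLimit && inJittedCode
            ? "NullReferenceException"
            : "AccessViolationException";

    static void Main()
    {
        Console.WriteLine(Classify(0x00000020, true));  // NullReferenceException
        Console.WriteLine(Classify(0x7FFF0000, true));  // AccessViolationException
        Console.WriteLine(Classify(0x00000020, false)); // AccessViolationException
    }
}
```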
We can simulate this with code that doesn't dereference a null reference, but which does access protected memory (we'll only read, so if we should happen to accidentally hit memory that isn't protected, the result won't be too weird!):
The following code will raise an AccessViolationException:
unsafe
{
    // The cast binds first, so this reads an int 8 elements (32 bytes) below long.MaxValue.
    int read = *((int*)long.MaxValue - 8);
}
The following code will raise a NullReferenceException:
unsafe
{
    // Address 8 is not null, but it lies inside the protected first 64k.
    int read = *((int*)8);
}
Neither snippet actually dereferences a null reference. Both cause access violations, but the CLR assumes the latter was probably caused by a null reference (in fairness, by far the most likely scenario) and raises a NullReferenceException.
So, we can see how field access and callvirt can cause this.
It's worth noting now that because of a decision to not allow C# to call methods on null references even when safe to do so, callvirt is used as the IL for the majority of cases in C#, and the only exceptions would be cases of static methods or where it can be shown at compile time to not be on a null reference. (Edit: There are a few other cases where the compiler can see that a callvirt can be replaced by a call, even when the method actually is virtual [if the compiler can tell which override would be hit], and the later compilers will do this slightly more often, though it will still use callvirt more often than you might imagine).
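A small way to see this from C#: an instance method that never touches this still throws when called on null, while an extension method (which compiles to a static call) runs fine with a null receiver. Greeter and its methods are hypothetical names for illustration only.

```csharp
using System;

class Greeter
{
    // Non-virtual and never touches "this", yet C# still null-checks the call.
    public string Hello() => "hello";
}

static class GreeterExtensions
{
    // An extension method is a static method, so no null check is made on the receiver.
    public static string HelloExt(this Greeter g) => "hello from extension";
}

static class CallvirtSketch
{
    // Returns true if calling the instance method on null throws.
    public static bool InstanceCallThrowsOnNull()
    {
        Greeter g = null;
        try { g.Hello(); return false; }
        catch (NullReferenceException) { return true; }
    }

    // Calls the extension method with a null receiver.
    public static string ExtensionCallOnNull()
    {
        Greeter g = null;
        return g.HelloExt();
    }

    static void Main()
    {
        Console.WriteLine(ExtensionCallOnNull());      // hello from extension
        Console.WriteLine(InstanceCallThrowsOnNull()); // True
    }
}
```

With nullable reference types enabled the compiler will warn about the null assignments, but the code still compiles and behaves the same.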
An interesting case is where optimisation means that a method called with callvirt could be inlined, but where the reference isn't known at compile time to be guaranteed non-null. In such a case a field access may be added before the place where the "call" (that isn't really a call) happens, precisely to trigger the NullReferenceException at the start, rather than in the middle, of the method. This means the optimisation does not change the observed behaviour.
Have you read the CLI Spec - ECMA-335? You will find some answers there.
11 Semantics of classes...When a variable or field that has a class as its type is created (for example, by calling a method that has a local variable of a class type), the value shall initially be null, a special value that := with all class types even though it is not an instance of any particular class.
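From C# you can see the "initially null" rule on fields and array elements (C# itself won't let you read an unassigned local). A minimal sketch, with Holder being a made-up type:

```csharp
using System;

class Holder // hypothetical type for illustration
{
    public string Name; // class-typed field, never assigned
    public int Count;   // value-typed field, for contrast
}

static class DefaultNullSketch
{
    // A class-typed field starts out as null.
    public static bool FieldStartsNull() => new Holder().Name == null;

    // So does every element of a freshly created array of a class type.
    public static bool ArraySlotStartsNull()
    {
        object[] slots = new object[1];
        return slots[0] == null;
    }

    static void Main()
    {
        Console.WriteLine(FieldStartsNull());     // True
        Console.WriteLine(ArraySlotStartsNull()); // True
        Console.WriteLine(new Holder().Count);    // 0
    }
}
```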
And the description of the ldnull instruction:
The ldnull pushes a null reference (type O) on the stack. This is used to initialize locations before they become live or when they become dead. [Rationale: It might be thought that ldnull is redundant: why not use ldc.i4.0 or ldc.i8.0 instead? The answer is that ldnull provides a size-agnostic null – analogous to an ldc.i instruction, which does not exist. However, even if CIL were to include an ldc.i instruction it would still benefit verification algorithms to retain the ldnull instruction because it makes type tracking easier. end rationale] Verifiability: The ldnull instruction is always verifiable, and produces a value of the null type (§1.8.1.2) that is assignable-to (§I.8.7.3) any other reference type.
The MS implementation, IIRC, does this via an access violation. Null is essentially a zero reference, and they deliberately reserve that address space and leave the page unmapped. The memory access violation is raised at the CPU/OS level automatically (i.e. without needing extra code to do a null check), and the CLR then reports this as a null-reference exception.
Interestingly, because memory is handled in pages, you can actually simulate (if you try hard enough) a null-reference exception on a non-zero but low value, for the same reasons.
Edit: Eric Lippert discusses this on this related question/answer: https://stackoverflow.com/a/8681563