This is also not a full answer, but I have a few ideas.
I believe I have found as good an explanation as we will find without somebody from the .NET JIT team answering.
UPDATE
I looked a little deeper, and I believe I have found the source of the issue. It appears to be caused by a combination of a bug in the JIT type-initialization logic, and a change in the C# compiler that relies on the assumption that the JIT works as intended. I think the JIT bug existed in .NET 4.0, but was uncovered by the change in the compiler for .NET 4.5.
I do not think that beforefieldinit is the only issue here. I think it's simpler than that.
The type System.String in mscorlib.dll from .NET 4.0 contains a static constructor:
.method private hidebysig specialname rtspecialname static
void .cctor() cil managed
{
// Code size 11 (0xb)
.maxstack 8
IL_0000: ldstr ""
IL_0005: stsfld string System.String::Empty
IL_000a: ret
} // end of method String::.cctor
In the .NET 4.5 version of mscorlib.dll, String.cctor (the static constructor) is conspicuously absent:
..... No static constructor :( .....
In both versions the String type is adorned with beforefieldinit:
.class public auto ansi serializable sealed beforefieldinit System.String
I tried to create a type that would compile to IL similarly (so that it has static fields but no static constructor .cctor), but I could not do it. All of these types have a .cctor method in IL:
public class MyString1 {
    public static MyString1 Empty = new MyString1();
}
public class MyString2 {
    public static MyString2 Empty = new MyString2();
    static MyString2() {}
}
public class MyString3 {
    public static MyString3 Empty;
    static MyString3() { Empty = new MyString3(); }
}
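The presence of a .cctor can also be checked at run time instead of reading the IL. The following is a minimal sketch, not part of the original investigation: it uses the reflection property Type.TypeInitializer (null when the type has no static constructor) and the TypeAttributes.BeforeFieldInit flag. The class name CctorProbe and its helper methods are hypothetical names introduced for illustration, and the three MyString types are repeated so the block compiles on its own.

```csharp
using System;
using System.Reflection;

public class MyString1 { public static MyString1 Empty = new MyString1(); }
public class MyString2 { public static MyString2 Empty = new MyString2(); static MyString2() { } }
public class MyString3 { public static MyString3 Empty; static MyString3() { Empty = new MyString3(); } }

// Hypothetical helper, introduced only for this illustration.
public static class CctorProbe
{
    // True when the type's metadata contains a static constructor (.cctor).
    public static bool HasStaticConstructor(Type t)
    {
        return t.TypeInitializer != null;
    }

    // True when the type is marked beforefieldinit in metadata.
    public static bool IsBeforeFieldInit(Type t)
    {
        return (t.Attributes & TypeAttributes.BeforeFieldInit) != 0;
    }

    public static void Main()
    {
        Type[] types = { typeof(MyString1), typeof(MyString2), typeof(MyString3), typeof(string) };
        foreach (Type t in types)
        {
            // The result for System.String depends on the runtime version in use.
            Console.WriteLine("{0}: .cctor={1}, beforefieldinit={2}",
                t.FullName, HasStaticConstructor(t), IsBeforeFieldInit(t));
        }
    }
}
```

Only MyString1 should report beforefieldinit, since the C# compiler drops that flag as soon as an explicit static constructor is declared; all three should report a .cctor.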
My guess is that two things changed between .NET 4.0 and 4.5:
First: The EE was changed so that it would automatically initialize String.Empty from unmanaged code. This change was probably made for .NET 4.0.
Second: The compiler changed so that it did not emit a static constructor for string, knowing that String.Empty would be assigned from the unmanaged side. This change appears to have been made for .NET 4.5.
It appears that the EE does not assign String.Empty soon enough along some optimization paths. The change made to the compiler (or whatever changed to make String.cctor disappear) expected the EE to make this assignment before any user code executes, but it appears that the EE does not make this assignment before String.Empty is used in methods of generic classes reified with a reference type.
Lastly, I believe that the bug is indicative of a deeper problem in the JIT type-initialization logic. It appears the change in the compiler is a special case for System.String, but I doubt that the JIT has made a special case here for System.String.
Original
First of all, WOW. The BCL people have gotten very creative with some performance optimizations. Many of the String methods are now performed using a thread-static cached StringBuilder object.
I followed that lead for a while, but StringBuilder isn't used on the Trim code path, so I decided it couldn't be a thread-static problem.
I think I found a strange manifestation of the same bug though.
This code fails with an access violation:
class A<T>
{
    static A() { }
    public A(out string s) {
        s = string.Empty;
    }
}
class B
{
    static void Main() {
        string s;
        new A<object>(out s);
        //new A<int>(out s);
        System.Console.WriteLine(s.Length);
    }
}
However, if you uncomment //new A<int>(out s); in Main, the code works just fine. In fact, if A is reified with any reference type, the program fails, but if A is reified with any value type, it does not. Also, if you comment out A's static constructor, the code never fails. After digging into Trim and Format, it is clear that the problem is that Length is being inlined, and that in the samples above the String type has not been initialized. In particular, inside the body of A's constructor, string.Empty is not correctly assigned, although inside the body of Main, string.Empty is assigned correctly.
It is amazing to me that the type initialization of String somehow depends on whether or not A is reified with a value type. My only theory is that there is some optimizing JIT code path for generic type-initialization that is shared among all types, and that this path makes assumptions about BCL reference types ("special types?") and their state. A quick look through other BCL classes with public static fields shows that basically all of them implement a static constructor (even those with empty constructors and no data, like System.DBNull and System.Empty). BCL value types with public static fields do not seem to implement a static constructor (System.IntPtr, for instance). This seems to indicate that the JIT makes some assumptions about BCL reference type initialization.
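The quick look described above can be repeated mechanically with reflection. The sketch below is an illustration rather than the method actually used: it lists the public types in the assembly that defines System.String, keeps those declaring a public static non-constant field, and prints whether each one has a static constructor. BclCctorScan and HasPublicStaticField are hypothetical names, and the output varies with the runtime version, so no particular result is claimed.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical scanner, introduced only for this illustration.
static class BclCctorScan
{
    // True when the type itself declares a public static field that is not
    // a compile-time constant (constants never need a static constructor).
    public static bool HasPublicStaticField(Type t)
    {
        return t.GetFields(BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly)
                .Any(f => !f.IsLiteral);
    }

    static void Main()
    {
        // mscorlib.dll on the .NET Framework; a different core library on newer runtimes.
        Assembly coreLib = typeof(string).Assembly;

        foreach (Type t in coreLib.GetExportedTypes().Where(HasPublicStaticField))
        {
            Console.WriteLine("{0,-10} {1,-45} .cctor={2}",
                t.IsValueType ? "value" : "reference",
                t.FullName,
                t.TypeInitializer != null);
        }
    }
}
```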
FYI Here is the JITed code for the two versions:
A<object>.ctor(out string):
public A(out string s) {
00000000 push rbx
00000001 sub rsp,20h
00000005 mov rbx,rdx
00000008 lea rdx,[FFEE38D0h]
0000000f mov rcx,qword ptr [rcx]
00000012 call 000000005F7AB4A0
s = string.Empty;
00000017 mov rdx,qword ptr [FFEE38D0h]
0000001e mov rcx,rbx
00000021 call 000000005F661180
00000026 nop
00000027 add rsp,20h
0000002b pop rbx
0000002c ret
}
A<int32>.ctor(out string):
public A(out string s) {
00000000 sub rsp,28h
00000004 mov rax,rdx
s = string.Empty;
00000007 mov rdx,12353250h
00000011 mov rdx,qword ptr [rdx]
00000014 mov rcx,rax
00000017 call 000000005F691160
0000001c nop
0000001d add rsp,28h
00000021 ret
}
The rest of the code (Main) is identical between the two versions.
EDIT
In addition, the IL from the two versions is identical except for the call to A.ctor in B.Main(), where the IL for the first version contains:
newobj instance void class A`1<object>::.ctor(string&)
versus
... A`1<int32>...
in the second.
Another thing to note is that the JITed code for A<int>.ctor(out string) is the same as in the non-generic version.