Why is 'box' instruction emitted for generic?

问题

Here is fairly simple generic class. Generic parameter is constrained to be reference type. IRepository and DbSet also contain the same constraint.

public class Repository<TEntity> : IRepository<TEntity>
    where TEntity : class, IEntity
{
    protected readonly DbSet<TEntity> _dbSet;
    public void Insert(TEntity entity)
    {
        if (entity == null) 
        throw new ArgumentNullException("entity", "Cannot add null entity.");
        _dbSet.Add(entity);
    }
}

Compiled IL contains box instruction. Here is the release version (debug version also contains it though).

.method public hidebysig newslot virtual final 
    instance void  Insert(!TEntity entity) cil managed
{
  // Code size       38 (0x26)
  .maxstack  8
  IL_0000:  ldarg.1
  >>>IL_0001:  box        !TEntity
  IL_0006:  brtrue.s   IL_0018
  IL_0008:  ldstr      "entity"
  IL_000d:  ldstr      "Cannot add null entity."
  IL_0012:  newobj     instance void [mscorlib]System.ArgumentNullException::.ctor(string,
                                           string)
  IL_0017:  throw
  IL_0018:  ldarg.0
  IL_0019:  ldfld      class [EntityFramework]System.Data.Entity.DbSet`1<!0> class Repository`1<!TEntity>::_dbSet
  IL_001e:  ldarg.1
  IL_001f:  callvirt   instance !0 class [EntityFramework]System.Data.Entity.DbSet`1<!TEntity>::Add(!0)
  IL_0024:  pop
  IL_0025:  ret
} // end of method Repository`1::Insert

UPDATE:

With object.Equals(entity, default(TEntity)) it looks even worse:

  .maxstack  2
  .locals init ([0] !TEntity CS$0$0000)
  IL_0000:  ldarg.1
  >>>IL_0001:  box        !TEntity
  IL_0006:  ldloca.s   CS$0$0000
  IL_0008:  initobj    !TEntity
  IL_000e:  ldloc.0
  >>>IL_000f:  box        !TEntity
  IL_0014:  call       bool [mscorlib]System.Object::Equals(object,
                                object)
  IL_0019:  brfalse.s  IL_002b

UPDATE2:

For those who are interested, here is the code compiled by jit shown in debugger:

0cd5af28 55              push    ebp
0cd5af29 8bec            mov     ebp,esp
0cd5af2b 83ec18          sub     esp,18h
0cd5af2e 33c0            xor     eax,eax
0cd5af30 8945f0          mov     dword ptr [ebp-10h],eax
0cd5af33 8945ec          mov     dword ptr [ebp-14h],eax
0cd5af36 8945e8          mov     dword ptr [ebp-18h],eax
0cd5af39 894df8          mov     dword ptr [ebp-8],ecx
    //entity reference to [ebp-0Ch]
0cd5af3c 8955f4          mov     dword ptr [ebp-0Ch],edx
    //some debugger checks
0cd5af3f 833d9424760300  cmp     dword ptr ds:[3762494h],0
0cd5af46 7405            je      0cd5af4d  Branch
0cd5af48 e8e1cac25a      call    clr!JIT_DbgIsJustMyCode (67987a2e)
0cd5af4d c745fc00000000  mov     dword ptr [ebp-4],0
0cd5af54 90              nop

    //comparison or entity ref with  zero
0cd5af55 837df400        cmp     dword ptr [ebp-0Ch],0
0cd5af59 0f95c0          setne   al
0cd5af5c 0fb6c0          movzx   eax,al
0cd5af5f 8945fc          mov     dword ptr [ebp-4],eax
0cd5af62 837dfc00        cmp     dword ptr [ebp-4],0
    //if not zero, jump further
0cd5af66 7542            jne     0cd5afaa  Branch
    //throwing exception here

The reason of this question is actually that NDepend warns about using boxing/unboxing. I was curious why it found boxing in some generic classes, and now it's clear.

回答1:

The ECMA spec states this about the box instruction:

Stack transition: ..., val -> ..., obj

...

If typeTok is a generic parameter, the behavior of box instruction depends on the actual type at runtime. If this type [...] is a reference type then val is not changed.

What it's saying is that the compiler can assume that it's safe to box a reference type. So with generics, the compiler has two choices: emit the code that is guaranteed to work regardless of how the generic type is constrained, or optimize the code and omit redundant instructions where it can prove them to be unnecessary.

The Microsoft C# compiler, in general, tends to choose the simpler approach and leave all optimization to the JIT stage. To me, it looks like your example is exactly that: not optimizing something because implementing an optimization takes time, and saving this box instruction probably has very little value in practice.

C# allows even an unconstrained generic-typed value to be compared to null, so the compiler must support this general case. The easiest way to implement this general case is to use the box instruction, which does all the heavy-lifting of handling reference, value and nullable types, correctly pushing either a reference or a null value onto the stack. So the easiest thing for the compiler to do is to issue box regardless of the constraints, and then compare the value to zero (brtrue).

回答2:

I ran into a very relevant comment when reviewing the C# compiler source code that generates BOX instructions. The fncbind.cpp source file has this comment, not otherwise directly related to this particular code:

// NOTE: for the flags, we have to use EXF_FORCE_UNBOX (not EXF_REFCHECK) even when
// we know that the type is a reference type. The verifier expects all code for
// type parameters to behave as if the type parameter is a value type.
// The jitter should be smart about it....

So it is there because the verifier requires it.

And yes, the jitter is smart about it. It simply emits no code at all for the BOX instruction.

来源：https://stackoverflow.com/questions/20683715/why-is-box-instruction-emitted-for-generic

标签

generics