64-bit pointer arithmetic in C#, Check for arithmetic overflow changes behavior

攒了一身酷 2021-01-06 13:31

I have some unsafe C# code that does pointer arithmetic on large blocks of memory on type byte*, running on a 64-bit machine. It works correctly most of the tim

4 answers
  • 2021-01-06 13:52

    It's a C# compiler bug (filed on Connect). @Grant has shown that the MSIL generated by the C# compiler interprets the uint operand as signed. That's wrong according to the C# spec. Here's the relevant section (18.5.6):

    18.5.6 Pointer arithmetic

    In an unsafe context, the + and - operators (§7.8.4 and §7.8.5) can be applied to values of all pointer types except void*. Thus, for every pointer type T*, the following operators are implicitly defined:

    T* operator +(T* x, int y);
    T* operator +(T* x, uint y);
    T* operator +(T* x, long y);
    T* operator +(T* x, ulong y);
    T* operator +(int x, T* y);
    T* operator +(uint x, T* y);
    T* operator +(long x, T* y);
    T* operator +(ulong x, T* y);
    T* operator -(T* x, int y);
    T* operator -(T* x, uint y);
    T* operator -(T* x, long y);
    T* operator -(T* x, ulong y);
    long operator -(T* x, T* y);
    

    Given an expression P of a pointer type T* and an expression N of type int, uint, long, or ulong, the expressions P + N and N + P compute the pointer value of type T* that results from adding N * sizeof(T) to the address given by P. Likewise, the expression P - N computes the pointer value of type T* that results from subtracting N * sizeof(T) from the address given by P.

    Given two expressions, P and Q, of a pointer type T*, the expression P - Q computes the difference between the addresses given by P and Q and then divides that difference by sizeof(T). The type of the result is always long. In effect, P - Q is computed as ((long)(P) - (long)(Q)) / sizeof(T).

    If a pointer arithmetic operation overflows the domain of the pointer type, the result is truncated in an implementation-defined fashion, but no exceptions are produced.


    You're allowed to add a uint to a pointer directly; no implicit conversion takes place. The operation also does not overflow the domain of the pointer type, so truncation is not allowed.

  • 2021-01-06 13:57

    The difference between checked and unchecked here is actually a bit of a bug in the IL, or just some bad source code (I'm not a language expert, so I will not comment on whether the C# compiler is generating the correct IL for the ambiguous source code). I compiled this test code using version 4.0.30319.1 of the C# compiler (although the 2.0 version seemed to do the same thing). The command-line options I used were: /o+ /unsafe /debug:pdbonly.

    For the unchecked block, we have this IL code:

    //000008:     unchecked
    //000009:     {
    //000010:         Console.WriteLine("{0:x}", (long)(testPtr + offset));
      IL_000a:  ldstr      "{0:x}"
      IL_000f:  ldloc.0
      IL_0010:  ldloc.1
      IL_0011:  add
      IL_0012:  conv.u8
      IL_0013:  box        [mscorlib]System.Int64
      IL_0018:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                    object)
    

    At IL offset 11, the add gets two operands, one of type byte* and the other of type uint32. Per the CLI spec these are normalized into native int and int32, respectively. According to the CLI spec (Partition III, to be precise), the result will be native int, so the second operand must be promoted to native int. According to the spec, this is accomplished via sign extension. So uint.MaxValue (which is 0xFFFFFFFF, or -1 in signed notation) is sign-extended to 0xFFFFFFFFFFFFFFFF. Then the two operands are added (0x0000000008000000L + (-1L) = 0x0000000007FFFFFFL). The conv opcode is only needed for verification purposes, to convert the native int into an int64; in the generated code it is a nop.
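The sign extension described above can be reproduced with plain integer arithmetic, without any pointers. This is only a minimal sketch, not code from the thread: the names `SignExtensionDemo` and `AddSignExtended` are made up for illustration, and a `long` stands in for the 64-bit native int.

```csharp
using System;

public static class SignExtensionDemo
{
    // Mimics the plain IL `add` on a 64-bit runtime: the 32-bit operand is
    // reinterpreted as int32 and sign-extended before the addition.
    // (Hypothetical helper, not part of the original test code.)
    public static long AddSignExtended(long address, uint offset)
    {
        long widened = unchecked((int)offset); // 0xFFFFFFFF becomes -1
        return unchecked(address + widened);
    }

    public static void Main()
    {
        long address = 0x08000000L; // same address as testPtr in the test code
        Console.WriteLine("{0:x}", AddSignExtended(address, uint.MaxValue)); // 7ffffff
    }
}
```

The printed value matches the first line of output that the unchecked pointer addition produces.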

    Now for the checked block, we have this IL:

    //000012:     checked
    //000013:     {
    //000014:         Console.WriteLine("{0:x}", (long)(testPtr + offset));
      IL_001d:  ldstr      "{0:x}"
      IL_0022:  ldloc.0
      IL_0023:  ldloc.1
      IL_0024:  add.ovf.un
      IL_0025:  conv.ovf.i8.un
      IL_0026:  box        [mscorlib]System.Int64
      IL_002b:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                    object)
    

    It is virtually identical, except for the add and conv opcodes. The add opcode has gained two suffixes. The first is ".ovf", which has an obvious meaning: check for overflow. It is also required to enable the second suffix, ".un" (i.e. there is no "add.un", only "add.ovf.un"). The ".un" has two effects. The most obvious one is that the addition and overflow checking are done as if the operands were unsigned integers. From our CS classes way back when, hopefully we all remember that thanks to two's complement binary encoding, signed addition and unsigned addition are the same, so the ".un" really only impacts the overflow checking, right?

    Wrong.

    Remember that on the IL stack we don't have two 64-bit numbers; we have an int32 and a native int (after normalization). The ".un" means that the conversion from int32 to native int is treated like a "conv.u" rather than the default "conv.i" as above. Thus uint.MaxValue is zero-extended to 0x00000000FFFFFFFFL. Then the add correctly produces 0x0000000107FFFFFFL. The conv opcode makes sure the unsigned value can be represented as a signed int64 (which it can).
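The ".un" path can be sketched the same way: zero-extend the 32-bit operand, add with an unsigned overflow check, then verify that the sum fits in a signed 64-bit value. Again this is an illustration under assumptions, not the compiler's actual behavior spelled out: `ZeroExtensionDemo` and `AddZeroExtendedChecked` are hypothetical names, and `ulong`/`long` stand in for native-sized integers on a 64-bit runtime.

```csharp
using System;

public static class ZeroExtensionDemo
{
    // Rough C# analogue of `add.ovf.un` followed by `conv.ovf.i8.un`.
    // (Hypothetical helper, not part of the original test code.)
    public static long AddZeroExtendedChecked(ulong address, uint offset)
    {
        ulong widened = offset;                  // zero extension, like conv.u
        ulong sum = checked(address + widened);  // unsigned add with overflow check
        return checked((long)sum);               // must fit in a signed int64
    }

    public static void Main()
    {
        ulong address = 0x08000000UL; // same address as testPtr in the test code
        Console.WriteLine("{0:x}", AddZeroExtendedChecked(address, uint.MaxValue)); // 107ffffff
    }
}
```

With an address near `ulong.MaxValue` the checked addition would throw an `OverflowException` instead of wrapping, which is the overflow-checking half of what ".un" changes.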

    Your fix works just fine for 64-bit. At the IL level a more correct fix would be to explicitly convert the uint32 operand to native int or unsigned native int, and then both the checked and unchecked versions would behave identically on both 32-bit and 64-bit.

  • 2021-01-06 14:07

    I'm answering my own question as I have solved the problem, but would still be interested in reading comments about why the behavior changes with checked vs unchecked.

    This code demonstrates the problem as well as the solution (always casting the offset to long before adding):

    public static unsafe void Main(string[] args)
    {
        // Dummy pointer, never dereferenced
        byte* testPtr = (byte*)0x00000008000000L;
    
        uint offset = uint.MaxValue;
    
        unchecked
        {
            Console.WriteLine("{0:x}", (long)(testPtr + offset));
        }
    
        checked
        {
            Console.WriteLine("{0:x}", (long)(testPtr + offset));
        }
    
        unchecked
        {
            Console.WriteLine("{0:x}", (long)(testPtr + (long)offset));
        }
    
        checked
        {
            Console.WriteLine("{0:x}", (long)(testPtr + (long)offset));
        }
    }
    

    This will print (when run on a 64-bit machine):

    7ffffff
    107ffffff
    107ffffff
    107ffffff
    

    (BTW, in my project I first wrote all the code as managed code without all this unsafe pointer arithmetic nastiness but found out it was using too much memory. This is just a hobby project; the only one that gets hurt if it blows up is me.)

  • 2021-01-06 14:08

    Please double-check your unsafe code. Reading or writing memory outside the allocated block of memory causes that 'corruption'.
