Bug with detection of unassigned local variables (when dynamic variables affect code flow prediction)

礼貌的吻别 2021-01-06 17:12

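The documentation implies that out parameters do not need to be initialized (only declared) before they are sent to the function. However, this code:

    dynamic p = "";
    string s;
    if (p != null && T(out s))
        System.Console.WriteLine(s);

is reported as using an unassigned local variable s.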
1 answer
  • 2021-01-06 17:40

    UPDATE: This question was the subject of my blog in November 2018. Thanks for the interesting question!

    "The documentation implies that out parameters do not need to be initialized (only declared) before they are sent to the method."

    That's correct. Moreover, a variable passed to an out parameter is definitely assigned when the call returns, because as you note:

    "Method T is required to set the variables before returning, so this error seems like hogwash to me."

    Seems that way, doesn't it? Appearances can be deceiving!

    "Note that even with a short-circuiting &&, the second expression has to execute in order for the 'consequence' block of the if to execute."

    That is, surprisingly, false. There is a way for the consequence to execute even if the call to T does not execute. Doing so requires us to seriously abuse the rules of C#, but we can, so let's do it!

    Instead of

        dynamic p = "";
        string s;
        if (p != null && T(out s))
            System.Console.WriteLine(s);
    

    We'll do

        P p = new P();
        if (p != null && T())
            System.Console.WriteLine("in the consequence");
    

    and give a definition for class P that causes this program to run the consequence but not run the call to T.

    The first thing we have to do is turn p != null into a method call instead of a null check, and that method must not return bool:

    class P
    {
        public static C operator ==(P p1, P p2)
        {
            System.Console.WriteLine("P ==");
            return new C();
        }
        public static C operator !=(P p1, P p2)
        {
            System.Console.WriteLine("P !=");
            return new C();
        }
    }
    

    We are required to overload both == and != at the same time in C#. Overriding Equals and GetHashCode is a good idea but not a requirement, and nothing in this program is a good idea so we'll skip that.

    OK, so we now have if (something_of_type_C && T()), and since C is not bool, we'll need to overload the && operator. But C# does not allow you to overload the && operator directly. Let's digress a moment and talk about the semantics of &&. For Boolean-returning functions A and B, the semantics of bool r = A() && B(); are:

    bool a = A();
    bool c;
    if (a == false) // interesting operation
      c = a;
    else
    {
      bool b = B(); 
      c = a & b;    // interesting operation
    }
    bool r = c;
    

    So we generate three temporaries: a, b, and c. We evaluate the left side A() and check whether a is false. If it is, we use its value. If not, we compute B() and then compute a & b.

    The only two operations in that workflow that are specific to the type bool are the check for falsity and the non-short-circuiting &, so those are the operations that are overloaded in a user-defined &&. C# requires you to overload three operations: user-defined &, user-defined "am I true?" and user-defined "am I false?". (Like == and !=, the last two have to be defined in pairs.)

    Now, a sensible person would write operator true and operator false so that they always returned opposites. We are not sensible people today:

    class C
    {
        public static bool operator true(C c)
        {
            System.Console.WriteLine("C operator true");
            return true;
        }
    
        public static bool operator false(C c)
        {
            System.Console.WriteLine("C operator false");
            return true; // Oops
        }
    
        public static C operator &(C a, C b)
        {
            System.Console.WriteLine("C operator &");
            return a;
        }
    }
    

    Notice that we also require that user-defined & take two Cs and return a C, which it does.
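
For contrast, here is a minimal sketch of what a sensible user-defined && might look like. The Flag type and the Right method are hypothetical names invented for this illustration; they are not part of the question or of any library. Here operator true and operator false always return opposites, so the right operand is skipped exactly when the left one is false.

```csharp
using System;

// Hypothetical wrapper type, invented for this sketch.
class Flag
{
    public bool Value { get; }
    public Flag(bool value) { Value = value; }

    // A sensible pair: the two operators always disagree.
    public static bool operator true(Flag f) => f.Value;
    public static bool operator false(Flag f) => !f.Value;

    // Non-short-circuiting &: takes two Flags and returns a Flag.
    public static Flag operator &(Flag a, Flag b) => new Flag(a.Value & b.Value);
}

static class Program
{
    // Hypothetical right operand that announces when it runs.
    static Flag Right()
    {
        Console.WriteLine("right operand evaluated");
        return new Flag(true);
    }

    public static void Main()
    {
        // Left side is false: operator false returns true, so Right() is skipped,
        // and operator true then sends us to the else branch.
        if (new Flag(false) && Right())
            Console.WriteLine("not printed for a sensible type");
        else
            Console.WriteLine("skipped right operand");

        // Left side is true: Right() runs, then operator & and operator true.
        if (new Flag(true) && Right())
            Console.WriteLine("both true");
    }
}
```

With a type like this, the consequence can only run after the right operand has been evaluated, which is the intuition the question relied on.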

    All right, so, recall we had

    if (p != null && T())
    

    and p != null is of type C. So we must now generate this as:

    C a = p != null; // Call to P.operator_!=
    C c;
    bool is_false = a is logically false; // call to C.operator_false
    if (is_false) 
      c = a;
    else
    {
      bool b = T();
      c = a & b; // Call to C.operator_&
    }
    

    But now we have a problem. operator & takes two Cs and returns a C, but we have a bool returned from T. We need a C. No problem, we'll add an implicit user-defined conversion to C from bool, as a member of class C:

    public static implicit operator C(bool b)
    {
        System.Console.WriteLine("C implicit conversion from bool");
        return new C();
    }
    

    OK, so our logic is now:

    C a = p != null; // Call to P.operator_!=
    C c;
    bool is_false = C.operator_false(a);
    if (is_false)
      c = a;
    else
    {
      bool t = T(); 
      C b = t; // call to C.operator_implicit_C(bool)
      c = a & b; // Call to C.operator_&
    }
    

    Remember what we are heading towards here is:

    if (c)
      System.Console.WriteLine("in the consequence");
    

    How do we compute this? C# reasons that if you have operator true on C then you should be able to use it in an if condition by simply calling operator true. So finishing it off, ultimately we have the semantics:

    C a = p != null; // Call to P.operator_!=
    C c;
    bool is_false = C.operator_false(a);
    if (is_false)
      c = a;
    else
    {
      bool t = T(); 
      C b = t; // call to C.operator_implicit_C(bool)
      c = a & b; // Call to C.operator_&
    }
    bool is_true = C.operator_true(c);
    if (is_true) …
    

    But as we see in this crazy example, we can enter the consequence of the if without calling T, provided that operator false and operator true both return true. When we run the program we get:

    P !=
    C operator false
    C operator true
    in the consequence
    

    A sensible person would never write code where a C was considered to be both true and false at the same time, but a not-sensible person like me today could, and the compiler knows that because we designed the compiler to be correct regardless of whether the program is sensible.
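
For anyone who wants to try this, the pieces above can be assembled into one file, roughly as follows. The body of T is an assumption made for this sketch (the walkthrough never shows it); it prints a line only so that you can see it is never called. The pragma merely silences the warnings about not overriding Equals and GetHashCode.

```csharp
using System;

#pragma warning disable 0660, 0661 // == and != defined without Equals/GetHashCode

class P
{
    public static C operator ==(P p1, P p2)
    {
        Console.WriteLine("P ==");
        return new C();
    }
    public static C operator !=(P p1, P p2)
    {
        Console.WriteLine("P !=");
        return new C();
    }
}

class C
{
    public static bool operator true(C c)
    {
        Console.WriteLine("C operator true");
        return true;
    }

    public static bool operator false(C c)
    {
        Console.WriteLine("C operator false");
        return true; // Oops: "true" and "false" at the same time
    }

    public static C operator &(C a, C b)
    {
        Console.WriteLine("C operator &");
        return a;
    }

    public static implicit operator C(bool b)
    {
        Console.WriteLine("C implicit conversion from bool");
        return new C();
    }
}

static class Program
{
    // Assumed stand-in for T; the message is there to show it never runs.
    static bool T()
    {
        Console.WriteLine("T called");
        return true;
    }

    public static void Main()
    {
        P p = new P();
        if (p != null && T())
            Console.WriteLine("in the consequence");
    }
}
```

Neither "T called" nor "C operator &" shows up in the output, because operator false claims the left operand is false and the right operand is skipped.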

    So that explains why if (p != null && T(out s)) says that s can be unassigned in the consequence. If p is dynamic then the compiler reasons "p might be one of these crazy types at runtime, in which case we are no longer working with bool operands, and therefore s might not be assigned".

    The moral of the story is: dynamic makes the compiler extremely conservative about what could happen; it has to assume the worst. In this particular case, it has to assume that p != null might not be a null reference check and might not be bool, and that operator true and operator false might both return true.

    "So, is this a legitimate bug (I'm on C# 7.0)?"

    The compiler's analysis is correct -- and believe me, this was not easy logic to write or test.

    Your code has the bug; fix it.

    "How should I handle this?"

    If you want to do a null reference check against a dynamic, your best bet is: if it hurts when you do that, don't do that.

    Cast away the dynamic and get back to object, and then do the reference equality check: if (((object)p) == null && …

    Or, another nice solution is to make it extremely explicit: if (object.ReferenceEquals((object)p, null) && …
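
Applied to the code from the question, the two preferred fixes might look like the sketch below. The body of T is assumed for illustration, and the checks are written in their "not null" form to match the question's condition. Once p is cast to object, both conditions are ordinary bool expressions at compile time, so the definite assignment checker accepts the use of s.

```csharp
using System;

static class Program
{
    // Assumed stand-in for the question's T: it must assign s before returning.
    static bool T(out string s)
    {
        s = "assigned by T";
        return true;
    }

    public static void Main()
    {
        dynamic p = "";
        string s;

        // Cast away the dynamic: != is now a plain reference comparison yielding bool.
        if ((object)p != null && T(out s))
            Console.WriteLine(s);

        // The extremely explicit version of the same check.
        if (!object.ReferenceEquals((object)p, null) && T(out s))
            Console.WriteLine(s);
    }
}
```

Both branches print "assigned by T", and neither produces the unassigned-variable error.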

    Those are my preferred solutions. A worse solution is to break it up:

    if (p != null)
      if (T(out string s))
         consequence
    

    Now there is no operator & called even in the worst case. Note though in this case we can still be in a scenario where p != null is true and p is null, since there is nothing stopping anyone from overloading != to always return true regardless of its operands.
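
To make that last point concrete, here is a small sketch with a hypothetical type, AlwaysDifferent, whose != always returns true. It is statically typed here to keep the example simple; the class name and messages are invented for this illustration.

```csharp
using System;

#pragma warning disable 0660, 0661 // == and != defined without Equals/GetHashCode

// Hypothetical type that claims to differ from everything, including null.
class AlwaysDifferent
{
    public static bool operator ==(AlwaysDifferent a, AlwaysDifferent b) => false;
    public static bool operator !=(AlwaysDifferent a, AlwaysDifferent b) => true;
}

static class Program
{
    public static void Main()
    {
        AlwaysDifferent q = null;

        // The user-defined != is chosen over the built-in reference comparison.
        if (q != null)
            Console.WriteLine("q != null is true");

        // The reference really is null.
        Console.WriteLine(object.ReferenceEquals(q, null));
    }
}
```

This prints "q != null is true" followed by "True", which is why the cast to object or the call to object.ReferenceEquals is the more trustworthy null check.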
