How do I track down the cause of a StackOverflowException in .NET?

前端 未结 4 1204
南旧
南旧 2021-02-01 03:18

I get a StackOverflowException when I run the following code:

private void MyButton_Click(object sender, EventArgs e) {
  MyButton_Click_Aux();
}

p         


        
相关标签:
4条回答
  • 2021-02-01 03:26

    The bug is in your code. Presumably, MyButton_Click_Aux() causes some method to be re-entered. However, you've inexplicably omitted that code from your question and so no one can comment on it.

    0 讨论(0)
  • 2021-02-01 03:26

    Does Log call back to Log? This would also cause a SO.

    0 讨论(0)
  • 2021-02-01 03:46

    When that exception happens, why not check what was recorded in call stack panel? The call stack itself can tell a lot.

    Besides, low level debugging using SOS.dll and WinDbg can also tell you a lot.

    0 讨论(0)
  • 2021-02-01 03:51

    I know how to stop it from happening

    I just don't know why it causes it (yet). And it would appear you have indeed found a bug either in the .Net BCL or, more likely, in the JIT.

    I just commented out all the lines in the MyButton_Click_Aux method and then started bringing them back in, one by one.

    Take off the volatile from the static int and you'll no longer get a StackOverflowException.

    Now to research why... Clearly something to do with Memory Barriers is causing an issue - perhaps somehow forcing the MyButton_Click_Aux method to call itself...

    UPDATE

    Okay so other people are finding that .Net 3.5 is not an issue.

    I'm using .Nt 4 as well so these comments relate to that:

    As I said, take the volatile off and it works.

    Equally, if you put the volatile back on and remove the try/finally it also works:

    private static void MyButton_Click_Aux()
    {
      //try { /*remove because stack overflows without*/ }
      //finally
      //{
        var myLogData = new ArrayList(); 
        myLogData.Add(reportCount); 
        //myLogData.Add("method MyButtonClickAux");
        //Log(myLogData);
      //}
    }  
    

    I also wondered if it was something to do with the uninitialised reportCount when the try/finally is in. But it makes no difference if you initialise it to zero.

    I'm looking at the IL now - although it might require someone with some ASM chaps to get involved...

    Final Update As I say, this really is going to require analysis of the JIT output to really understand what's happening and whilst I find it fun to analyse assembler - I feel it's probably a job for someone in Microsoft so this bug can actually be confirmed and fixed! That said - it appears to be a pretty narrow set of circumstances.

    I've moved over to a release build to get rid of all the IL noise (nops etc) for analysis.

    This has, however, had a complicating impact on the diagnosis. I thought I had it but didn't - but now I know what it is.

    I tried this code:

    private static void MyButton_Click_Aux()
    {
      try { }
      finally
      {
        var myLogData = new ArrayList();
        Console.WriteLine(reportCount);
        //myLogData.Add("method MyButtonClickAux");
        //Log(myLogData);
      }
    }
    

    With the int as volatile. It runs without fault. Here's the IL:

    .maxstack 1
    L_0000: leave.s L_0015
    L_0002: newobj instance void [mscorlib]System.Collections.ArrayList::.ctor()
    L_0007: pop 
    L_0008: volatile. 
    L_000a: ldsfld int32 modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) WindowsFormsApplication1.Form1::reportCount
    L_000f: call void [mscorlib]System.Console::WriteLine(int32)
    L_0014: endfinally 
    L_0015: ret 
    .try L_0000 to L_0002 finally handler L_0002 to L_0015
    

    Then we look at the minimum code required to get the error again:

    private static void MyButton_Click_Aux()
    {
      try { }
      finally
      {
        var myLogData = new ArrayList();
        myLogData.Add(reportCount);
      }
    }
    

    And it's IL:

    .maxstack 2
    .locals init (
        [0] class [mscorlib]System.Collections.ArrayList myLogData)
    L_0000: leave.s L_001c
    L_0002: newobj instance void [mscorlib]System.Collections.ArrayList::.ctor()
    L_0007: stloc.0 
    L_0008: ldloc.0 
    L_0009: volatile. 
    L_000b: ldsfld int32 modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) WindowsFormsApplication1.Form1::reportCount
    L_0010: box int32
    L_0015: callvirt instance int32 [mscorlib]System.Collections.ArrayList::Add(object)
    L_001a: pop 
    L_001b: endfinally 
    L_001c: ret 
    .try L_0000 to L_0002 finally handler L_0002 to L_001c
    

    The difference? Well there's two that I spotted - boxing of the volatile int, and a virtual call. So I setup these two classes:

    public class DoesNothingBase
    {
      public void NonVirtualFooBox(object arg) { }
      public void NonVirtualFooNonBox(int arg) { }
    
      public virtual void FooBox(object arg) { }
      public virtual void FooNonBox(int arg) { }
    }
    
    public class DoesNothing : DoesNothingBase
    {
      public override void FooBox(object arg) { }
      public override void FooNonBox(int arg) { }
    }
    

    And then tried each of these four versions of the offending method:

    try { }
    finally
    {
      var doesNothing = new DoesNothing();
      doesNothing.FooNonBox(reportCount);
    }
    

    Which works.

    try { }
    finally
    {
      var doesNothing = new DoesNothing();
      doesNothing.NonVirtualFooNonBox(reportCount);
    }
    

    Which also works.

    try { }
    finally
    {
      var doesNothing = new DoesNothing();
      doesNothing.FooBox(reportCount);
    }
    

    Oops - StackOverflowException.

    And:

    try { }
    finally
    {
      var doesNothing = new DoesNothing();
      doesNothing.NonVirtualFooBox(reportCount);
    }
    

    Oops again! StackOverflowException!

    We could go further with this - but the issue is, I feel, clearly caused by the boxing of the volatile int whilst inside the finally block of a try/catch... I put the code inside the try, and no problem. I added a catch clause (and put the code in there), also no problem.

    It could also apply to the boxing of other value types I guess.

    So, to summarise - in .Net 4.0 - in both debug and release builds - the boxing of a volatile int in a finally block appears to cause the JIT to generate code that ends up filling the stack. The fact that the stack trace simply shows 'external code' also supports this proposition.

    There's even a possibility that it can't always be reproduced and might even depend on the layout and size of the code that is generated by the try/finally. It's clearly something to do with an errant jmp or something similar being generated to the wrong location which eventually repeats one or more push commands to the stack. The idea that that is being caused actually by a box operation is, frankly, fascinating!

    Final Final Update

    If you look at the MS Connect bug that @Hasty G found (answer further down) - you see there that the bug manifests in a similar fashion, but with a volatile bool in a catch statement.

    Also - MS queued a fix for this after getting it to repro - but no hotfix available yet after 7 months. I've gone on record before as being in support of MS Connect, so I'll say no more - I don't think I need to!

    Final Final Final Update (23/02/2011)

    It is fixed - but not yet released. Quote from the MS Team on the MS Connect bug:

    Yes, it's fixed. We're in the process of figuring out how best to ship a fix. It is already fixed in 4.5, but we'd really like to fix a batch of code generation bugs prior to 4.5 release.

    0 讨论(0)
提交回复
热议问题