Parse code with Regex - how to capture method's body with extra {… }? [duplicate]

This question follows on from "Parse CIL code with Regex". To capture the method body, I added a capturing group with parentheses (), so the pattern becomes

var regex3 = @"(\.method\s[^{]+({(?!\s*}).*?}))";

and it worked fine. For example, capture.Groups[2] gives me

    .maxstack  8
    IL_0000:  nop
    IL_0001:  call       void TestAssemblyConsole.Test::Method1()
    IL_0006:  nop
    IL_0007:  call       int32 TestAssemblyConsole.Test::Method2()
    IL_000c:  pop
    IL_000d:  call       string [mscorlib]System.Console::ReadLine()
    IL_0012:  pop
    IL_0013:  ret

and that's what I'm looking for. However, if I have

.method public hidebysig static void  Method1() cil managed
{
    .maxstack 3
    .locals init (class [mscorlib]System.Exception V_0)
    IL_0000:  nop
    .try
    {
      .try
      {
        IL_0001:  nop
        IL_0002:  ldstr      "gfhgfhgfhg"
        IL_0007:  call       void [mscorlib]System.Console::WriteLine(string)
        IL_000c:  nop
        IL_000d:  nop
        IL_000e:  leave.s    IL_0020

      }  // end .try
      catch [mscorlib]System.Exception
      {
        IL_0010:  stloc.0
        IL_0011:  nop
        IL_0012:  ldstr      "exception"
        IL_0017:  call       void [mscorlib]System.Console::WriteLine(string)
        IL_001c:  nop
        IL_001d:  nop
        IL_001e:  leave.s    IL_0020

      }  // end handler
      IL_0020:  nop
      IL_0021:  leave.s    IL_0031

    }  // end .try
    finally
    {
      IL_0023:  nop
      IL_0024:  ldstr      "finally"
      IL_002f:  nop
      IL_0030:  endfinally
    }  // end handler
    IL_0031:  nop
    IL_0032:  ret
}

then it does not work well. It captures only part of the method's body, because of the nested { ... } blocks within the method:

    .maxstack  1
    .locals init (class [mscorlib]System.Exception V_0)
    IL_0000:  nop
        IL_0001:  nop
        IL_0002:  ldstr      "gfhgfhgfhg"
        IL_0007:  call       void [mscorlib]System.Console::WriteLine(string)
        IL_000c:  nop
        IL_000d:  nop
        IL_000e:  leave.s    IL_0020


How do I change the regex so that it captures the whole method body, even when the body contains several nested { ... } blocks?


Basically, regexes are not the right tool for matching nested structures. However, in your case you could use something like {.*} to match everything up to the last }. (Obviously, that won't work with multiple methods.)
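
A minimal sketch of that greedy match in C#, assuming the input holds exactly one method. The sample CIL string is a made-up, abbreviated stand-in for real ildasm output. Note that in .NET the `.` does not match newlines unless `RegexOptions.Singleline` is set, so the option is needed for a multi-line body.

```csharp
using System;
using System.Text.RegularExpressions;

class GreedyBraceDemo
{
    static void Main()
    {
        // Hypothetical, abbreviated CIL for a single method with one nested block.
        string il =
            ".method public hidebysig static void Method1() cil managed\n" +
            "{\n" +
            "  .try\n" +
            "  {\n" +
            "    IL_0001:  nop\n" +
            "  }  // end .try\n" +
            "  IL_0032:  ret\n" +
            "}\n";

        // Singleline lets '.' match newlines. The greedy '.*' runs to the end of
        // the input and backtracks, so the match ends at the LAST '}' in the text.
        Match m = Regex.Match(il, @"\{.*\}", RegexOptions.Singleline);

        Console.WriteLine(m.Value);
    }
}
```

With two or more methods in the input, the same pattern would span from the first `{` of the first method to the last `}` of the last one, which is the limitation mentioned above.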

Write a context-free grammar parser yourself or use something like ANTLR.


This isn't something you can accomplish with a regex. To handle nested structures like this, you need to use a context-free grammar parser.

In your case, you can probably get away with a simple scanner that counts the number of times you saw a { and the number of times you saw a }, and then extracts a method body whenever those counts are equal. But if you're going to find other delimiters you need to worry about (or you're going to have to deal with comments), then this is going to get complicated fast, and a parser generator will be what you want.
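
The counting scanner described above could look roughly like this. It is a sketch, not a full CIL parser: `MethodBodyScanner` and `ExtractBodies` are hypothetical names, the sample input is made up, and the code assumes that braces never appear inside string literals or comments.

```csharp
using System;
using System.Collections.Generic;

static class MethodBodyScanner
{
    // Returns the text between the outermost '{' and its matching '}' for each
    // ".method" declaration found in the input.
    // Assumption: no '{' or '}' occurs inside string literals or comments.
    public static List<string> ExtractBodies(string il)
    {
        var bodies = new List<string>();
        int pos = 0;
        while (true)
        {
            int decl = il.IndexOf(".method", pos, StringComparison.Ordinal);
            if (decl < 0) break;
            int open = il.IndexOf('{', decl);
            if (open < 0) break;

            // Count braces from the opening one until the depth returns to zero.
            int depth = 0;
            int i = open;
            for (; i < il.Length; i++)
            {
                if (il[i] == '{') depth++;
                else if (il[i] == '}' && --depth == 0) break;
            }
            if (i == il.Length) break; // unbalanced input: stop scanning

            bodies.Add(il.Substring(open + 1, i - open - 1));
            pos = i + 1;
        }
        return bodies;
    }

    static void Main()
    {
        // Hypothetical, abbreviated CIL with two methods, one with a nested block.
        string il =
            ".method public static void A() cil managed\n{\n  .try\n  {\n    nop\n  }\n  ret\n}\n" +
            ".method public static void B() cil managed\n{\n  ret\n}\n";

        foreach (string body in ExtractBodies(il))
        {
            Console.WriteLine("--- body ---");
            Console.WriteLine(body);
        }
    }
}
```

Because the scanner resumes after each closing brace, it handles several methods in one input, which a single greedy `{.*}` cannot.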


Using regex to parse structured code is not recommended and is bad practice.

If your input is structured as shown in your question, try to use regex pattern
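
One regex-only option in .NET, offered as a sketch rather than as the pattern this answer refers to, is a balancing group. The group name `depth`, the group name `body`, and the sample input are all illustrative assumptions, and the pattern still assumes that braces never appear inside string literals or comments.

```csharp
using System;
using System.Text.RegularExpressions;

class BalancedBodyDemo
{
    static void Main()
    {
        // Hypothetical, abbreviated CIL with two methods, one with a nested block.
        string il =
            ".method public static void A() cil managed\n{\n  .try\n  {\n    nop\n  }\n  ret\n}\n" +
            ".method public static void B() cil managed\n{\n  ret\n}\n";

        // After the method header and its opening '{':
        //   [^{}]+          consumes text without braces
        //   \{(?<depth>)    pushes onto the 'depth' stack for each nested '{'
        //   \}(?<-depth>)   pops for each nested '}' (fails if the stack is empty)
        //   (?(depth)(?!))  rejects the match if any '{' is still unclosed
        var regex = new Regex(
            @"\.method\s[^{]+\{(?<body>(?>[^{}]+|\{(?<depth>)|\}(?<-depth>))*(?(depth)(?!)))\}");

        foreach (Match m in regex.Matches(il))
        {
            Console.WriteLine("--- body ---");
            Console.WriteLine(m.Groups["body"].Value);
        }
    }
}
```

Balancing groups are specific to the .NET regex engine, so this approach would not carry over to most other regex flavors.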


