Parse code with Regex - how to capture method's body with extra {… }? [duplicate]

Submitted by 这一生的挚爱 on 2019-12-12 06:43:45

Question


Possible Duplicate:
Parse CIL code with Regex

This question follows on from "Parse CIL code with Regex". To capture each method's body, I added a capturing group (parentheses) around it, so the pattern becomes

var regex3 = @"(\.method\s[^{]+({(?!\s*}).*?}))";

and it worked fine. For example, capture.Groups[2] gives me

{
    .entrypoint
    // 
    .maxstack  8
    IL_0000:  nop
    IL_0001:  call       void TestAssemblyConsole.Test::Method1()
    IL_0006:  nop
    IL_0007:  call       int32 TestAssemblyConsole.Test::Method2()
    IL_000c:  pop
    IL_000d:  call       string [mscorlib]System.Console::ReadLine()
    IL_0012:  pop
    IL_0013:  ret
  }

and it's what I'm looking for. However if I have

.method public hidebysig static void  Method1() cil managed
  {
    // 
    .maxstack 3
    .locals init (class [mscorlib]System.Exception V_0)
    IL_0000:  nop
    .try
    {
      .try
      {
        IL_0001:  nop
        IL_0002:  ldstr      "gfhgfhgfhg"
        IL_0007:  call       void [mscorlib]System.Console::WriteLine(string)
        IL_000c:  nop
        IL_000d:  nop
        IL_000e:  leave.s    IL_0020

      }  // end .try
      catch [mscorlib]System.Exception 
      {
        IL_0010:  stloc.0
        IL_0011:  nop
        IL_0012:  ldstr      "exception"
        IL_0017:  call       void [mscorlib]System.Console::WriteLine(string)
        IL_001c:  nop
        IL_001d:  nop
        IL_001e:  leave.s    IL_0020

      }  // end handler
      IL_0020:  nop
      IL_0021:  leave.s    IL_0031

    }  // end .try
    finally
    {
      IL_0023:  nop
      IL_0024:  ldstr      "finally"
      IL_002f:  nop
      IL_0030:  endfinally
    }  // end handler
    IL_0031:  nop
    IL_0032:  ret
  } 

then it does not work well. It captures only part of the method's body, because of the nested { .. } blocks inside the method:

{
    // 
    .maxstack  1
    .locals init (class [mscorlib]System.Exception V_0)
    IL_0000:  nop
    .try
    {
      .try
      {
        IL_0001:  nop
        IL_0002:  ldstr      "gfhgfhgfhg"
        IL_0007:  call       void [mscorlib]System.Console::WriteLine(string)
        IL_000c:  nop
        IL_000d:  nop
        IL_000e:  leave.s    IL_0020

      }

How do I change the regex so that it captures the whole method body, even when the body contains many nested { .. } blocks?


Answer 1:


Regexes are fundamentally not the right tool for matching nested structures. In your case, however, you could use something like {.*} to match everything up to the last }. (Obviously, that won't work when the input contains multiple methods.)

Write a context-free grammar parser yourself, or use something like ANTLR.
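The greedy {.*} idea from this answer can be illustrated with a minimal sketch. It is written in Python rather than C# purely for brevity, and the names GREEDY_BODY and greedy_body are hypothetical. It assumes the dot is allowed to match newlines (re.DOTALL here; RegexOptions.Singleline is the .NET counterpart).

```python
import re

# DOTALL lets '.' cross line breaks, so the greedy '.*' runs to the last '}'.
GREEDY_BODY = re.compile(r"\{.*\}", re.DOTALL)

def greedy_body(il_text):
    """Return the text from the first '{' to the last '}', or None."""
    match = GREEDY_BODY.search(il_text)
    return match.group(0) if match else None
```

With a single method in the input this returns the complete body, nested blocks included. With two or more methods it returns one span running from the first method's opening brace to the last method's closing brace, which is the limitation the answer points out.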




Answer 2:


This isn't something you can accomplish with a regex. To handle nested structures like this, you need to use a context free grammar parser.

In your case, you can probably get away with a simple scanner that counts the number of times you saw a { and the number of times you saw a }, and then extracts a method body whenever those counts are equal. But if there turn out to be other delimiters you need to worry about (or you have to deal with comments), this is going to get complicated fast, and a parser generator will be what you want.
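The counting scanner described above might look roughly like the following sketch. It is written in Python for brevity, and the function name extract_method_bodies is hypothetical. It assumes that every { and } in the input is a real block delimiter, so a brace inside a string literal or a comment would throw the count off.

```python
def extract_method_bodies(il_text):
    """Return the brace-delimited body of each .method block in il_text."""
    bodies = []
    pos = 0
    while True:
        start = il_text.find(".method", pos)
        if start == -1:
            break
        open_idx = il_text.find("{", start)
        if open_idx == -1:
            break
        depth = 0  # braces opened minus braces closed so far
        for i in range(open_idx, len(il_text)):
            ch = il_text[i]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    # Counts are equal again: the method body ends here.
                    bodies.append(il_text[open_idx:i + 1])
                    pos = i + 1
                    break
        else:
            break  # ran out of input with unbalanced braces
    return bodies
```

Each returned string starts at the method's opening brace and ends at its matching closing brace, so nested .try, catch and finally blocks stay inside the body.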




Answer 3:


Using regex to parse structured code is not recommended; it is bad practice.

If your input is structured as shown in your question, try this regex pattern:

(\.method\s[^{]+?([\n\r]+\s*){(?!\s*}).*?\2})

Test it here.
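As a rough illustration of how this pattern behaves, here is a minimal Python sketch. The names METHOD_PATTERN and find_methods are hypothetical, and the braces are escaped for clarity. The sketch assumes the dot may match newlines (re.DOTALL) and that each method's closing brace sits at exactly the same indentation as its opening brace, with every nested brace indented differently. Group 2 captures the line break plus indentation in front of the opening brace, and the backreference \2 then demands that same prefix in front of the closing brace.

```python
import re

# Group 1: the whole method. Group 2: newline(s) + indentation before '{'.
METHOD_PATTERN = re.compile(
    r"(\.method\s[^{]+?([\n\r]+\s*)\{(?!\s*\}).*?\2\})",
    re.DOTALL,
)

def find_methods(il_text):
    """Return the full text of each .method block matched by the pattern."""
    return [m.group(1) for m in METHOD_PATTERN.finditer(il_text)]
```

Note that group 2 now holds the indentation rather than the body, so code that previously read Groups[2] for the body would have to change. The approach also breaks as soon as a nested block closes at the same indentation as the method itself.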



Source: https://stackoverflow.com/questions/13004518/parse-code-with-regex-how-to-capture-methods-body-with-extra
