Question
This question follows on from "Parse CIL code with Regex".
To capture the method's body, I added parentheses () around the body part of the pattern, so it becomes
var regex3 = @"(\.method\s[^{]+({(?!\s*}).*?}))";
and it works fine. For example, capture.Groups[2] gives me
{
.entrypoint
//
.maxstack 8
IL_0000: nop
IL_0001: call void TestAssemblyConsole.Test::Method1()
IL_0006: nop
IL_0007: call int32 TestAssemblyConsole.Test::Method2()
IL_000c: pop
IL_000d: call string [mscorlib]System.Console::ReadLine()
IL_0012: pop
IL_0013: ret
}
and that is exactly what I'm looking for. However, if I have
.method public hidebysig static void Method1() cil managed
{
//
.maxstack 3
.locals init (class [mscorlib]System.Exception V_0)
IL_0000: nop
.try
{
.try
{
IL_0001: nop
IL_0002: ldstr "gfhgfhgfhg"
IL_0007: call void [mscorlib]System.Console::WriteLine(string)
IL_000c: nop
IL_000d: nop
IL_000e: leave.s IL_0020
} // end .try
catch [mscorlib]System.Exception
{
IL_0010: stloc.0
IL_0011: nop
IL_0012: ldstr "exception"
IL_0017: call void [mscorlib]System.Console::WriteLine(string)
IL_001c: nop
IL_001d: nop
IL_001e: leave.s IL_0020
} // end handler
IL_0020: nop
IL_0021: leave.s IL_0031
} // end .try
finally
{
IL_0023: nop
IL_0024: ldstr "finally"
IL_002f: nop
IL_0030: endfinally
} // end handler
IL_0031: nop
IL_0032: ret
}
then it doesn't work well. It captures only part of the method's body, because of the nested { .. } blocks within the method. This is what I get:
{
//
.maxstack 1
.locals init (class [mscorlib]System.Exception V_0)
IL_0000: nop
.try
{
.try
{
IL_0001: nop
IL_0002: ldstr "gfhgfhgfhg"
IL_0007: call void [mscorlib]System.Console::WriteLine(string)
IL_000c: nop
IL_000d: nop
IL_000e: leave.s IL_0020
}
How do I change the regex so that it captures the whole method body, even when it contains many nested { .. } blocks?
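To see why the capture stops early, here is a sketch that reproduces the behaviour in Python rather than C#. The regex is transcribed unchanged; re.DOTALL is assumed to play the role of RegexOptions.Singleline, which the question presumably used since the captured body spans several lines. The input is a trimmed, hypothetical version of Method1. Because .*? is lazy, the match ends at the first } after the opening brace, which is the one closing the inner .try block.

```python
import re

# regex3 from the question, transcribed to Python. re.DOTALL is an assumed
# stand-in for .NET's RegexOptions.Singleline so that . also matches newlines.
REGEX3 = re.compile(r"(\.method\s[^{]+({(?!\s*}).*?}))", re.DOTALL)

# Trimmed, hypothetical version of Method1 with one nested block.
SOURCE = """.method public hidebysig static void Method1() cil managed
{
  .try
  {
    IL_0001: nop
    IL_000e: leave.s IL_0020
  } // end .try
  IL_0031: nop
  IL_0032: ret
}"""

body = REGEX3.search(SOURCE).group(2)

# The lazy .*? stops at the FIRST }, i.e. the end of the inner .try block,
# so the final "ret" and the method's own closing brace are missing.
print(body.endswith("IL_000e: leave.s IL_0020\n  }"))  # True
print("IL_0032: ret" in body)  # False
```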
Answer 1:
Basically, regexes are not the right tool for matching nested structures. However, in your case you could use something like {.*}, whose greedy .* matches everything up to the last }. (Obviously, that won't work if the input contains multiple methods.)
Otherwise, write a context-free grammar parser yourself or use a parser generator such as ANTLR.
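A small sketch of this suggestion, again in Python with re.DOTALL standing in (by assumption) for RegexOptions.Singleline, and with hypothetical methods A and B as input. The braces are escaped in the pattern, which matches the same text as {.*}. With one method the greedy match captures the whole body, nested block included; with two methods it runs on into the second one.

```python
import re

# Answer 1's {.*} with the braces escaped: the greedy .* backtracks only as
# far as the LAST } in the input. re.DOTALL (assumed equivalent of
# RegexOptions.Singleline) lets . cross line breaks.
GREEDY = re.compile(r"\{.*\}", re.DOTALL)

ONE_METHOD = """.method static void A() cil managed
{
  .try
  {
    nop
  }
  ret
}"""

TWO_METHODS = ONE_METHOD + "\n.method static void B() cil managed\n{\n  ret\n}"

one = GREEDY.search(ONE_METHOD).group(0)
two = GREEDY.search(TWO_METHODS).group(0)

# Single method: the whole body, nested block included, is captured.
print(one.endswith("ret\n}"))  # True
# Two methods: the match swallows method B as well.
print(".method static void B()" in two)  # True
```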
Answer 2:
This isn't something you can accomplish with a regex. To handle nested structures like this, you need a context-free grammar parser.
In your case, you can probably get away with a simple scanner that counts the number of { and the number of } it has seen, and extracts a method body whenever those counts are equal. But if there are other delimiters you need to worry about (or you have to deal with comments), this will get complicated fast, and a parser generator is what you'll want.
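The counting approach can be sketched in Python as follows. extract_method_bodies and the sample input are hypothetical; the scanner keeps a single depth counter (opening braces seen minus closing braces seen), which is equivalent to comparing two separate counts. As the answer warns, it ignores comments and string literals, so a brace inside ldstr "..." or after // would throw it off.

```python
def extract_method_bodies(il: str) -> list[str]:
    """Return each balanced { ... } block that follows a .method header.

    Hypothetical helper sketching Answer 2's counting idea. It does NOT
    understand comments or string literals.
    """
    bodies = []
    pos = 0
    while True:
        start = il.find(".method", pos)
        if start == -1:
            return bodies
        open_idx = il.find("{", start)
        if open_idx == -1:
            return bodies
        depth = 0  # number of { seen minus number of } seen
        for i in range(open_idx, len(il)):
            if il[i] == "{":
                depth += 1
            elif il[i] == "}":
                depth -= 1
                if depth == 0:  # counts are equal: the body is complete
                    bodies.append(il[open_idx:i + 1])
                    pos = i + 1
                    break
        else:
            return bodies  # unbalanced input: no matching } before the end


SAMPLE = """.method public static void Method1() cil managed
{
  .try
  {
    .try
    {
      IL_0001: nop
    } // end .try
    catch [mscorlib]System.Exception
    {
      IL_0010: stloc.0
    } // end handler
  } // end .try
  finally
  {
    IL_0030: endfinally
  } // end handler
  IL_0032: ret
}
.method public static void Method2() cil managed
{
  IL_0000: ret
}"""

bodies = extract_method_bodies(SAMPLE)
for body in bodies:
    # The last line is the closing brace; the one before it is the final instruction.
    print(body.splitlines()[-2].strip())  # IL_0032: ret, then IL_0000: ret
```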
Answer 3:
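Using a regex to parse structured code is not recommended; it is bad practice.
If your input is always formatted as shown in your question, try the pattern below. Group 2 captures the line break and indentation in front of the method's opening {, and the backreference \2 then lazily matches up to the first } that has exactly that indentation: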
(\.method\s[^{]+?([\n\r]+\s*){(?!\s*}).*?\2})
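A sketch of this pattern in Python, transcribed unchanged, with re.DOTALL assumed as the equivalent of RegexOptions.Singleline. The sample is hypothetical and indented on purpose: the pattern only works when every nested } is indented deeper than the method's own closing }, so it breaks on input whose indentation has been lost or is inconsistent.

```python
import re

# Answer 3's pattern. Group 2 = line break + indentation before the method's
# opening {; \2} then requires a } preceded by exactly that same prefix.
# re.DOTALL is an assumed stand-in for RegexOptions.Singleline.
INDENT_MATCH = re.compile(r"(\.method\s[^{]+?([\n\r]+\s*){(?!\s*}).*?\2})", re.DOTALL)

# Hypothetical input: nested blocks are indented deeper than the method braces.
SOURCE = """  .method public static void Method1() cil managed
  {
    .try
    {
      IL_0001: nop
    } // end .try
    IL_0032: ret
  }
  .method public static void Method2() cil managed
  {
    IL_0000: ret
  }"""

methods = [m.group(1) for m in INDENT_MATCH.finditer(SOURCE)]
print(len(methods))  # 2
# The inner "    }" is skipped because its indentation differs from "  ".
print(methods[0].endswith("IL_0032: ret\n  }"))  # True
```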
Source: https://stackoverflow.com/questions/13004518/parse-code-with-regex-how-to-capture-methods-body-with-extra