Regex to parse functions with arbitrary depth

故里飘歌 2021-02-11 04:07

I'm parsing a simple language (Excel formulas) for the functions contained within. A function name must start with any letter, followed by any number of letters/numbers, and e

1 Answer
  • 2021-02-11 04:42

    This is well within the capabilities of .NET regexes. Here's a working demo:

    using System;
    using System.Text.RegularExpressions;
    
    namespace Test
    {
      class Test
      {
        public static void Main()
        {
          Regex r = new Regex(@"
            (?<name>[a-z][a-z0-9]*\()
              (?<body>
                (?>
                   \((?<DEPTH>)
                 |
                   \)(?<-DEPTH>)
                 |
                   [^()]+
                )*
                (?(DEPTH)(?!))
              )
            \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
    
          string formula = @"=Date(Year(A$5),Month(A$5),1)-(Weekday(Date(Year((A$5+1)),Month(A$5),1))-1)+{0;1;2;3;4;5}*7+{1,2,3,4,5,6,7}-1";
    
          foreach (Match m in r.Matches(formula))
          {
            Console.WriteLine("{0}\n", m.Value);
          }
        }
      }
    }
    

    output:

    Date(Year(A$5),Month(A$5),1)
    
    Weekday(Date(Year((A$5+1)),Month(A$5),1))
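
The demo prints only the two outermost calls, because the nested calls are consumed as part of each match's body. If the nested function names are needed as well, one option is to re-apply the same regex to the body group of every match. The following is a minimal sketch under that assumption; the class NestedFunctions and the helper Names are hypothetical names, not part of the original demo.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NestedFunctions
{
    // Same pattern as in the demo.
    static readonly Regex Call = new Regex(@"
        (?<name>[a-z][a-z0-9]*\()
          (?<body>
            (?>
               \((?<DEPTH>)
             |
               \)(?<-DEPTH>)
             |
               [^()]+
            )*
            (?(DEPTH)(?!))
          )
        \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns function names in order of appearance,
    // descending into each matched body to pick up nested calls.
    public static List<string> Names(string text)
    {
        var names = new List<string>();
        foreach (Match m in Call.Matches(text))
        {
            // The name group includes the opening paren, so trim it off.
            names.Add(m.Groups["name"].Value.TrimEnd('('));
            names.AddRange(Names(m.Groups["body"].Value));
        }
        return names;
    }

    public static void Main()
    {
        string formula = @"=Date(Year(A$5),Month(A$5),1)-(Weekday(Date(Year((A$5+1)),Month(A$5),1))-1)+{0;1;2;3;4;5}*7+{1,2,3,4,5,6,7}-1";
        // prints: Date, Year, Month, Weekday, Date, Year, Month
        Console.WriteLine(string.Join(", ", Names(formula)));
    }
}
```

The recursion stops on its own once a body contains no further calls, since Matches then returns an empty collection.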

    The main problem with your regex was that you were including the function name as part of the recursive match--for example:

    Name1(...Name2(...)...)
    

    Any open-paren that wasn't preceded by a name was not counted, because it was matched by the final alternative, |.?), and that threw off the balance with the close-parens. That also meant that you couldn't match formulas like =MyFunc((1+1)), which you mentioned in the text but didn't include in the example. (I added an extra set of parens to the demo formula to demonstrate.)
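
To make the counting argument concrete, here is a hand-written equivalent of what the DEPTH group does: every open-paren increments a counter and every close-paren decrements it, whether or not a name precedes it. This is only an illustrative sketch; ParenBalance and FindClose are hypothetical names. Unlike the regex, which consumes the function's own open-paren in the name group and starts counting at zero, this version starts counting at that paren.

```csharp
using System;

static class ParenBalance
{
    // Hypothetical helper: given the index of an opening paren, returns the
    // index of its matching closing paren, or -1 if the text is unbalanced.
    public static int FindClose(string s, int openIndex)
    {
        int depth = 0;
        for (int i = openIndex; i < s.Length; i++)
        {
            if (s[i] == '(')
            {
                depth++;                    // plays the role of \((?<DEPTH>)
            }
            else if (s[i] == ')')
            {
                depth--;                    // plays the role of \)(?<-DEPTH>)
                if (depth == 0) return i;   // balanced again: this is the match
            }
        }
        return -1;                          // ran out of input while still nested
    }

    public static void Main()
    {
        string formula = "=MyFunc((1+1))";
        int open = formula.IndexOf('(');
        int close = FindClose(formula, open);
        // prints: (1+1)
        Console.WriteLine(formula.Substring(open + 1, close - open - 1));
    }
}
```

If the inner open-paren were skipped because no name precedes it, the counter would reach zero at the first close-paren and the body would be cut short, which is the imbalance described above.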

    EDIT: Here's the version with support for non-significant, quoted parens:

      Regex r = new Regex(@"
        (?<name>[a-z][a-z0-9]*\()
          (?<body>
            (?>
               \((?<DEPTH>)
             |
               \)(?<-DEPTH>)
             |
               ""[^""]*""
             |
               [^()""]+
            )*
            (?(DEPTH)(?!))
          )
        \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
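
A quick way to check the quoted-paren handling is to run the pattern on a formula whose string literals contain parens. The sketch below assumes string literals are written with double quotes and uses ""[^""]*"" for them, so that the empty string is accepted too; QuotedParens and TopLevelCalls are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class QuotedParens
{
    static readonly Regex Call = new Regex(@"
        (?<name>[a-z][a-z0-9]*\()
          (?<body>
            (?>
               \((?<DEPTH>)
             |
               \)(?<-DEPTH>)
             |
               ""[^""]*""
             |
               [^()""]+
            )*
            (?(DEPTH)(?!))
          )
        \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns the text of each outermost function call.
    public static List<string> TopLevelCalls(string text)
    {
        var calls = new List<string>();
        foreach (Match m in Call.Matches(text))
        {
            calls.Add(m.Value);
        }
        return calls;
    }

    public static void Main()
    {
        // The formula is: =If(A1="","n/a :(",Len(B1))
        string formula = @"=If(A1="""",""n/a :("",Len(B1))";
        foreach (string call in TopLevelCalls(formula))
        {
            Console.WriteLine(call);
        }
    }
}
```

The open-paren inside "n/a :(" is consumed by the string alternative and never reaches the DEPTH counter, so the whole If(...) call is returned as a single match.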
    