Regex Pattern to Match, Excluding when… / Except between

别跟我提以往 2020-11-21 05:07

--Edit-- The current answers have some useful ideas but I want something more complete that I can 100% understand and reuse; that's why I set a bounty. Als

6 answers
  •  北荒 (thread starter)
    2020-11-21 05:19

    I'm not sure whether this will help you, but here is a solution based on the following assumptions -

    1. You need an elegant solution that checks all the conditions.
    2. Conditions can change at any time in the future.
    3. One condition should not depend on the others.

    However, I also assumed the following -

    1. The given file has minimal errors in it. If it does have errors, my code might need some modifications to cope with them.
    2. I used a Stack to keep track of if( blocks.

    Ok here is the solution -

    I used C# with MEF (the Managed Extensibility Framework) to implement configurable parsers. The idea is to use a single parser to read the file and a list of configurable validator classes, each of which validates a line and returns true or false. You can then add or remove validators at any time. So far I have implemented the S1, S2 and S3 cases you mentioned (see the classes in step 3). You would have to add classes for S4 and S5 if you need them in the future.
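
    Before the MEF version, the core idea (one filter, a list of independent validators) can be sketched without any framework. This is only an illustration: ILineValidator, ContainsDigits, NotEndingWithPeriod and LineFilter are hypothetical names that do not appear in the solution below, and the two rules are simplified stand-ins for the real checkers.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    // Hypothetical contract: one small class per condition.
    public interface ILineValidator
    {
        bool Accepts(string line);
    }

    // Accepts lines that contain at least one run of digits.
    public class ContainsDigits : ILineValidator
    {
        public bool Accepts(string line) => Regex.IsMatch(line, @"\d+");
    }

    // Accepts lines that do not end with a period (trailing whitespace ignored).
    public class NotEndingWithPeriod : ILineValidator
    {
        public bool Accepts(string line) =>
            !line.TrimEnd().EndsWith(".", StringComparison.Ordinal);
    }

    public static class LineFilter
    {
        // Keeps only the lines that every validator accepts.
        // These validators are stateless, so All() may stop at the first rejection.
        public static List<string> Filter(
            IEnumerable<string> lines,
            IReadOnlyList<ILineValidator> validators)
        {
            return lines.Where(line => validators.All(v => v.Accepts(line))).ToList();
        }
    }
    ```

    Adding or removing a condition then means adding or removing one entry in the validator list; MEF only automates the discovery of those entries.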

    1. First, create the interfaces -

      using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Text;
      using System.Threading.Tasks;
      
      namespace FileParserDemo.Contracts
      {
          public interface IParser
          {
              String[] GetMatchedLines(String filename);
          }
      
          public interface IPatternMatcher
          {
              Boolean IsMatched(String line, Stack<string> stack);
          }
      }
      
    2. Then comes the file reader and checker -

      using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Text;
      using System.Threading.Tasks;
      using FileParserDemo.Contracts;
      using System.ComponentModel.Composition.Hosting;
      using System.ComponentModel.Composition;
      using System.IO;
      using System.Collections;
      
      namespace FileParserDemo.Parsers
      {
          public class Parser : IParser
          {
              [ImportMany]
              IEnumerable<Lazy<IPatternMatcher>> parsers;
              private CompositionContainer _container;
      
              public void ComposeParts()
              {
                  var catalog = new AggregateCatalog();
                  catalog.Catalogs.Add(new AssemblyCatalog(typeof(IParser).Assembly));
                  _container = new CompositionContainer(catalog);
                  try
                  {
                      this._container.ComposeParts(this);
                  }
                  catch
                  {
                      // Composition errors are swallowed here; if composition fails,
                      // parsers is never populated.
                  }
              }
      
              public String[] GetMatchedLines(String filename)
              {
                  var matched = new List<String>();
                  var stack = new Stack<string>();
                  using (StreamReader sr = File.OpenText(filename))
                  {
                      String line = "";
                      while (!sr.EndOfStream)
                      {
                          line = sr.ReadLine();
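                          var m = true;
                          foreach(var matcher in this.parsers){
                              // Call every matcher, even after a failure, so that the stateful
                              // RemoveIfBlock always sees the line and keeps the stack up to date.
                              m = matcher.Value.IsMatched(line, stack) && m;
                          }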
                          if (m)
                          {
                              matched.Add(line);
                          }
                       }
                  }
                  return matched.ToArray();
              }
          }
      }
      
    3. Then come the individual checkers. The class names are self-explanatory, so I don't think they need further description.

      using FileParserDemo.Contracts;
      using System;
      using System.Collections.Generic;
      using System.ComponentModel.Composition;
      using System.Linq;
      using System.Text;
      using System.Text.RegularExpressions;
      using System.Threading.Tasks;
      
      namespace FileParserDemo.PatternMatchers
      {
          [Export(typeof(IPatternMatcher))]
          public class MatchAllNumbers : IPatternMatcher
          {
              public Boolean IsMatched(String line, Stack<string> stack)
              {
                  var regex = new Regex("\\d+");
                  return regex.IsMatch(line);
              }
          }
      
          [Export(typeof(IPatternMatcher))]
          public class RemoveIfBlock : IPatternMatcher
          {
              public Boolean IsMatched(String line, Stack<string> stack)
              {
                  var regex = new Regex("if\\(");
                  if (regex.IsMatch(line))
                  {
                      foreach (var m in regex.Matches(line))
                      {
                          //push the if
                          stack.Push(m.ToString());
                      }
                      //ignore current line, and will validate on next line with stack
                      return true;
                  }
                  regex = new Regex("//endif");
                  if (regex.IsMatch(line))
                  {
                      foreach (var m in regex.Matches(line))
                      {
                          stack.Pop();
                      }
                  }
                  return stack.Count == 0; //if stack has an item then ignoring this block
              }
          }
      
          [Export(typeof(IPatternMatcher))]
          public class RemoveWithEndPeriod : IPatternMatcher
          {
              public Boolean IsMatched(String line, Stack<string> stack)
              {
                  var regex = new Regex("(?m)(?!\\d+.*?\\.$)\\d+");
                  return regex.IsMatch(line);
              }
          }
      
      
          [Export(typeof(IPatternMatcher))]
          public class RemoveWithInParenthesis : IPatternMatcher
          {
              public Boolean IsMatched(String line, Stack<string> stack)
              {
                  var regex = new Regex("\\(.*\\d+.*\\)");
                  return !regex.IsMatch(line);
              }
          }
      }
      
    4. The program -

      using FileParserDemo.Contracts;
      using FileParserDemo.Parsers;
      using System;
      using System.Collections.Generic;
      using System.ComponentModel.Composition;
      using System.IO;
      using System.Linq;
      using System.Text;
      using System.Threading.Tasks;
      
      namespace FileParserDemo
      {
          class Program
          {
              static void Main(string[] args)
              {
                  var parser = new Parser();
                  parser.ComposeParts();
                  var matches = parser.GetMatchedLines(Path.GetFullPath("test.txt"));
                  foreach (var s in matches)
                  {
                      Console.WriteLine(s);
                  }
                  Console.ReadLine();
              }
          }
      }
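
    The stack handling in RemoveIfBlock is the only stateful part of the design, so it may help to look at it in isolation. The sketch below is a simplified variant, not the class from step 3: IfBlockTracker is a hypothetical name, it treats the if( line itself as being inside the block, and it tolerates a stray //endif instead of throwing. It also shows why the stateful check has to run on every line, including lines that another rule would reject anyway.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    // Hypothetical helper mirroring the stack idea of RemoveIfBlock, without MEF.
    public static class IfBlockTracker
    {
        private static readonly Regex Open = new Regex(@"if\(");
        private static readonly Regex Close = new Regex("//endif");

        // Updates the stack for this line and reports whether the line
        // lies outside every if( ... //endif block.
        public static bool IsOutside(string line, Stack<string> stack)
        {
            foreach (Match m in Open.Matches(line))
            {
                stack.Push(m.Value); // one entry per opening if(
            }
            int closes = Close.Matches(line).Count;
            while (closes-- > 0 && stack.Count > 0)
            {
                stack.Pop(); // a stray //endif on an empty stack is ignored
            }
            return stack.Count == 0;
        }

        // Returns the lines that contain digits and sit outside if( blocks.
        public static List<string> Filter(IEnumerable<string> lines)
        {
            var kept = new List<string>();
            var stack = new Stack<string>();
            foreach (var line in lines)
            {
                // Run the stateful check first and unconditionally, so the stack
                // is updated even for lines that the digit check rejects.
                bool outside = IsOutside(line, stack);
                bool hasDigits = Regex.IsMatch(line, @"\d+");
                if (outside && hasDigits) kept.Add(line);
            }
            return kept;
        }
    }
    ```

    If the digit check ran first and short-circuited, the "if(" line (which has no digits) would never reach the stack logic, and the lines inside the block would be kept by mistake.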
      

    For testing, I used @Tiago's sample file as test.txt, which had the following lines -

    this is a text
    it should match 12345
    if(
    it should not match 12345
    //endif 
    it should match 12345
    it should not match 12345.
    it should not match ( blabla 12345  blablabla )
    it should not match ( 12345 )
    it should match 12345
    

    Gives the output -

    it should match 12345
    it should match 12345
    it should match 12345
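
    To see which rule rejects which sample line, the three stateless patterns from step 3 can be probed on their own. This is just a debugging sketch: RuleProbe and Verdict are hypothetical names, and the if( block rule is left out because it needs the stack.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class RuleProbe
    {
        // The three stateless patterns from step 3, copied verbatim.
        static readonly Regex HasNumber = new Regex("\\d+");
        static readonly Regex NotBeforeEndPeriod = new Regex("(?m)(?!\\d+.*?\\.$)\\d+");
        static readonly Regex InParenthesis = new Regex("\\(.*\\d+.*\\)");

        // Returns the name of the first rule that rejects the line, or "kept".
        public static string Verdict(string line)
        {
            if (!HasNumber.IsMatch(line)) return "no number";
            if (!NotBeforeEndPeriod.IsMatch(line)) return "ends with period";
            if (InParenthesis.IsMatch(line)) return "inside parentheses";
            return "kept";
        }
    }
    ```

    For example, Verdict("it should not match 12345.") returns "ends with period": the negative lookahead fails at every digit, because each digit is followed by the rest of the number and then the final period.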
    

    I don't know whether this will help you, but I did have fun playing with it... :)

    The best part is that, to add a new condition, all you have to do is provide an implementation of IPatternMatcher marked with [Export(typeof(IPatternMatcher))]; it will then be called automatically for every line.
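
    As an illustration of adding a condition, here is what one more matcher might look like. The rule itself (rejecting numbers inside square brackets) and the class name RemoveWithInBrackets are invented for this example and are not among the conditions in the question. The interface is repeated only so that the sketch compiles on its own, assuming the stack is a Stack<string>.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    // Same shape as the interface in step 1, repeated so the sketch stands alone.
    public interface IPatternMatcher
    {
        Boolean IsMatched(String line, Stack<string> stack);
    }

    // Hypothetical extra rule: reject lines whose number sits inside [ ... ].
    // In the MEF project this class would also carry
    // [Export(typeof(IPatternMatcher))] so that [ImportMany] picks it up.
    public class RemoveWithInBrackets : IPatternMatcher
    {
        private static readonly Regex Bracketed = new Regex(@"\[.*\d+.*\]");

        public Boolean IsMatched(String line, Stack<string> stack)
        {
            // The stack is unused: this rule does not depend on if( blocks.
            return !Bracketed.IsMatch(line);
        }
    }
    ```

    Nothing in Parser has to change for such a class; it only needs to live in an assembly that the catalog in ComposeParts scans.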
