How to use inline modifiers in C# regex?

心在旅途 2020-12-01 18:19

How do I use the inline modifiers instead of RegexOptions.Option?

For example:

Regex MyRegex = new Regex(@"[a-z]+", RegexOptions.Igno


        
2 Answers
  • 2020-12-01 18:48

    You can use inline modifiers as follows:

    // case-insensitive match
    Regex MyRegex = new Regex(@"(?i)[a-z]+");
    

    or invert the meaning of the modifier by adding a minus sign:

    // case-sensitive match
    Regex MyRegex = new Regex(@"(?-i)[a-z]+");
    

    or, switch them on and off:

    // case sensitive, then case-insensitive match
    Regex MyRegex = new Regex(@"(?-i)[a-z]+(?i)[k-n]+");
    

    Alternatively, you can use the mode-modifier span syntax, which puts the modifier and a colon : at the start of a group and scopes the modifier to that group only:

    // case sensitive, then case-insensitive match
    Regex MyRegex = new Regex(@"(?-i:[a-z]+)(?i:[k-n]+)");
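
    To check that the switch syntax and the span syntax behave the same, here is a minimal sketch. The class name InlineScopeDemo, the Matches helper, and the ^...$ anchors are my own additions for illustration; the anchors only force the whole input to be tested.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class InlineScopeDemo
    {
        // Switch form: (?-i) and (?i) change the option for what follows.
        public const string Toggled = @"^(?-i)[a-z]+(?i)[k-n]+$";

        // Span form: each option applies only inside its own group.
        public const string Spanned = @"^(?-i:[a-z]+)(?i:[k-n]+)$";

        public static bool Matches(string pattern, string input)
        {
            return Regex.IsMatch(input, pattern);
        }

        public static void Main()
        {
            // "ABCklm" fails for both forms: the first part is case sensitive.
            foreach (string input in new[] { "abcKLM", "abcklm", "ABCklm" })
            {
                Console.WriteLine("{0}: toggled={1}, spanned={2}",
                    input, Matches(Toggled, input), Matches(Spanned, input));
            }
        }
    }
    ```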
    

    You can use multiple modifiers in one go, like (?is-m:text), or one after another, (?i)(?s)(?-m)text, if you find that clearer (I don't). When you use the on/off switching syntax, be aware that the modifier stays in effect until the next switch, the end of the enclosing group, or the end of the regex. With mode-modified spans, by contrast, the previous behavior applies again after the span.
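
    How long a modifier stays in effect can be checked with a small sketch like the one below. The class name OptionLifetimeDemo, the IsMatch helper, and the test strings are made up for illustration. It also shows that in .NET a modifier switched on inside a group ends when that group closes.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class OptionLifetimeDemo
    {
        public static bool IsMatch(string pattern, string input)
        {
            return Regex.IsMatch(input, pattern);
        }

        public static void Main()
        {
            // Several modifiers in one span: i and s on, m off, inside the group only.
            Console.WriteLine(IsMatch(@"^(?is-m:a.b)$", "A\nB"));     // True

            // A top-level (?i) stays on until the end of the pattern.
            Console.WriteLine(IsMatch(@"^(?i)ab[k-n]+$", "ABKLM"));   // True

            // After a span, matching is case sensitive again.
            Console.WriteLine(IsMatch(@"^(?i:ab)[k-n]+$", "ABKLM"));  // False
            Console.WriteLine(IsMatch(@"^(?i:ab)[k-n]+$", "ABklm"));  // True

            // A (?i) switched on inside a group ends with that group.
            Console.WriteLine(IsMatch(@"^(a(?i)b)c$", "aBc"));        // True
            Console.WriteLine(IsMatch(@"^(a(?i)b)c$", "aBC"));        // False
        }
    }
    ```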

    Finally: the allowed modifiers in .NET are (use a minus to invert the mode):

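    x allow whitespace and comments in the pattern (RegexOptions.IgnorePatternWhitespace)
    s single-line mode: . also matches \n (RegexOptions.Singleline)
    m multi-line mode: ^ and $ match at the start and end of each line (RegexOptions.Multiline)
    i case insensitivity (RegexOptions.IgnoreCase)
    n only allow explicit capture: only named groups capture (RegexOptions.ExplicitCapture)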
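
    As a rough illustration of what each modifier does, here is a small sketch with one check per modifier. The class name ModifierDemo, the method names, and the sample inputs are invented for this example.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class ModifierDemo
    {
        // i: letters match regardless of case.
        public static bool IgnoreCase() => Regex.IsMatch("ABC", @"(?i)abc");

        // s: the dot also matches a newline.
        public static bool SingleLine() => Regex.IsMatch("a\nb", @"(?s)a.b");

        // m: ^ and $ match at line boundaries, not only at the ends of the input.
        public static bool MultiLine() => Regex.IsMatch("a\nb\nc", @"(?m)^b$");

        // x: unescaped whitespace is ignored and # starts a comment.
        public static bool IgnoreWhitespace() =>
            Regex.IsMatch("42 apples", @"(?x) ^ \d+ \s [a-z]+ $ # number, space, word");

        // n: (\d+) no longer captures, so only group 0 and "word" remain.
        public static int ExplicitCaptureGroupCount() =>
            Regex.Match("42-apples", @"(?n)(\d+)-(?<word>[a-z]+)").Groups.Count;

        public static void Main()
        {
            Console.WriteLine(IgnoreCase());                 // True
            Console.WriteLine(SingleLine());                 // True
            Console.WriteLine(MultiLine());                  // True
            Console.WriteLine(IgnoreWhitespace());           // True
            Console.WriteLine(ExplicitCaptureGroupCount());  // 2 (3 without the n modifier)
        }
    }
    ```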

  • 2020-12-01 18:48

    Use it in this manner:

    Regex MyRegex = new Regex(@"(?i:[a-z]+)");
    

    Wrap your pattern in a group of the form (?option:pattern), where option and pattern are placeholders. In this case the option is "i" for IgnoreCase.

    With the colon, the option applies only to the pattern inside that group. To apply the option to the entire pattern, put it at the beginning on its own:

    @"(?i)[a-z]+"
    

    It is also possible to use multiple options and turn them on and off:

    // On: IgnoreCase, ExplicitCapture. Off: IgnorePatternWhitespace
    @"(?in-x)[a-z]+"
    

    This lets you enable and disable options at different points within a regex, which isn't possible with RegexOptions, since those always apply to the entire pattern.
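
    A small sketch of that difference, assuming a made-up identifier format where the "ID-" prefix must be uppercase but the hex digits may be in either case (the format, the class name MixedCaseDemo, and the field names are hypothetical):

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class MixedCaseDemo
    {
        // Inline span: only the hex digits are matched case-insensitively.
        public static readonly Regex Inline = new Regex(@"^ID-(?i:[0-9a-f]+)$");

        // RegexOptions.IgnoreCase covers the whole pattern, including the prefix.
        public static readonly Regex Global =
            new Regex(@"^ID-[0-9a-f]+$", RegexOptions.IgnoreCase);

        public static void Main()
        {
            Console.WriteLine(Inline.IsMatch("ID-DEADbeef"));  // True
            Console.WriteLine(Inline.IsMatch("id-deadbeef"));  // False
            Console.WriteLine(Global.IsMatch("id-deadbeef"));  // True
        }
    }
    ```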

    Here is a slightly more in-depth example. I encourage you to play with it to see where each option takes effect.

    string input = "H2O (water) is named Dihydrogen Monoxide or Hydrogen Hydroxide. The H represents a hydrogen atom, and O is an Oxide atom.";
    
    // n = explicit captures
    // x = ignore pattern whitespace
    // -i = remove ignorecase option
    string pattern = @"di?(?nx-i) ( hydrogen ) | oxide";
    var matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
    Console.WriteLine("Total Matches: " + matches.Count);
    foreach (Match match in matches)
    {
        Console.WriteLine("Match: {0} - Groups: {1}", match.Value, match.Groups[1].Captures.Count);
    }
    
    Console.WriteLine();
    
    // n = explicit captures
    // x = ignore pattern whitespace
    // -i = remove ignorecase option
    // -x = remove ignore pattern whitespace
    pattern = @"di?(?nx-i) (?<H> hydrogen ) (?-x)|oxide";
    matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
    Console.WriteLine("Total Matches: " + matches.Count);
    foreach (Match match in matches)
    {
        Console.WriteLine("Match: {0} - Groups: {1}", match.Value, match.Groups["H"].Captures.Count);
    }
    

    The output for the above is:

    Total Matches: 3
    Match: Dihydrogen - Groups: 0
    Match: oxide - Groups: 0
    Match: oxide - Groups: 0
    
    Total Matches: 3
    Match: Dihydrogen - Groups: 1
    Match: oxide - Groups: 0
    Match: oxide - Groups: 0
    

    Both patterns are run with RegexOptions.IgnoreCase, which makes "di" case insensitive so that it matches the capital D in "Dihydrogen". Because (?-i) then switches case insensitivity off for the rest of the pattern, the capitalized "Oxide" near the end of the input is not matched. Since explicit capturing is on, ( hydrogen ) in the first pattern does not capture anything: it is not a named group, and explicit capturing only captures named groups. The second pattern does have 1 capture since it uses (?<H> hydrogen ).

    Next, notice that the second pattern ends with (?-x)|oxide. Since IgnorePatternWhitespace is disabled after the hydrogen capture, the remainder of the pattern must not contain any extra whitespace (compare with the first pattern), unless (?x) is turned on again later in the pattern. This serves no real purpose; it just demonstrates when inline options actually kick in.
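
    To see the effect of switching x off partway through in isolation, here is a stripped-down sketch. The pattern, the class name WhitespaceToggleDemo, and the inputs are made up for illustration and are not taken from the example above.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class WhitespaceToggleDemo
    {
        // Spaces before (?-x) are ignored; the space after it is a literal space.
        public const string Pattern = @"^(?x) a b (?-x) c$";

        public static bool Matches(string input)
        {
            return Regex.IsMatch(input, Pattern);
        }

        public static void Main()
        {
            Console.WriteLine(Matches("ab c"));   // True
            Console.WriteLine(Matches("abc"));    // False: the space before c is required
            Console.WriteLine(Matches("a b c"));  // False: a and b must be adjacent
        }
    }
    ```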
