C# Regex Performance very slow

醉梦人生 2020-12-29 08:53

I am very new to regular expressions. I want to parse log files with the following regex:

(?
3 Answers
  • 2020-12-29 09:30

    Let me "convert" my comment into an answer since now I see what you can do about the regex performance.

    As I mentioned above, replace every .*? with [^|]*, and every run of repeated [|][|][|] with [|]{3} (or similar, depending on the number of [|]). A negated character class cannot run past a delimiter, so the engine has far less backtracking to do. Also, avoid nested capturing groups, since they hurt performance as well.

    var logFileFormat = @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|](?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)";
    

    Only the last .* can remain "wildcardish" since it will grab the rest of the line.

    Here is a comparison of your and my regex patterns at RegexHero.

    Then, use RegexOptions.Compiled:

    Regex pattern = new Regex(LogFormat.GetLineRegex(logFileFormat), RegexOptions.Compiled);
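
As a minimal sketch of how the pattern above might be applied to a single line: the class name `LogLineDemo`, the `Parse` helper, and the sample log line are all hypothetical, since the question does not show the real log format or the `LogFormat.GetLineRegex` helper.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogLineDemo
{
    // Pattern from the answer above, constructed once and reused for every line.
    private static readonly Regex LineRegex = new Regex(
        @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|](?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)",
        RegexOptions.Compiled);

    // Returns a few of the named fields, or null when the line does not match.
    public static (string Time, string Source, string Level, string Message)? Parse(string line)
    {
        Match m = LineRegex.Match(line);
        if (!m.Success)
        {
            return null;
        }
        return (m.Groups["time"].Value,
                m.Groups["source"].Value,
                m.Groups["level"].Value,
                m.Groups["message"].Value);
    }

    public static void Main()
    {
        // Hypothetical log line (assumption): the real format is not shown in the question.
        var parsed = Parse("2020-12-29 08:53:00|x|MyService|2|Disk almost full|||a||b|rest of line");
        Console.WriteLine(parsed?.Message);
    }
}
```

Keeping the Regex in a static readonly field means it is built only once, no matter how many lines are parsed.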
    
  • 2020-12-29 09:41

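    If you are using the same regex multiple times, create the Regex object once and reuse it instead of recreating it on each call. Adding RegexOptions.Compiled also compiles the pattern, which speeds up matching at the cost of a slower construction. In the benchmark below, reusing one instance is roughly ten times faster than constructing a new Regex on every iteration.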

    var regex = new Regex(".*", RegexOptions.Compiled);
    

    The following LINQPad code shows three ways to use a Regex, from fastest to slowest.

    The regexFast method takes about 5 seconds, the regexSlow method takes 6 seconds and the regexSlowest takes about 50 seconds.

    void Main()
    {
        var sw = new Stopwatch();
    
        var regex = @"(?<first>T[he]{2})\s*\w{5}.*";
    
        // This is the fastest method.
        sw.Restart();
        var regexFast = new Regex(regex, RegexOptions.Compiled);
        for (int i = 0; i < 9999999; i++)
        {
            regexFast.Match("The quick brown fox");
        }
        sw.Stop();
        sw.ElapsedMilliseconds.Dump();
    
        // This is a little slower - the regex is interpreted rather than
        // compiled, so each match does some extra work.
        sw.Restart();
        var regexSlow = new Regex(regex);
        for (int i = 0; i < 9999999; i++)
        {
            regexSlow.Match("The quick brown fox");
        }
        sw.Stop();
        sw.ElapsedMilliseconds.Dump();
    
        // This method is super slow - we create a new Regex each time, so 
        // we have to do *lots* of extra work.
        sw.Restart();
        for (int i = 0; i < 9999999; i++)
        {
            var regexSlowest = new Regex(regex);
            regexSlowest.Match("The quick brown fox");
        }
        sw.Stop();
        sw.ElapsedMilliseconds.Dump();
    }
    
  • 2020-12-29 09:43

    Your regex can be optimized to:

    (?<time>([^|]*))[|](?<placeholder4>([^|]*))[|](?<source>([^|]*))[|](?<level>[1-3])[|](?<message>([^|]*))[|]{3}(?<placeholder1>([^|]*))[|][|](?<placeholder2>([^|]*))[|](?<placeholder3>([^|]*))
    

    This uses negated character classes instead of lazy quantifiers, which reduces backtracking. On Regex101, the match went from 316 steps to 47 with this change. Combine it with RB.'s answer and you should be fine.
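
A rough way to check the lazy-versus-negated difference on your own machine is sketched below. The two three-field patterns, the sample line, and the names `LazyVsNegatedDemo` and `Time` are simplified, hypothetical stand-ins for the real log pattern; timings will vary, so no numbers are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class LazyVsNegatedDemo
{
    // The same three pipe-delimited fields, written with lazy quantifiers
    // and with negated character classes (simplified, hypothetical patterns).
    public static readonly Regex Lazy = new Regex(@"^(?<a>.*?)[|](?<b>.*?)[|](?<c>.*)$");
    public static readonly Regex Negated = new Regex(@"^(?<a>[^|]*)[|](?<b>[^|]*)[|](?<c>.*)$");

    // Matches the same line repeatedly and returns the elapsed milliseconds.
    public static long Time(Regex regex, string line, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.Match(line);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Hypothetical sample line (assumption).
        const string line = "2020-12-29 08:53:00|MyService|Disk almost full";

        // Both patterns should capture the same fields on a well-formed line.
        Console.WriteLine(Lazy.Match(line).Groups["b"].Value == Negated.Match(line).Groups["b"].Value);

        Console.WriteLine($"lazy:    {Time(Lazy, line, 1000000)} ms");
        Console.WriteLine($"negated: {Time(Negated, line, 1000000)} ms");
    }
}
```

Any gap between the two should grow with the number of fields and the length of each field, because each lazy quantifier tests the following delimiter after every character it consumes.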
