I am very new to regex. I want to parse log files with the following regex:
(?
Let me "convert" my comment into an answer since now I see what you can do about the regex performance.
As I have mentioned above, replace all .*? with [^|]*, and also all repeating [|][|][|] with [|]{3} (or similar, depending on the number of [|]). Also, do not use nested capturing groups, that also influences performance!
var logFileFormat = @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|](?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)";
Only the last .* can remain "wildcardish" since it will grab the rest of the line.
Here is a comparison of your and my regex patterns at RegexHero.
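To make the effect of the pattern concrete, here is a minimal sketch that runs it against a single made-up line. The LogLineDemo class, the Parse helper and the sample line are assumptions for illustration only; the real log format may differ.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogLineDemo
{
    // Pattern from the answer above: every field except the last
    // is "anything but a pipe", so a field can never run past its delimiter.
    public const string LogFileFormat =
        @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|](?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)";

    // Hypothetical sample line, not taken from the asker's logs.
    public const string SampleLine =
        "2015-06-01 12:00:00|x|MyService|2|Something happened|||a||b|rest of the line";

    public static Match Parse(string line)
    {
        return Regex.Match(line, LogFileFormat);
    }

    public static void Main()
    {
        Match m = Parse(SampleLine);
        if (m.Success)
        {
            // Named groups are read back by name.
            Console.WriteLine(m.Groups["time"].Value);
            Console.WriteLine(m.Groups["source"].Value);
            Console.WriteLine(m.Groups["level"].Value);
            Console.WriteLine(m.Groups["message"].Value);
            Console.WriteLine(m.Groups["placeholder3"].Value);
        }
    }
}
```

Because [^|]* cannot cross a pipe, each group stops exactly at its own delimiter, and only the final .* consumes everything that is left.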
Then, use RegexOptions.Compiled:
Regex pattern = new Regex(LogFormat.GetLineRegex(logFileFormat), RegexOptions.Compiled);
If you are using the same regex multiple times, create it once with RegexOptions.Compiled and reuse that instance, so that you are not recreating the regex each time. This can yield a speed-up of an order of magnitude or more.
var regex = new Regex(".*", RegexOptions.Compiled);
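One common way to guarantee the regex is only built once is to keep it in a static readonly field. The sketch below assumes a simplified, hypothetical pattern that only looks at the level field; LevelCounter and CountLinesWithLevel are illustrative names, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LevelCounter
{
    // Constructed once, when the class is first used, and reused by every call.
    // The pattern is a simplified stand-in for the full log-line regex.
    private static readonly Regex LevelField =
        new Regex(@"[|](?<level>[1-3])[|]", RegexOptions.Compiled);

    public static int CountLinesWithLevel(string[] lines, string level)
    {
        int count = 0;
        foreach (string line in lines)
        {
            // No "new Regex(...)" inside the loop: only matching happens here.
            Match m = LevelField.Match(line);
            if (m.Success && m.Groups["level"].Value == level)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // Hypothetical lines for illustration.
        string[] lines = { "t|p|src|1|msg", "t|p|src|3|msg", "t|p|src|3|other" };
        Console.WriteLine(CountLinesWithLevel(lines, "3"));
    }
}
```

RegexOptions.Compiled has a higher one-time construction cost, so it tends to pay off only when the same instance is used for many matches, as in a log parser.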
The following LinqPad code shows 3 ways to use Regexes, from fastest to slowest. The regexFast method takes about 5 seconds, the regexSlow method takes 6 seconds and the regexSlowest takes about 50 seconds.
void Main()
{
    var sw = new Stopwatch();
    var regex = @"(?<first>T[he]{2})\s*\w{5}.*";

    // This is the fastest method.
    sw.Restart();
    var regexFast = new Regex(regex, RegexOptions.Compiled);
    for (int i = 0; i < 9999999; i++)
    {
        regexFast.Match("The quick brown fox");
    }
    sw.Stop();
    sw.ElapsedMilliseconds.Dump();

    // This is a little slower - we didn't compile the regex so it has
    // to do some extra work on each iteration.
    sw.Restart();
    var regexSlow = new Regex(regex);
    for (int i = 0; i < 9999999; i++)
    {
        regexSlow.Match("The quick brown fox");
    }
    sw.Stop();
    sw.ElapsedMilliseconds.Dump();

    // This method is super slow - we create a new Regex each time, so
    // we have to do *lots* of extra work.
    sw.Restart();
    for (int i = 0; i < 9999999; i++)
    {
        var regexSlowest = new Regex(regex);
        regexSlowest.Match("The quick brown fox");
    }
    sw.Stop();
    sw.ElapsedMilliseconds.Dump();
}
Your regex can be optimized to:
(?<time>([^|]*))[|](?<placeholder4>([^|]*))[|](?<source>([^|]*))[|](?<level>[1-3])[|](?<message>([^|]*))[|]{3}(?<placeholder1>([^|]*))[|][|](?<placeholder2>([^|]*))[|](?<placeholder3>([^|]*))
using negated character classes instead of lazy quantifiers. This reduces backtracking: on Regex101 the match went from 316 steps to 47 with this change. Combine it with RB.'s answer and you should be fine.
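A small sketch of why the negated class backtracks less: a lazy .*? is allowed to grow across pipes when a later part of the pattern fails, while [^|]* is not. The two shortened patterns and the sample lines below are hypothetical and are not the asker's full regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsNegated
{
    // Simplified, hypothetical patterns with one leading field and a level field.
    public static readonly Regex Lazy =
        new Regex(@"^(?<source>.*?)[|](?<level>[1-3])[|](?<rest>.*)$");
    public static readonly Regex Negated =
        new Regex(@"^(?<source>[^|]*)[|](?<level>[1-3])[|](?<rest>.*)$");

    public static void Main()
    {
        string good = "svc|2|ok";
        string malformed = "svc|oops|2|ok"; // one extra field before the level

        // On a well-formed line both patterns capture the same values.
        Console.WriteLine(Lazy.Match(good).Groups["source"].Value);
        Console.WriteLine(Negated.Match(good).Groups["source"].Value);

        // On the malformed line the lazy version keeps extending "source"
        // past a pipe until the rest of the pattern fits.
        Console.WriteLine(Lazy.Match(malformed).Groups["source"].Value);

        // The negated class cannot cross the pipe, so the match fails quickly.
        Console.WriteLine(Negated.Match(malformed).Success);
    }
}
```

The lazy pattern therefore does extra work and can silently return a field that contains a delimiter, whereas the negated-class pattern rejects the line.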