I understand the advantage of using RegexOptions.Compiled - it improves upon the execution time of app by having the regular expression in compiled form instead of interpre
Compilation generally only improves performance if you are saving the Regex object that you create. Since you are not, in your example, saving the Regex, you should not compile it.
You might want to restructure the code this way. (Note that I rewrote the regex to what I think you want: having the start-of-line caret in a repeating group doesn't make much sense, and I assume a name prefix ends with a dash.)
private static readonly Regex CompiledRegex = new Regex("^[a-zA-Z]+-", RegexOptions.Compiled);

private static string GetNameCompiled(string objString)
{
    return CompiledRegex.Replace(objString, "");
}
I wrote some test code for this also:
public static void TestSpeed()
{
    var testData = "fooooo-bar";
    var timer = new Stopwatch();

    timer.Start();
    for (var i = 0; i < 10000; i++)
        Assert.AreEqual("bar", GetNameCompiled(testData));
    timer.Stop();
    Console.WriteLine("Compiled took " + timer.ElapsedMilliseconds + "ms");
    timer.Reset();

    timer.Start();
    for (var i = 0; i < 10000; i++)
        Assert.AreEqual("bar", GetName(testData));
    timer.Stop();
    Console.WriteLine("Uncompiled took " + timer.ElapsedMilliseconds + "ms");
    timer.Reset();
}

private static readonly Regex CompiledRegex = new Regex("^[a-zA-Z]+-", RegexOptions.Compiled);

private static string GetNameCompiled(string objString)
{
    return CompiledRegex.Replace(objString, "");
}

private static string GetName(string objString)
{
    return Regex.Replace(objString, "^[a-zA-Z]+-", "");
}
On my machine, I get:
Compiled took 21ms
Uncompiled took 37ms
For any specific performance question like this, the best way to find out which way is faster is to test both and see.
In general, compiling a regex is unlikely to have much benefit unless you're using the regex a lot, or on very large strings. (Or both.) I think it's more of an optimization to try after you've determined that you have a performance problem and you think this might help, than one to try randomly.
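To make "test both and see" a little more concrete, here is a minimal sketch that times setup and repeated use separately, since the two options trade one against the other. The class name RegexTiming and the method Measure are hypothetical names I made up for illustration; the numbers it reports depend on your machine and runtime, so treat it as a starting point rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    // Hypothetical helper: returns (setup ms, repeated-use ms) for one set of options.
    public static (double BuildMs, double RunMs) Measure(
        string pattern, RegexOptions options, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        var regex = new Regex(pattern, options);
        regex.IsMatch(input); // include the first use so one-time costs count as setup
        sw.Stop();
        double buildMs = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            regex.Replace(input, "");
        sw.Stop();

        return (buildMs, sw.Elapsed.TotalMilliseconds);
    }
}
```

Calling RegexTiming.Measure("^[a-zA-Z]+-", RegexOptions.None, "fooooo-bar", 10000) and then the same with RegexOptions.Compiled lets you compare both pairs of numbers for your own pattern and input.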
For some general discussion on the drawbacks of RegexOptions.Compiled, see this blog post by Jeff Atwood; it's very old, but from what I understand, none of the major relevant facts have changed since it was written.
Two things to think about are that RegexOptions.Compiled takes up CPU time and memory.
With that in mind, there's basically just one instance when you should not use RegexOptions.Compiled :
There are too many variables to predict and draw a line in the sand, so to speak. It'd really require testing to determine the optimal approach. Or, if you don't feel like testing, then don't use Compiled until you do.
Now, if you do choose RegexOptions.Compiled, it's important that you're not wasteful with it.
Often the best way to go about it is to define your object as a static variable that can be reused over and over. For example...
public static readonly Regex NameRegex = new Regex(@"[^a-zA-Z&-]+", RegexOptions.Compiled);
The one problem with this approach is that if you're declaring this globally, then it may be a waste if your application doesn't always use it, or doesn't use it upon startup. So a slightly different approach would be to use lazy loading as I describe in the article I wrote yesterday.
So in this case it'd be something like this...
public static readonly Lazy<Regex> NameRegex =
    new Lazy<Regex>(() => new Regex("[^a-zA-Z&-]+", RegexOptions.Compiled));
Then you simply reference NameRegex.Value whenever you want to use this regular expression, and it's only instantiated when it's first accessed.
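As a small usage sketch of the lazy approach: the class NameCleaner and its members are hypothetical names chosen for illustration, and I'm assuming the intent is to strip every character that is not a letter, an ampersand, or a dash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameCleaner
{
    // Built and compiled on the first access to .Value, not at type initialization.
    private static readonly Lazy<Regex> NameRegex =
        new Lazy<Regex>(() => new Regex("[^a-zA-Z&-]+", RegexOptions.Compiled));

    // True once the regex has actually been constructed.
    public static bool IsRegexCreated => NameRegex.IsValueCreated;

    // Removes every run of characters that is not a letter, '&' or '-'.
    public static string Clean(string input)
    {
        return NameRegex.Value.Replace(input, "");
    }
}
```

With the default Lazy<T> constructor used here, initialization is thread-safe, so concurrent first calls to Clean should still construct the regex only once.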
RegexOptions.Compiled in the Real World
On a couple of my sites, I'm using Regex routes for ASP.NET MVC. And this scenario is a perfect use for RegexOptions.Compiled. The routes are defined when the web application starts up, and are then reused for all subsequent requests as long as the application keeps running. So these regular expressions are instantiated and compiled once and reused millions of times.
From a BCL blog post, compiling increases the startup time by an order of magnitude, but decreases subsequent runtimes by about 30%. Using these numbers, compilation should be considered for a pattern that you expect to be evaluated more than about 30 times. (Of course, like any performance optimization, both alternatives should be measured for acceptability.)
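The break-even figure can be reproduced with a bit of arithmetic. This is only a sketch under an assumption of mine that the blog post does not state: that constructing an interpreted regex costs about as much as one evaluation. The class BreakEven and its parameter names are hypothetical.

```csharp
public static class BreakEven
{
    // Number of evaluations at which the extra startup cost of compiling
    // is paid back by the per-evaluation saving.
    //   interpreted total: constructCost + n * evalCost
    //   compiled total:    startupFactor * constructCost + n * (1 - runtimeSaving) * evalCost
    public static double Evaluations(
        double constructCost, double evalCost,
        double startupFactor = 10.0, double runtimeSaving = 0.30)
    {
        double extraStartup = (startupFactor - 1.0) * constructCost;
        double savedPerEval = runtimeSaving * evalCost;
        return extraStartup / savedPerEval;
    }
}
```

With equal construction and evaluation costs, BreakEven.Evaluations(1.0, 1.0) comes out at about 30, which matches the figure above. If construction is relatively cheaper or more expensive for your pattern, the break-even point moves proportionally.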
If performance is critical for a simple expression called repeatedly, you may want to avoid using regular expressions altogether. I tried running some variants about 5 million times each:
Note: edited from previous version to correct regular expression.
// Method 1: interpreted regex.
static string GetName1(string objString)
{
    return Regex.Replace(objString, "[^a-zA-Z&-]+", "");
}

// Method 2: compiled regex.
static string GetName2(string objString)
{
    return Regex.Replace(objString, "[^a-zA-Z&-]+", "", RegexOptions.Compiled);
}

// Method 3: append the allowed characters to a StringBuilder.
static string GetName3(string objString)
{
    var sb = new StringBuilder(objString.Length);
    foreach (char c in objString)
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '-' || c == '&')
            sb.Append(c);
    return sb.ToString();
}

// Method 4: compact the allowed characters in place in a char[] copy.
static string GetName4(string objString)
{
    char[] c = objString.ToCharArray();
    int pos = 0;
    int writ = 0;
    while (pos < c.Length)
    {
        char curr = c[pos];
        if ((curr >= 'A' && curr <= 'Z') || (curr >= 'a' && curr <= 'z') || curr == '-' || curr == '&')
        {
            c[writ++] = c[pos];
        }
        pos++;
    }
    return new string(c, 0, writ);
}

// Method 5: unsafe pointer walk into a stack buffer (only suitable for short strings).
unsafe static string GetName5(string objString)
{
    char* buf = stackalloc char[objString.Length];
    int writ = 0;
    fixed (char* sp = objString)
    {
        char* pos = sp;
        // Relies on the string's terminating null; an embedded '\0' would end the scan early.
        while (*pos != '\0')
        {
            char curr = *pos;
            if ((curr >= 'A' && curr <= 'Z') ||
                (curr >= 'a' && curr <= 'z') ||
                curr == '-' || curr == '&')
                buf[writ++] = curr;
            pos++;
        }
    }
    return new string(buf, 0, writ);
}
Executing independently for 5 million random ASCII strings, 30 characters each, consistently gave these numbers:
Method 1: 32.3 seconds (interpreted regex)
Method 2: 24.4 seconds (compiled regex)
Method 3: 1.82 seconds (StringBuilder concatenation)
Method 4: 1.64 seconds (char[] manipulation)
Method 5: 1.54 seconds (unsafe char* manipulation)
That is, compilation provided about a 25% performance benefit for a very large number of evaluations of this pattern, with the first execution being about 3 times slower. The methods that worked directly on the characters (methods 3 to 5) were roughly 13 to 16 times faster than the compiled regular expression.
While method 4 or method 5 may provide some performance benefit over regular expressions, the other methods may provide other benefits (maintainability, readability, flexibility, etc.). This simple test does suggest that, in this case, compiling the regex has a modest performance benefit over interpreting it for a large number of evaluations.
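If you do swap a regex for a hand-written loop, it seems worth checking that the two agree before comparing speed. Here is a minimal sketch of such a check; FilterCheck, ViaRegex, ViaLoop and FirstMismatch are hypothetical names, and the loop simply mirrors method 3 above.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class FilterCheck
{
    // Regex version: drop everything that is not a letter, '&' or '-'.
    public static string ViaRegex(string s)
    {
        return Regex.Replace(s, "[^a-zA-Z&-]+", "");
    }

    // Loop version: keep only ASCII letters, '&' and '-'.
    public static string ViaLoop(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '-' || c == '&')
                sb.Append(c);
        return sb.ToString();
    }

    // Returns the first sample on which the two versions disagree, or null if none do.
    public static string FirstMismatch(params string[] samples)
    {
        foreach (string s in samples)
            if (ViaRegex(s) != ViaLoop(s))
                return s;
        return null;
    }
}
```

Running FirstMismatch over a handful of representative inputs (including empty strings and non-ASCII text) is a cheap way to catch a loop that drifted from the pattern it was meant to replace.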