I'm looking for a neat RegEx solution to replace
Be aware that \W leaves the underscore. A short equivalent for [^a-zA-Z0-9] would be [\W_]:
text.replace(/[\W_]+/g," ");
\W is the negation of the shorthand \w, which stands for [A-Za-z0-9_]: word characters (including the underscore).
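To make the underscore behaviour concrete, here is a minimal sketch comparing the two character classes. The sample string is hypothetical and only chosen because it contains an underscore.

```javascript
// Hypothetical sample input that contains an underscore.
const sample = "foo_bar-baz!qux";

// \W does not match "_", so the underscore survives.
const withW = sample.replace(/\W+/g, " ");

// [\W_] also matches "_", so only ASCII letters and digits remain.
const withWUnderscore = sample.replace(/[\W_]+/g, " ");

console.log(withW);           // "foo_bar baz qux"
console.log(withWUnderscore); // "foo bar baz qux"
```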
Example at regex101.com
Jonny 5 beat me to it. I was going to suggest using \W+ without the \s, as in text.replace(/\W+/g, " "). This covers white space as well.
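A quick sketch of why no separate \s is needed: spaces, tabs and newlines are all non-word characters, so \W+ swallows them together with the punctuation. The input string is made up for illustration.

```javascript
// Hypothetical input mixing punctuation, repeated spaces, a tab and a newline.
const messy = "Hello,   world!\tHow are\nyou?";

// Each run of non-word characters (whitespace included) becomes one space.
const cleaned = messy.replace(/\W+/g, " ");

console.log(JSON.stringify(cleaned));        // "Hello world How are you "
console.log(JSON.stringify(cleaned.trim())); // "Hello world How are you"
```

Note the trailing space left by the final "?"; calling trim() afterwards removes it if that matters.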
For anyone still struggling (like me...) after the above more expert replies, this works in Visual Studio 2019:
outputString = Regex.Replace(inputString, @"\W", "_");
Remember to add
using System.Text.RegularExpressions;
This is an old post of mine, and the accepted answers are good for the most part. However, I decided to benchmark each solution and another obvious one (just for fun). I wondered whether the regex patterns performed differently on different browsers with different-sized strings.
So basically I used jsPerf on
The regex patterns I tested were
/[\W_]+/g
/[^a-z0-9]+/gi
/[^a-zA-Z0-9]+/g
I loaded them up with a string length of random characters
Example JavaScript I used: var newstr = str.replace(/[\W_]+/g," ");
Each run consisted of 50 or more samples on each regex, and I ran them 5 times on each browser.
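For anyone who wants to reproduce something similar without jsPerf, the following is a rough sketch only. It is not the original benchmark: randomString and opsPerSecond are hypothetical helpers, the timing loop is far cruder than a real benchmarking harness, and the 50 ms window is an arbitrary assumption.

```javascript
// Hypothetical helper: builds a string of random printable ASCII characters.
function randomString(length) {
  let out = "";
  for (let i = 0; i < length; i++) {
    // Code points 32..126 are the printable ASCII range.
    out += String.fromCharCode(32 + Math.floor(Math.random() * 95));
  }
  return out;
}

// The three patterns from the list above.
const patterns = {
  "/[\\W_]+/g": /[\W_]+/g,
  "/[^a-z0-9]+/gi": /[^a-z0-9]+/gi,
  "/[^a-zA-Z0-9]+/g": /[^a-zA-Z0-9]+/g,
};

// Hypothetical helper: counts replace() calls completed in roughly durationMs.
function opsPerSecond(regex, str, durationMs) {
  let ops = 0;
  const start = Date.now();
  while (Date.now() - start < durationMs) {
    str.replace(regex, " ");
    ops++;
  }
  const elapsedMs = Date.now() - start;
  return ops / (elapsedMs / 1000);
}

const str = randomString(1000);
for (const [label, regex] of Object.entries(patterns)) {
  console.log(label, Math.round(opsPerSecond(regex, str, 50)), "ops/sec (rough)");
}
```

The numbers this prints depend on the machine and engine, so they should not be compared directly with the tables in this answer.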
Let's race our horses!
Results
                              Chrome                     Edge
Chars   Pattern               Ops/Sec       Deviation    Ops/Sec       Deviation
--------------------------------------------------------------------------------
5,000   /[\W_]+/g             19,977.80     1.09         10,820.40     1.32
5,000   /[^a-z0-9]+/gi        19,901.60     1.49         10,902.00     1.20
5,000   /[^a-zA-Z0-9]+/g      19,559.40     1.96         10,916.80     1.13
--------------------------------------------------------------------------------
1,000   /[\W_]+/g             96,239.00     1.65         52,358.80     1.41
1,000   /[^a-z0-9]+/gi        97,584.40     1.18         52,105.00     1.60
1,000   /[^a-zA-Z0-9]+/g      96,965.80     1.10         51,864.60     1.76
--------------------------------------------------------------------------------
200     /[\W_]+/g             480,318.60    1.70         261,030.40    1.80
200     /[^a-z0-9]+/gi        476,177.80    2.01         261,751.60    1.96
200     /[^a-zA-Z0-9]+/g      486,423.00    0.80         258,774.20    2.15
Truth be told, the regex patterns in both browsers (taking deviation into consideration) were nearly indistinguishable. However, I think if I ran this even more times, the results would become a little clearer (but not by much).
Theoretical scaling for 1 character
                              Chrome                      Edge
Chars   Pattern               Ops/Sec       Scaled        Ops/Sec       Scaled
--------------------------------------------------------------------------------
5,000   /[\W_]+/g             19,977.80     99,889,000    10,820.40     54,102,000
5,000   /[^a-z0-9]+/gi        19,901.60     99,508,000    10,902.00     54,510,000
5,000   /[^a-zA-Z0-9]+/g      19,559.40     97,797,000    10,916.80     54,584,000
--------------------------------------------------------------------------------
1,000   /[\W_]+/g             96,239.00     96,239,000    52,358.80     52,358,800
1,000   /[^a-z0-9]+/gi        97,584.40     97,584,400    52,105.00     52,105,000
1,000   /[^a-zA-Z0-9]+/g      96,965.80     96,965,800    51,864.60     51,864,600
--------------------------------------------------------------------------------
200     /[\W_]+/g             480,318.60    96,063,720    261,030.40    52,206,080
200     /[^a-z0-9]+/gi        476,177.80    95,235,560    261,751.60    52,350,320
200     /[^a-zA-Z0-9]+/g      486,423.00    97,284,600    258,774.20    51,754,840
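The Scaled figures are consistent with multiplying Ops/Sec by the string length, i.e. a rough "characters processed per second" number. A small sketch of that arithmetic, where scaleToOneChar is a hypothetical name used only here:

```javascript
// Hypothetical helper: ops/sec on an N-character string, scaled to 1 character.
// Math.round guards against floating-point noise in the product.
function scaleToOneChar(opsPerSec, chars) {
  return Math.round(opsPerSec * chars);
}

// Two Chrome rows for /[\W_]+/g taken from the table above.
console.log(scaleToOneChar(19977.80, 5000)); // 99889000
console.log(scaleToOneChar(480318.60, 200)); // 96063720
```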
I wouldn't read too much into these results, as the differences are not really significant; all we can really tell is that Edge is slower :o . Additionally, that I was super bored.
Anyway, you can run the benchmark for yourself.
Jsperf Benchmark here
To replace with dashes, do the following:
text.replace(/[\W_-]/g,'-');
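As a minimal sketch of dash replacement (the sample title is hypothetical): passing "-" as the second argument is what makes the output contain dashes. Adding the + quantifier, which is my own variation rather than part of the answer above, collapses each run into a single dash.

```javascript
// Hypothetical input string.
const title = "Hello, World! It's_2019";

// One dash per non-alphanumeric character (the "-" in the class is redundant
// with \W but harmless).
const perChar = title.replace(/[\W_-]/g, "-");

// One dash per run of non-alphanumeric characters.
const perRun = title.replace(/[\W_]+/g, "-");

console.log(perChar); // "Hello--World--It-s-2019"
console.log(perRun);  // "Hello-World-It-s-2019"
```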
Since the [^a-z0-9] character class matches everything that is not alphanumeric (with the i flag making it case-insensitive), it matches whitespace characters too!
text.replace(/[^a-z0-9]+/gi, " ");