Replace all non-alphanumeric characters, new lines, and multiple white space with one space

终归单人心 2021-01-29 20:24

I'm looking for a neat regex solution to replace each of the following with a single space:

  • All non-alphanumeric characters
  • All newlines
  • All multiple instances of white space
8 answers
  • 2021-01-29 20:38

    Be aware that \W does not match the underscore, so underscores are left in place. A short equivalent for [^a-zA-Z0-9] would be [\W_]

    text.replace(/[\W_]+/g," ");
    

    \W is the negation of shorthand \w for [A-Za-z0-9_] word characters (including the underscore)
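The difference between \W and [\W_] is easiest to see on a string that contains underscores. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```javascript
// Hypothetical sample input containing an underscore, punctuation,
// newlines, and a run of spaces.
const sample = "foo_bar, baz!\n\nqux   42";

// \W+ treats the underscore as a word character, so it survives.
const withUnderscore = sample.replace(/\W+/g, " ");

// [\W_]+ also matches the underscore, so only letters and digits remain.
const alnumOnly = sample.replace(/[\W_]+/g, " ");

console.log(withUnderscore); // "foo_bar baz qux 42"
console.log(alnumOnly);      // "foo bar baz qux 42"
```

Because of the + quantifier, each run of matched characters collapses into a single space, which also handles newlines and repeated whitespace.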

    Example at regex101.com

  • 2021-01-29 20:41

    Jonny 5 beat me to it. I was going to suggest using the \W+ without the \s as in text.replace(/\W+/g, " "). This covers white space as well.
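As a rough illustration of why no separate \s is needed, the sketch below runs the same \W+ replacement on input with tabs, carriage returns, and newlines. The helper name normalize and the trailing trim() are assumptions added here, not part of the answer; trim() only removes the single space left behind when the input starts or ends with non-word characters.

```javascript
// Hypothetical helper: collapse every run of non-word characters
// (punctuation, spaces, tabs, newlines) into one space.
function normalize(text) {
  return text.replace(/\W+/g, " ");
}

const messy = "  Hello,\r\n\tworld!!  ";

console.log(JSON.stringify(normalize(messy)));        // " Hello world "
console.log(JSON.stringify(normalize(messy).trim())); // "Hello world"
```

Note that this variant still keeps underscores, since \W does not match them.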

  • 2021-01-29 20:47

    For anyone still struggling (like me...) after the more expert replies above, this works in C# (Visual Studio 2019):

    outputString = Regex.Replace(inputString, @"\W", "_");
    

    Remember to add

    using System.Text.RegularExpressions;
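One thing to watch for in the pattern above: \W without a + quantifier replaces every non-word character individually, so a run of punctuation and whitespace becomes a run of underscores. The sketch below shows the difference in JavaScript (used here only to match the other examples in this thread); the input string is made up.

```javascript
// Hypothetical input with a comma, a space, and a newline between letters.
const input = "a, b\nc";

// Without +, each non-word character gets its own replacement.
const perCharacter = input.replace(/\W/g, "_");

// With +, each run of non-word characters gets a single replacement.
const perRun = input.replace(/\W+/g, "_");

console.log(perCharacter); // "a__b_c"
console.log(perRun);       // "a_b_c"
```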
    
  • 2021-01-29 20:53

    This is an old post of mine, and the accepted answers are good for the most part. However, I decided to benchmark each solution and another obvious one (just for fun). I wondered if there was a difference between the regex patterns on different browsers with different sized strings.

    So basically I used jsPerf on:

    • Testing in Chrome 65.0.3325 / Windows 10 0.0.0
    • Testing in Edge 16.16299.0 / Windows 10 0.0.0

    The regex patterns I tested were:

    • /[\W_]+/g
    • /[^a-z0-9]+/gi
    • /[^a-zA-Z0-9]+/g

    I loaded them up with strings of random characters of the following lengths:

    • length 5000
    • length 1000
    • length 200

    Example JavaScript I used: var newstr = str.replace(/[\W_]+/g," ");

    Each run consisted of 50 or more samples on each regex, and I ran them 5 times on each browser.
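For readers who cannot use jsPerf, a comparison along these lines can be approximated by hand. The sketch below is not the harness used for the numbers in this answer: the helpers randomString and opsPerSecond, the printable-ASCII character set, and the 50 ms timing window are all assumptions, and a simple loop like this is far less rigorous than a real benchmarking tool.

```javascript
// The three patterns compared in this answer.
const patterns = {
  "[\\W_]+ (g)": /[\W_]+/g,
  "[^a-z0-9]+ (gi)": /[^a-z0-9]+/gi,
  "[^a-zA-Z0-9]+ (g)": /[^a-zA-Z0-9]+/g,
};

// Hypothetical helper: build a string of random printable ASCII
// characters (code points 32 to 126).
function randomString(length) {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += String.fromCharCode(32 + Math.floor(Math.random() * 95));
  }
  return out;
}

// Hypothetical helper: call str.replace repeatedly for at least
// `durationMs` milliseconds and return operations per second.
function opsPerSecond(regex, str, durationMs = 50) {
  const start = Date.now();
  let ops = 0;
  let elapsed = 0;
  do {
    str.replace(regex, " ");
    ops++;
    elapsed = Date.now() - start;
  } while (elapsed < durationMs);
  return ops / (elapsed / 1000);
}

for (const length of [5000, 1000, 200]) {
  const str = randomString(length);
  for (const [label, regex] of Object.entries(patterns)) {
    console.log(length, label, Math.round(opsPerSecond(regex, str)));
  }
}
```

Absolute numbers from a loop like this depend heavily on the machine and JavaScript engine, so only the relative ordering within one run is meaningful.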

    Let's race our horses!

    Results

                                    Chrome                  Edge
    Chars   Pattern                 Ops/Sec     Deviation   Ops/Sec     Deviation
    ------------------------------------------------------------------------
    5,000   /[\W_]+/g                19,977.80  1.09         10,820.40  1.32
    5,000   /[^a-z0-9]+/gi           19,901.60  1.49         10,902.00  1.20
    5,000   /[^a-zA-Z0-9]+/g         19,559.40  1.96         10,916.80  1.13
    ------------------------------------------------------------------------
    1,000   /[\W_]+/g                96,239.00  1.65         52,358.80  1.41
    1,000   /[^a-z0-9]+/gi           97,584.40  1.18         52,105.00  1.60
    1,000   /[^a-zA-Z0-9]+/g         96,965.80  1.10         51,864.60  1.76
    ------------------------------------------------------------------------
      200   /[\W_]+/g               480,318.60  1.70        261,030.40  1.80
      200   /[^a-z0-9]+/gi          476,177.80  2.01        261,751.60  1.96
      200   /[^a-zA-Z0-9]+/g        486,423.00  0.80        258,774.20  2.15
    

    Truth be told, within each browser the three patterns were nearly indistinguishable once the deviation is taken into account. I think if I ran this even more times the results would become a little clearer (but not by much).

    Theoretical scaling to characters per second (Ops/Sec multiplied by string length)

                                Chrome                        Edge
    Chars   Pattern             Ops/Sec     Scaled            Ops/Sec   Scaled
    ------------------------------------------------------------------------
    5,000   /[\W_]+/g            19,977.80  99,889,000       10,820.40  54,102,000
    5,000   /[^a-z0-9]+/gi       19,901.60  99,508,000       10,902.00  54,510,000
    5,000   /[^a-zA-Z0-9]+/g     19,559.40  97,797,000       10,916.80  54,584,000
    ------------------------------------------------------------------------
    1,000   /[\W_]+/g            96,239.00  96,239,000       52,358.80  52,358,800
    1,000   /[^a-z0-9]+/gi       97,584.40  97,584,400       52,105.00  52,105,000
    1,000   /[^a-zA-Z0-9]+/g     96,965.80  96,965,800       51,864.60  51,864,600
    ------------------------------------------------------------------------
      200   /[\W_]+/g           480,318.60  96,063,720      261,030.40  52,206,080
      200   /[^a-z0-9]+/gi      476,177.80  95,235,560      261,751.60  52,350,320
      200   /[^a-zA-Z0-9]+/g    486,423.00  97,284,600      258,774.20  51,754,840
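The Scaled column appears to be the Ops/Sec figure multiplied by the string length, which gives characters processed per second. The sketch below reproduces that arithmetic for the Chrome figures of the /[\W_]+/g pattern; the numbers are copied from the table, while the names chromeResults and scaled are made up for illustration.

```javascript
// Ops/sec for /[\W_]+/g in Chrome, copied from the results table.
const chromeResults = [
  { chars: 5000, opsPerSec: 19977.8 },
  { chars: 1000, opsPerSec: 96239.0 },
  { chars: 200, opsPerSec: 480318.6 },
];

// Scaled value = ops/sec * string length, rounded to a whole number
// to absorb floating-point noise.
function scaled({ chars, opsPerSec }) {
  return Math.round(chars * opsPerSec);
}

for (const row of chromeResults) {
  console.log(row.chars, scaled(row));
}
// 5000 99889000
// 1000 96239000
// 200 96063720
```

The scaled values stay within a few percent of each other across string lengths, which suggests throughput per character is roughly constant for these patterns.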
    

    I wouldn't read too much into these results, as the differences between patterns are not really significant. All we can really tell is that Edge is slower :o . And also that I was super bored.

    Anyway, you can run the benchmark for yourself.

    Jsperf Benchmark here

  • 2021-01-29 20:55

    To replace with dashes instead of spaces, use a dash as the replacement string:

    text.replace(/[\W_]+/g,'-');
    
  • 2021-01-29 20:56

    Since the character class [^a-z0-9] (with the i flag) matches everything that is not alphanumeric, it matches whitespace characters too!

     text.replace(/[^a-z0-9]+/gi, " ");
    