Dynamic vs Inline RegExp performance in JavaScript

鱼传尺愫 2020-12-25 11:03

I stumbled upon this performance test, which says that RegExps in JavaScript are not necessarily slow: http://jsperf.com/regexp-indexof-perf

There's one thing I didn't

3 Answers
  • 2020-12-25 11:15

    In the second case, the regular expression object is created while the source code is being parsed; in the first case, the RegExp constructor has to parse an arbitrary string at run time.

  • 2020-12-25 11:20

    The difference in performance is partly related to the syntax that is used: with /pattern/ and RegExp(/pattern/) (you did not test the latter) the regular expression is compiled only once, but with RegExp('pattern') the expression is compiled on each use. See Alexander's answer, which should be the accepted answer today.

    Apart from the above, in your tests for inlineRegExp and storedRegExp you're looking at code that is initialized once when the source code text is parsed, while for dynamicRegExp the regular expression is created for each invocation of the method. Note that the actual tests run things like r = dynamicRegExp(element) many times, while the preparation code is only run once.

    The following gives you about the same results, according to another jsPerf:

    var reContains = /(?:^| )foo(?: |$)/;
    

    ...and

    var reContains = RegExp('(?:^| )foo(?: |$)'); 
    

    ...when both are used with

    function storedRegExp(node) {
      return reContains.test(node.className);
    }
    

    Sure, the source code of RegExp('(?:^| )foo(?: |$)') might first be parsed into a String, and then into a RegExp, but I doubt that this by itself makes it twice as slow. However, the following will create a new RegExp(..) again and again for each method call:

    function dynamicRegExp(node) {
      return RegExp('(?:^| )foo(?: |$)').test(node.className);
    }
    

    If in the original test you'd only call each method once, then the inline version would not be a whopping 2 times faster.
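
    The effect of repeated invocation can be checked with a rough timing loop. This is only a sketch: plain objects with a className property stand in for DOM nodes so that it also runs outside a browser, and timeIt is a hypothetical helper, not part of the original jsPerf test. Absolute numbers will vary by engine and machine.

    ```javascript
    // Created once, then reused on every call.
    const reContains = /(?:^| )foo(?: |$)/;

    function storedRegExp(node) {
      return reContains.test(node.className);
    }

    function dynamicRegExp(node) {
      // Builds a new RegExp from a string on every call.
      return RegExp('(?:^| )foo(?: |$)').test(node.className);
    }

    // Hypothetical helper: calls fn(node) `iterations` times and reports
    // the elapsed milliseconds plus the number of successful matches.
    function timeIt(fn, node, iterations) {
      const start = Date.now();
      let hits = 0;
      for (let i = 0; i < iterations; i++) {
        if (fn(node)) hits++;
      }
      return { ms: Date.now() - start, hits: hits };
    }

    // Stand-in for a DOM node (assumption for illustration).
    const node = { className: 'bar foo baz' };
    const stored = timeIt(storedRegExp, node, 100000);
    const dynamic = timeIt(dynamicRegExp, node, 100000);

    console.log('stored :', stored.ms, 'ms');
    console.log('dynamic:', dynamic.ms, 'ms');
    ```

    With a single iteration the two timings are usually indistinguishable; any gap only becomes visible as the iteration count grows.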

    (I am more surprised that inlineRegExp and storedRegExp have different results. This is explained in Alexander's answer too.)

  • 2020-12-25 11:26

    The other answers given here are no longer entirely complete or correct.

    Starting from ES5, the literal syntax behaves the same as the RegExp() syntax with regard to object creation: both create a new RegExp object every time the code path hits an expression in which they take part.

    Therefore, the only difference between them now is how often that regexp is compiled:

    • With literal syntax - once, during initial code parsing and compiling
    • With RegExp() syntax - every time a new object is created

    See, for instance, Stoyan Stefanov's JavaScript Patterns book:

    Another distinction between the regular expression literal and the constructor is that the literal creates an object only once during parse time. If you create the same regular expression in a loop, the previously created object will be returned with all its properties (such as lastIndex) already set from the first time. Consider the following example as an illustration of how the same object is returned twice.

    function getRE() {
        var re = /[a-z]/;
        re.foo = "bar";
        return re;
    }
    
    var reg = getRE(),
        re2 = getRE();
    
    console.log(reg === re2); // true
    reg.foo = "baz";
    console.log(re2.foo); // "baz"
    

    This behavior has changed in ES5 and the literal also creates new objects. The behavior has also been corrected in many browser environments, so it’s not to be relied on.

    If you run this sample in any modern browser or in Node.js, you get the following instead:

    false
    bar
    

    This means that every time you call the getRE() function, a new RegExp object is created, even with the literal syntax.

    The above not only explains why you shouldn't use RegExp() for immutable regexps (a well-known performance issue today), but also explains this remark from the other answer:

    (I am more surprised that inlineRegExp and storedRegExp have different results.)

    storedRegExp is about 5 to 20% faster across browsers than inlineRegExp because it avoids the overhead of creating (and garbage collecting) a new RegExp object on every call.
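
    The object-creation difference can be observed directly through identity checks. A minimal sketch, assuming an ES5 or later engine; makeInline and getStored are illustrative names, not the functions from the jsPerf test:

    ```javascript
    // A literal inside a function body is evaluated on every call,
    // so each call yields a distinct RegExp object.
    function makeInline() {
      return /(?:^| )foo(?: |$)/;
    }

    // A literal hoisted out of the function is evaluated once and reused.
    const reContains = /(?:^| )foo(?: |$)/;
    function getStored() {
      return reContains;
    }

    const inlineIsShared = makeInline() === makeInline();
    const storedIsShared = getStored() === getStored();

    console.log(inlineIsShared); // false
    console.log(storedIsShared); // true
    ```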

    Conclusion:
    Always create your immutable regexps with the literal syntax, and cache them if they are to be reused. In other words, don't rely on the old behavior in environments below ES5, and keep caching appropriately in ES5 and later.

    Why literal syntax? It has some advantages compared to the constructor syntax:

    1. It is shorter and doesn’t force you to think in terms of class-like constructors.
    2. When using the RegExp() constructor, you also need to escape quotes and double-escape backslashes. This makes regular expressions, which are already hard to read and understand by nature, even harder.

    (Freely quoted from the same book, Stoyan Stefanov's JavaScript Patterns.)
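
    The double-escaping point is easy to get wrong, so here is a small illustration. The pattern (digits, a literal dot, digits) is chosen only for the example:

    ```javascript
    // The same pattern written both ways.
    const literal = /\d+\.\d+/;
    const constructed = new RegExp('\\d+\\.\\d+');

    console.log(literal.source === constructed.source); // true
    console.log(constructed.test('3.14')); // true

    // Forgetting to double the backslashes silently changes the pattern:
    // the string literal '\d+\.\d+' evaluates to 'd+.d+'.
    const broken = new RegExp('\d+\.\d+');

    console.log(broken.source); // d+.d+
    console.log(broken.test('3.14')); // false
    ```
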
    Hence, it's always a good idea to stick with the literal syntax, unless your regexp isn't known at compile time.
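
    When the pattern really is only known at run time, the caching advice still applies. The following is one possible sketch, not a standard API: regExpCache, escapeRegExp, classNameRegExp and hasClass are hypothetical names, and plain objects again stand in for DOM nodes.

    ```javascript
    // Cache of compiled regexps, keyed by class name.
    const regExpCache = new Map();

    // Escapes characters that have special meaning in a regular expression.
    function escapeRegExp(text) {
      return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    }

    // Returns a RegExp that matches `name` as a whole class name,
    // building it at most once per distinct name.
    function classNameRegExp(name) {
      let re = regExpCache.get(name);
      if (!re) {
        re = new RegExp('(?:^| )' + escapeRegExp(name) + '(?: |$)');
        regExpCache.set(name, re);
      }
      return re;
    }

    // The cached regexps have no g or y flag, so test() keeps no
    // lastIndex state between calls and sharing them is safe.
    function hasClass(node, name) {
      return classNameRegExp(name).test(node.className);
    }

    console.log(hasClass({ className: 'bar foo baz' }, 'foo')); // true
    console.log(hasClass({ className: 'food' }, 'foo')); // false
    console.log(classNameRegExp('foo') === classNameRegExp('foo')); // true
    ```

    An unbounded Map is fine for a handful of patterns; if the set of names can grow without limit, some eviction policy would be needed.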
