Does the regular expression engine skip over strings that are shorter than the pattern?

谎友^ 2021-01-15 14:51

I want to loop through a set of strings. On each string I want to loop through a set of regular expressions to determine which expressions match the string I'm on. Howe

1 Answer
  • 2021-01-15 15:10

    When you're after implementation details, and when the source code is available, the best way to tell is to simply look at it. :)

    The short answer is: not exactly.

    The optimization implemented in the .NET regex implementation is a Boyer-Moore string search as the first phase of matching when possible. Take a look at the source code for the gory details.

    From the code itself:

    // The RegexBoyerMoore object precomputes the Boyer-Moore
    // tables for fast string scanning. These tables allow
    // you to scan for the first occurance of a string within
    // a large body of text without examining every character.
    // The performance of the heuristic depends on the actual
    // string and the text being searched, but usually, the longer
    // the string that is being searched for, the fewer characters
    // need to be examined.
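
To make the comment above concrete, here is a minimal sketch of a Boyer-Moore-Horspool search, a simplified relative of Boyer-Moore. It is written in Python purely for illustration and is not the .NET code; the names `build_shift_table` and `horspool_find` are made up for this example. It counts character comparisons so the "fewer characters need to be examined" effect can be observed.

```python
def build_shift_table(pattern):
    """Bad-character table: how far the window may slide when the text
    character aligned with the last pattern position is `ch`."""
    m = len(pattern)
    # The last pattern character is excluded so every shift is at least 1.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_find(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0
    shift = build_shift_table(pattern)
    comparisons = 0
    pos = 0
    # If the text is shorter than the pattern, this loop never runs.
    while pos + m <= n:
        # Compare right to left inside the current window.
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Slide by the table entry for the window's last character;
        # characters absent from the pattern allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return -1, comparisons
```

Calling `horspool_find("x" * 100 + "needle", "needle")` finds the match at index 100 while comparing far fewer characters than the text contains, because most windows are rejected after a single comparison and skipped in one jump.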
    

    This requires an anchoring prefix, which is computed by a function whose comment says:

    /*
     * This is the one of the only two functions that should be called from outside.
     * It takes a RegexTree and computes the set of chars that can start it.
     */
    

    The matching algorithm contains code that returns a no-match result immediately if the input string is shorter than the computed prefix.
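
Applied to the loop from the question, the same idea can be sketched by hand. The example below is an assumption-laden illustration in Python, not how .NET does it: the `PATTERNS` table and the `matching_patterns` helper are hypothetical, and the literal prefixes are written out manually, whereas a real engine derives them from the parsed pattern.

```python
import re

# Hypothetical patterns, each paired with a literal that every match must
# contain. The literals are supplied by hand for this sketch.
PATTERNS = [
    ("ERROR", re.compile(r"ERROR: \d+")),
    ("user=", re.compile(r"user=\w+")),
]


def matching_patterns(text):
    """Return the pattern strings that match `text`, skipping hopeless cases."""
    hits = []
    for prefix, regex in PATTERNS:
        # A match must contain the prefix, so a shorter input cannot match.
        if len(text) < len(prefix):
            continue
        # Cheap literal scan before handing the input to the regex engine.
        if prefix not in text:
            continue
        if regex.search(text):
            hits.append(regex.pattern)
    return hits
```

For instance, `matching_patterns("ERR")` never reaches the regex engine, because the input is shorter than both literals. Whether such a manual pre-filter is worth it depends on the engine, since a good one already performs a similar check internally.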

    Note that it's also looking for anchors and optimizing for these, of course.

    I did not find a minimum-length optimization in the code, but I admit I didn't read it thoroughly (gotta do that one day). I do know of other regex implementations that perform this kind of optimization (PCRE comes to mind). Either way, the .NET implementation has its own way of optimizing things, and you should rely on that.
