Does the Python regular expression module use BRE or ERE?

醉梦人生 2021-02-14 11:25

It appears that POSIX splits regular expression implementations into two kinds: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE).

Python re module

2 answers
  •  滥情空心
    2021-02-14 12:12

    Neither. It is essentially the PCRE dialect, but with a distinct implementation.
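
    To make "neither" concrete, here is a small sketch run against CPython's re (the patterns and inputs are made up for illustration). BRE-style escapes such as \( \) and \{ \} are plain literals in Python rather than grouping and interval operators, and a POSIX bracket class like [[:digit:]], which both BRE and ERE define, is not recognized at all:

```python
import re
import warnings

text = "abab"

# Perl/ERE style: unescaped parentheses group and {2} is an interval,
# so this matches "abab".
perl_style = re.fullmatch(r"(ab){2}", text)

# POSIX BRE spelling of the same pattern. In Python, \( \) \{ \} are just
# escaped literal characters, so this does NOT match "abab" ...
bre_style = re.fullmatch(r"\(ab\)\{2\}", text)
# ... but it does match the literal string "(ab){2}".
bre_literal = re.fullmatch(r"\(ab\)\{2\}", "(ab){2}")

# BRE and ERE both define [[:digit:]] as "one digit". Python has no POSIX
# classes: it reads "[[:digit:]" as a set of the characters [ : d i g t,
# followed by a literal "]". Python 3.7+ warns about a possible nested set,
# so that FutureWarning is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    posix_class = re.compile(r"[[:digit:]]")

print(bool(perl_style), bool(bre_style), bool(bre_literal))
print(posix_class.search("7"), bool(posix_class.fullmatch("d]")))
```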

    The very first sentence in the re documentation says:

    This module provides regular expression matching operations similar to those found in Perl.

    While this does not immediately tell a newcomer how these relate to, e.g., POSIX regular expressions, it is fairly common knowledge that Perl 4, and later Perl 5, offered a substantially larger feature set than the regex support of earlier tools, including what POSIX mandated for grep -E, a.k.a. ERE.
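
    As a rough illustration of that superset relationship, the ERE operators are all available unescaped in Python's re, and the Perl-style shorthand \d can stand in for the bracket expression [0-9]. The sample strings are invented for this sketch:

```python
import re

# ERE operators, unescaped as in grep -E: grouping (), alternation |,
# optional ?, one-or-more +.
ere_style = re.compile(r"^(cat|dog)s?-[0-9]+$")

# The same pattern using Perl's \d shorthand, which ERE does not have.
# Note: for str patterns, \d also matches non-ASCII Unicode digits unless
# the re.ASCII flag is given.
perl_style = re.compile(r"^(cat|dog)s?-\d+$")

results = {
    s: (bool(ere_style.match(s)), bool(perl_style.match(s)))
    for s in ["cat-42", "dogs-123", "cow-7"]
}
print(results)
# {'cat-42': (True, True), 'dogs-123': (True, True), 'cow-7': (False, False)}
```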

    The perlre manual page describes the regular expression features in more detail, though you'll find much the same details in a different form in the Python documentation. The Perl manual page contains this bit of history:

    The patterns used in Perl pattern matching evolved from those supplied in the Version 8 regex routines. (The routines are derived (distantly) from Henry Spencer's freely redistributable reimplementation of the V8 routines.)

    (Here, V8 refers to Version 8 Unix. Spencer's library basically (re)implemented POSIX regular expressions.)

    Perl 4 had a large number of convenience constructs like \d, \s, \w as well as symbolic shorthands like \t, \f, \n. Perl 5 added a significant set of extensions (which is still growing slowly) including, but not limited to,

    • Non-greedy quantifiers
    • Non-backtracking quantifiers
    • Unicode symbol and property support
    • Non-grouping parentheses
    • Lookaheads and lookbehinds
    • ... Basically anything that starts with (?
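
    Several of these extensions, plus the Perl 4 shorthands, can be tried directly in Python's re. This is a minimal sketch with made-up inputs. Note that re only gained possessive quantifiers and atomic groups (?>...) in Python 3.11, and that it does not support \p{...} property escapes at all (the third-party regex module does), so those are left out:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy .* runs to the LAST ">" in the string; non-greedy .*? stops at the first.
greedy = re.match(r"<.*>", html).group()
lazy = re.match(r"<.*?>", html).group()

# Non-grouping parentheses (?:...) group the alternation without creating
# a capture, so only the surname is captured.
title = re.match(r"(?:Mr|Ms)\.\s(\w+)", "Ms. Smith")

# Lookbehind (?<=...) and lookahead (?=...) check the context around the
# digits without including "$" or ".50" in the match.
price = re.search(r"(?<=\$)\d+(?=\.\d\d)", "Total: $42.50")

print(greedy)          # <b>bold</b> and <i>italic</i>
print(lazy)            # <b>
print(title.groups())  # ('Smith',)
print(price.group())   # 42
```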

    As a result, the "regular" expressions are by no means strictly "regular" any longer.
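
    One feature that clearly pushes such patterns past regular languages is the backreference (POSIX BRE also defines backreferences, so this particular one is not a Perl invention). The sketch below uses an invented pattern that accepts a run of a's, a b, and then the same run again, i.e. the language a^n b a^n, which no finite automaton can recognize:

```python
import re

# (a+) captures a run of a's; \1 must then repeat exactly the same text.
# Requiring equal counts on both sides of the "b" is a non-regular condition.
same_count = re.compile(r"^(a+)b\1$")

results = {s: bool(same_count.match(s)) for s in ["aba", "aabaa", "aaba", "abaa"]}
print(results)
# {'aba': True, 'aabaa': True, 'aaba': False, 'abaa': False}
```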

    This was reimplemented in a portable library by Philip Hazel, originally for the Exim mail server. His PCRE library has found its way into myriad applications, and its dialect has been adopted by a number of programming languages (Ruby, PHP, Python, etc.). Incidentally, in spite of the name, the library is not (or is no longer) strictly "Perl compatible"; there are differences in features as well as in behavior. (For example, Perl internally changes * to something like {0,32767} while PCRE does something else.)

    An earlier version of Python actually had a different regex implementation, and there are plans to change it again (though it will remain basically PCRE). This is the situation as of Python 2.7 / 3.5.
