[[:>:]] or [[:<:]] don't match

别那么骄傲 2021-01-18 08:50

I'm trying to use [[:>:]] in my regex, but it is not accepted, while other character classes such as [[:digit:]] or [[:word:]] are.

2 answers
  • 2021-01-18 09:36

    It is a bug, because these constructs (the start-of-word boundary [[:<:]] and the end-of-word boundary [[:>:]]) are supported by the PCRE library itself:

    COMPATIBILITY FEATURE FOR WORD BOUNDARIES
    
      In the POSIX.2 compliant library that was included in 4.4BSD Unix, the
      ugly syntax [[:<:]] and [[:>:]] is used for matching "start of word"
      and "end of word". PCRE treats these items as follows:
    
        [[:<:]]  is converted to  \b(?=\w)
        [[:>:]]  is converted to  \b(?<=\w)
    
      Only these exact character sequences are recognized. A sequence such as
      [a[:<:]b] provokes error for an unrecognized POSIX class name. This
      support is not compatible with Perl. It is provided to help migrations
      from other environments, and is best not used in any new patterns. Note
      that \b matches at the start and the end of a word (see "Simple
      assertions" above), and in a Perl-style pattern the preceding or
      following character normally shows which is wanted, without the need
      for the assertions that are used above in order to give exactly the
      POSIX behaviour.
    

    When used in PHP code, it works:

    if (preg_match_all('/[[:<:]]home[[:>:]]/', 'homeless and home', $m))
    {
        print_r($m[0]); 
    }
    

    This prints Array ( [0] => home ): only the standalone word home matches, not the home inside homeless. See the online PHP demo.

    So, it is the regex101.com developer team that decided not to (or forgot to) include support for these paired word boundaries.

    At regex101.com, use \b word boundaries instead, for both the start and the end of a word. They are supported by all four regex101.com regex engines: PCRE, JS, Python and Go.

    The [[:<:]] and [[:>:]] word boundaries are supported mostly by POSIX-like engines; see this PostgreSQL regex demo, for example. The [[:<:]]HR[[:>:]] regex finds a match in Head of HR, but finds no match in <A HREF="some.html or in CHROME.

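    Other regex engines that support [[:<:]] and [[:>:]] word boundaries are base R (e.g. gsub without the perl=TRUE argument) and MySQL (versions before 8.0.4; later versions use the ICU regex library, which does not support them and uses \b instead).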

    In Tcl regex, \m corresponds to [[:<:]] (start-of-word boundary) and \M corresponds to [[:>:]] (end-of-word boundary).
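
As a rough illustration for engines that lack [[:<:]] and [[:>:]], the sketch below uses Python's re module (one of the engines regex101.com offers) and applies the lookaround equivalents from the PCRE documentation quoted in this answer to the three sample strings from the PostgreSQL example. The names START, END, samples and results are made up for this sketch and are not taken from the original answer.

```python
import re

# Assumption: Python's re stands in for "an engine without [[:<:]] / [[:>:]]".
# The two lookaround forms are the ones PCRE documents as its conversions.
START = r"\b(?=\w)"   # plays the role of [[:<:]] (start of word)
END = r"\b(?<=\w)"    # plays the role of [[:>:]] (end of word)

pattern = re.compile(START + "HR" + END)

samples = ["Head of HR", '<A HREF="some.html', "CHROME"]
results = {text: bool(pattern.search(text)) for text in samples}

for text, matched in results.items():
    print(f"{text!r}: {'match' if matched else 'no match'}")
# 'Head of HR': match
# '<A HREF="some.html': no match
# 'CHROME': no match

# For a literal made only of word characters, plain \b on both sides
# gives the same answers, which is why \b is enough at regex101.com.
plain_results = {text: bool(re.search(r"\bHR\b", text)) for text in samples}
print(plain_results == results)  # True
```

The directional forms only matter when the pattern next to the boundary can begin or end with a non-word character; for a literal such as HR, plain \b is the simpler choice.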

  • 2021-01-18 09:41

    You can use \b(?=\w) in place of [[:<:]] and \b(?<=\w) in place of [[:>:]]. In any case, the PCRE engine converts [[:<:]] to \b(?=\w) and [[:>:]] to \b(?<=\w) before starting the match.
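
The conversion described in this answer can be imitated by hand in an engine that rejects the POSIX markers. The following is only a minimal sketch in Python: translate_posix_boundaries is a hypothetical helper written for this example (it is not part of re or of PCRE), and it does a plain text substitution of the two exact sequences, so it does not try to handle escaped brackets or markers nested inside a larger bracket expression.

```python
import re

def translate_posix_boundaries(pattern: str) -> str:
    """Hypothetical helper: swap the BSD-style word-boundary markers for
    the lookaround forms that PCRE substitutes for them."""
    return (pattern
            .replace("[[:<:]]", r"\b(?=\w)")
            .replace("[[:>:]]", r"\b(?<=\w)"))

translated = translate_posix_boundaries("[[:<:]]home[[:>:]]")
print(translated)   # \b(?=\w)home\b(?<=\w)

matches = re.findall(translated, "homeless and home")
print(matches)      # ['home']

# Where each assertion can match in "a bc" (positions are string offsets):
starts = [m.start() for m in re.finditer(r"\b(?=\w)", "a bc")]
ends = [m.start() for m in re.finditer(r"\b(?<=\w)", "a bc")]
print(starts)       # [0, 2]  -> only word starts
print(ends)         # [1, 4]  -> only word ends
```

The result for the home pattern agrees with the PHP output shown in the first answer, and the last two lines show how the lookarounds split the positions matched by plain \b into word starts and word ends.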
