Regex match entire words only

挽巷 2020-11-21 05:35

I have a regular expression that I'm using to find all the words in a given block of content (case-insensitively) that appear in a glossary stored in a database. Here's …
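The asker's regex is cut off above. The answer below implies it relied on `\b`. Under that assumption, a minimal sketch of such a setup might look like the following. It is a hypothetical reconstruction, not the asker's code: the `@glossary` terms and `$content` text stand in for the database rows and the content block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical glossary; in the question these terms come from a database.
my @glossary = ('regex', 'S.P.E.C.T.R.E.', 'glossary');

# Try longer terms first, and quotemeta each one so '.' matches literally.
my $alternation = join '|',
    map { quotemeta($_) } sort { length($b) <=> length($a) } @glossary;

# Naive "whole word" pattern: \b on both sides, case-insensitive.
my $re = qr/\b($alternation)\b/i;

my $content = 'A Regex for S.P.E.C.T.R.E. (a fictional organisation) in a glossary.';

# In list context, //g returns every captured term.
my @found = $content =~ /$re/g;
print "$_\n" for @found;
```

It prints `Regex` and `glossary`, but `S.P.E.C.T.R.E.` is silently skipped even though it is in the glossary.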

7 answers
  •  暗喜 (original poster)
    2020-11-21 06:09

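    Using \b can yield surprising results. \b is a zero-width assertion that matches only between a word character (\w) and a non-word character, or at the start or end of the string next to a word character. A term such as S.P.E.C.T.R.E. ends in '.', and in the text below it is followed by a space, so there is no word boundary after it and the trailing \b fails. You would be better off figuring out what separates a word from its definition and incorporating that information into your pattern.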

    #!/usr/bin/perl
    
    use strict; use warnings;
    
    # Print the regex engine's compile and match trace.
    use re 'debug';
    
    my $str = 'S.P.E.C.T.R.E. (Special Executive for Counter-intelligence,
    Terrorism, Revenge and Extortion) is a fictional global terrorist
    organisation';
    
    my $word = 'S.P.E.C.T.R.E.';
    
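    # \Q...\E makes the dots in $word match literally. The closing \b needs
    # a word character on exactly one side, but the term ends in '.' and is
    # followed by ' ', so the match fails.
    if ( $str =~ /\b(\Q$word\E)\b/ ) {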
        print $1, "\n";
    }
    

    Output:

    Compiling REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b"
    Final program:
       1: BOUND (2)
       2: OPEN1 (4)
       4:   EXACT  (9)
       9: CLOSE1 (11)
      11: BOUND (12)
      12: END (0)
    anchored "S.P.E.C.T.R.E." at 0 (checking anchored) stclass BOUND minlen 14
    Guessing start of match in sv for REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b" against "S.P.E.C.T.R.E. (Special Executive for Counter-intelligence,"...
    Found anchored substr "S.P.E.C.T.R.E." at offset 0...
    start_shift: 0 check_at: 0 s: 0 endpos: 1
    Does not contradict STCLASS...
    Guessed: match at offset 0
    Matching REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b" against "S.P.E.C.T.R.E. (Special Executive for Counter-intelligence,"...
       0           |  1:BOUND(2)
       0           |  2:OPEN1(4)
       0           |  4:EXACT (9)
      14      |  9:CLOSE1(11)
      14      | 11:BOUND(12)
                                      failed...
    Match failed
    Freeing REx: "\b(S\.P\.E\.C\.T\.R\.E\.)\b"
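One way to act on that advice, sketched here rather than taken from the answer: replace `\b` with the lookarounds `(?<!\w)` and `(?!\w)`. This assumes a glossary term counts as a whole word whenever it is not glued to a neighbouring word character. The glossary list is hypothetical, and the `/i` flag covers the question's case-insensitive requirement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'S.P.E.C.T.R.E. (Special Executive for Counter-intelligence,
Terrorism, Revenge and Extortion) is a fictional global terrorist
organisation';

# Hypothetical glossary terms standing in for the database rows.
my @glossary = ('S.P.E.C.T.R.E.', 'terrorism', 'organisation', 'venge');

# Quote each term literally and try longer terms first.
my $alternation = join '|',
    map { quotemeta($_) } sort { length($b) <=> length($a) } @glossary;

# (?<!\w) and (?!\w) only require that no word character touches the term.
# Unlike \b, they do not care whether the term itself starts or ends with
# a word character, so a trailing '.' followed by a space is accepted.
my @found = $str =~ /(?<!\w)($alternation)(?!\w)/gi;
print "$_\n" for @found;
```

With these terms it prints `S.P.E.C.T.R.E.`, `Terrorism` and `organisation`, while `venge` is skipped because it sits inside `Revenge`. If the separator is known more precisely (say, a term is always followed by optional whitespace and an opening parenthesis), encoding exactly that, e.g. with a lookahead such as `(?=\s*\()`, is stricter still.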
    
