How can I match strings that don't match a particular pattern in Perl?

别跟我提以往 2021-02-06 00:25

I know that it is easy to match anything except a given character using a regular expression.

$text = "ab ac ad";
$text =~ s/[^c]*//g; # Match anything, except


        
6 Answers
  •  臣服心动
    2021-02-06 00:37

    Update: In a comment on your question, you mentioned you want to clean wiki markup and remove balanced sequences of {{ ... }}. Section 6 of the Perl FAQ covers this: Can I use Perl regular expressions to match balanced text?

    Consider the following program:

    #! /usr/bin/perl
    
    use warnings;
    use strict;
    
    use Text::Balanced qw/ extract_tagged /;
    
    # for demo only
    *ARGV = *DATA;
    
    while (<>) {
      if (s/^(.+?)(?=\{\{)//) {
        print $1;
        my(undef,$after) = extract_tagged $_, "{{" => "}}";
    
        if (defined $after) {
          $_ = $after;
          redo;
        }
      }
    
      print;
    }
    
    __DATA__
    Lorem ipsum dolor sit amet, consectetur
    adipiscing elit. {{delete me}} Sed quis
    nulla ut dolor {{me too}} fringilla
    mollis {{ quis {{ ac }} erat.
    

    Its output:

    Lorem ipsum dolor sit amet, consectetur
    adipiscing elit.  Sed quis
    nulla ut dolor  fringilla
    mollis {{ quis  erat.

    For your particular example, you could use

    $text =~ s/[^ac]|a(?!c)|(?<!a)c//g;

    That is, only delete an a or c when they aren't part of an ac sequence.
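    As a quick check of that idea, here is a minimal sketch. It assumes the intended pattern deletes any character other than a or c, any a not followed by c, and any c not preceded by a (a negative lookbehind). The helper name keep_only_ac is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything that is not part of a literal "ac".
sub keep_only_ac {
    my ($text) = @_;
    # [^ac]    any character other than 'a' or 'c'
    # a(?!c)   an 'a' that is not followed by 'c'
    # (?<!a)c  a 'c' that is not preceded by 'a'
    $text =~ s/[^ac]|a(?!c)|(?<!a)c//g;
    return $text;
}

print keep_only_ac("ab ac ad"), "\n";   # prints "ac"
```

    The lookarounds consume no characters, so each a and c is judged against its neighbours in the original string.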

    In general, this is tricky to do with a regular expression.

    Say you don't want foo followed by optional whitespace and then bar in $str. Often, it's clearer and easier to check separately. For example:

    die "invalid string ($str)"
      if $str =~ /^.*foo\s*bar/;
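    The same separate-check idea can be written with Perl's negated binding operator !~, which is true when the pattern does not match. This is only a sketch; lacks_foo_bar and the sample inputs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $str does NOT contain
# "foo", optional whitespace, then "bar".
sub lacks_foo_bar {
    my ($str) = @_;
    return $str !~ /foo\s*bar/ ? 1 : 0;
}

# Keep only the strings that do not match the unwanted pattern.
my @inputs = ("foo bar", "foobar", "food bar", "bar foo");
my @ok     = grep { lacks_foo_bar($_) } @inputs;

print "$_\n" for @ok;
```

    Of the four sample inputs, only "food bar" and "bar foo" pass the filter.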
    

    You might also be interested in an answer to a similar question, where I wrote:

    my $nofoo = qr/
      (      [^f] |
        f  (?! o) |
        fo (?! o  \s* bar)
      )*
    /x;
    
    my $pattern = qr/^ $nofoo bar /x;
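    To see how that pattern behaves, the sketch below runs it against a few sample strings. The wrapper bar_without_foo and the inputs are hypothetical. The match succeeds when some bar can be reached from the start of the string without crossing foo, optional whitespace, and bar.

```perl
use strict;
use warnings;

# Zero or more characters, none of which begins "foo\s*bar".
my $nofoo = qr/
  (      [^f] |
    f  (?! o) |
    fo (?! o  \s* bar)
  )*
/x;

my $pattern = qr/^ $nofoo bar /x;

# Hypothetical wrapper so the result is a plain 1 or 0.
sub bar_without_foo { return $_[0] =~ $pattern ? 1 : 0 }

for my $s ("bar", "xbar", "food bar", "foobar", "foo bar") {
    printf "%-10s %s\n", $s,
      bar_without_foo($s) ? "matches" : "does not match";
}
```

    Here "bar", "xbar", and "food bar" match, while "foobar" and "foo bar" do not.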
    

    To understand the complication, read How Regexes Work by Mark Dominus. The engine compiles regular expressions into state machines. When it's time to match, it feeds the input string to the state machine and checks whether the state machine finishes in an accept state. So to exclude a string, you have to specify a machine that accepts all inputs except a particular sequence.

    What might help is a hypothetical /v regular expression switch (Perl has no such modifier) that creates the state machine as usual but then complements the accept-state bit for every state. It's hard to say whether this would really be more useful than separate checks, because a /v regular expression might still surprise people, just in different ways.
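    The accept-state idea can be illustrated with a toy, hand-built state machine. This is a sketch for intuition only, not a depiction of Perl's internals. The machine, its state numbering, and the names run_dfa, contains_ac, and lacks_ac are all assumptions made for the example. It recognises strings that contain ac, and the complemented version accepts exactly the strings the original rejects.

```perl
use strict;
use warnings;

# States: 0 = nothing useful seen, 1 = just saw 'a',
#         2 = saw "ac" (accepting, and we never leave it).
sub run_dfa {
    my ($input) = @_;
    my $state = 0;
    for my $ch (split //, $input) {
        if    ($state == 2)               { last }
        elsif ($ch eq 'a')                { $state = 1 }
        elsif ($state == 1 && $ch eq 'c') { $state = 2 }
        else                              { $state = 0 }
    }
    return $state;
}

my %accept = (2 => 1);   # accept set of the original machine

# Original machine: accept when the final state is in %accept.
sub contains_ac { return $accept{ run_dfa($_[0]) } ? 1 : 0 }

# Complemented machine: same transitions, accept bit flipped.
sub lacks_ac    { return $accept{ run_dfa($_[0]) } ? 0 : 1 }

for my $s ("ab ac ad", "ab ad", "aac") {
    printf "%-10s contains_ac=%d lacks_ac=%d\n",
      $s, contains_ac($s), lacks_ac($s);
}
```

    Flipping the accept bit is trivial here because the machine is deterministic and every input ends in exactly one state.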

    If you're interested in the theoretical details, see An Introduction to Formal Languages and Automata by Peter Linz.
