How can regex ignore escaped-quotes when matching strings?

做~自己de王妃 submitted on 2019-12-01 05:36:35
<?php
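// A single backslash, interpolated below so the heredoc needs no extra escaping.
$backslash = '\\';

// Resulting regex: #(["'])(?:\\?+.)*?\1#
// An opening quote, then lazily: an optional backslash (possessive, so it is
// never given back) plus any one character, up to the matching closing quote.
$pattern = <<< PATTERN
#(["'])(?:{$backslash}{$backslash}?+.)*?{$backslash}1#
PATTERN;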

foreach(array(
    "<?php \$s = 'Hi everyone, we\\'re ready now.'; ?>",
    '<?php $s = "Hi everyone, we\\"re ready now."; ?>',
    "xyz'a\\'bc\\d'123",
    "x = 'My string ends with with a backslash\\\\';"
    ) as $subject) {
        preg_match($pattern, $subject, $matches);
        echo $subject , ' => ', $matches[0], "\n\n";
}

This prints:

<?php $s = 'Hi everyone, we\'re ready now.'; ?> => 'Hi everyone, we\'re ready now.'

<?php $s = "Hi everyone, we\"re ready now."; ?> => "Hi everyone, we\"re ready now."

xyz'a\'bc\d'123 => 'a\'bc\d'

x = 'My string ends with with a backslash\\'; => 'My string ends with with a backslash\\'

Here's my solution with test cases:

/.*?'((?:\\\\|\\'|[^'])*+)'/

And my proof, in Perl (though I don't think I use any Perl-specific features):

use strict;
use warnings;

my %tests = ();
$tests{'Case 1'} = <<'EOF';
$var = 'My string';
EOF

$tests{'Case 2'} = <<'EOF';
$var = 'My string has it\'s challenges';
EOF

$tests{'Case 3'} = <<'EOF';
$var = 'My string ends with a backslash\\';
EOF

foreach my $key (sort (keys %tests)) {
    print "$key...\n";
    if ($tests{$key} =~ m/.*?'((?:\\\\|\\'|[^'])*+)'/) {
        print " ... '$1'\n";
    } else {
        print " ... NO MATCH\n";
    }
}

Running this shows:

$ perl a.pl
Case 1...
 ... 'My string'
Case 2...
 ... 'My string has it\'s challenges'
Case 3...
 ... 'My string ends with a backslash\\'

Note that the wildcard at the start needs to be non-greedy. After the opening quote, a possessive (non-backtracking) group gobbles up \\ and \' pairs, and then any other character that is not a standalone quote.
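
As a rough cross-check, the same alternation can be tried in JavaScript against the three cases above. This is only a sketch under two assumptions: JavaScript regular expressions have no possessive quantifiers, so the *+ is swapped for a plain greedy *, and extractBody is a hypothetical helper name that is not part of the original answer.

```javascript
// Port of the pattern above with the possessive `+` dropped
// (assumption of this sketch: JavaScript RegExp has no possessive quantifiers).
const singleQuoted = /'((?:\\\\|\\'|[^'])*)'/;

// String.raw keeps backslashes literal, so the inputs mirror the Perl here-docs.
const cases = [
  String.raw`$var = 'My string';`,
  String.raw`$var = 'My string has it\'s challenges';`,
  String.raw`$var = 'My string ends with a backslash\\';`,
];

// Hypothetical helper: returns the text between the quotes, or null if no match.
function extractBody(source) {
  const m = singleQuoted.exec(source);
  return m === null ? null : m[1];
}

for (const source of cases) {
  console.log(extractBody(source));
}
```

For well-formed (closed) strings like these, the greedy version yields the same three bodies as the Perl run. Without the possessive quantifier it may backtrack differently on unterminated input.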

I think this one probably mimics the compiler's built-in approach, which should make it pretty bullet-proof.

/.*'([^'\\]|\\.)*'.*/

The parenthesized portion matches either a character that is neither an apostrophe nor a backslash, or a backslash followed by any one character. If only certain characters can be escaped, change the \\. to \\['\\a-z], or a similar class.
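
A minimal JavaScript sketch of that idea follows. It assumes two changes to the pattern above: the surrounding .* is dropped and the repetition is wrapped in a capture group, so every string on a line can be collected. findStrings and the sample input are made up for illustration.

```javascript
// Either a character that is neither quote nor backslash,
// or a backslash followed by any one character.
const anyEscape = /'((?:[^'\\]|\\.)*)'/g;

// Stricter variant: only \', \\ and \a..\z count as valid escapes.
const limitedEscape = /'((?:[^'\\]|\\['\\a-z])*)'/g;

// Hypothetical helper: collect the body of every match of a global pattern.
function findStrings(pattern, source) {
  return Array.from(source.matchAll(pattern), (m) => m[1]);
}

// The third string contains \Z, which the stricter variant does not accept.
const source = String.raw`a = 'it\'s'; b = 'tail\\'; c = 'x\Zy';`;

console.log(findStrings(anyEscape, source));     // three bodies
console.log(findStrings(limitedEscape, source)); // only the first two
```

With the restricted class, a string containing a disallowed escape simply fails to match. Whether that is the desired behavior depends on the language being scanned.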

Via negative look behind:

/
.*?'              #Match until '
(
 .*?              #Lazy match & capture of everything after the first apostrophe
)    
(?<!(?<!\\)\\)'   #Match first apostrophe that isn't preceded by \, but accept \\
.*                #Match remaining text
/x

For C# (.NET):

Regex reg = new Regex("(?<!\\\\)'(?<string>.*?)(?<!\\\\)'");
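
To see how the two lookbehind styles differ, here is a small JavaScript sketch. It assumes an engine with lookbehind support (for example a recent Node.js), and the sample inputs are invented for illustration. The single lookbehind treats any quote preceded by a backslash as escaped. The nested one also accepts a quote preceded by an escaped backslash.

```javascript
// Single lookbehind, as in the .NET pattern: a quote counts as escaped
// whenever a backslash precedes it.
const singleLookbehind = /(?<!\\)'(.*?)(?<!\\)'/;

// Nested lookbehind, as in the commented pattern: \' is escaped, \\' is not.
const nestedLookbehind = /'(.*?)(?<!(?<!\\)\\)'/;

const escapedQuote = String.raw`s = 'it\'s';`;
const trailingBackslash = String.raw`x = 'tail\\'; y = 'b';`;

console.log(nestedLookbehind.exec(escapedQuote)[1]);      // it\'s
console.log(nestedLookbehind.exec(trailingBackslash)[1]); // tail\\
// The single lookbehind skips the real closing quote and runs on to the next one.
console.log(singleLookbehind.exec(trailingBackslash)[1]);
```

Both lookbehinds inspect only one or two preceding characters, so longer runs of backslashes (such as \\\') can still be misclassified.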
aMarCruz

This is for JavaScript:

/('|")(?:\\\\|\\\1|[\s\S])*?\1/

It:

  • matches single or double quoted strings
  • matches empty strings (length 0)
  • matches strings with embedded whitespace (\n, \t, etc.)
  • skips inner escaped quotes (single or double)
  • skips single quotes within double quotes and vice versa

Only the opening quote is captured (in $1). To capture the unquoted contents in $2 as well, use:

/('|")((?:\\\\|\\\1|[\s\S])*?)\1/
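
A short usage sketch for the capturing version is below. It assumes the g flag is added so that matchAll can walk every string in the input. unquote and the sample source are hypothetical names chosen for this example.

```javascript
// The capturing pattern from above, with the g flag added for matchAll.
const quoted = /('|")((?:\\\\|\\\1|[\s\S])*?)\1/g;

// Hypothetical helper: list each string's quote character and unquoted body.
function unquote(source) {
  return Array.from(source.matchAll(quoted), (m) => ({ quote: m[1], body: m[2] }));
}

// Covers escaped double quotes, an escaped single quote,
// a single quote inside double quotes, and an empty string.
const source = String.raw`a = "say \"hi\""; b = 'it\'s'; c = "don't"; d = '';`;

for (const { quote, body } of unquote(source)) {
  console.log(quote, body);
}
```

Each entry pairs the opening quote ($1) with the contents ($2). The escape sequences are left as written in the source, so a separate step would be needed to decode them.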
