Regex to match only innermost delimited sequence

Submitted by 为君一笑 on 2019-12-01 06:19:08

Question


I have a string that contains sequences delimited by the two-character markers << and >>. I need a regular expression that gives me only the innermost sequences. I have tried lookaheads, but they don't seem to work the way I expect them to.

Here is a test string:

'do not match this <<but match this>> not this <<BUT NOT THIS <<this too>> IT HAS CHILDREN>> <<and <also> this>>'

It should return:

but match this
this too
and <also> this

As the third result shows, I can't just use /<<[^>]+>>/, because a sequence may contain a single delimiter character (< or >), just never two in a row.
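To make the failure concrete, here is a minimal sketch of the naive pattern run against the test string. The variable names ($string, @naive) are illustrative only, and a capture group has been added so that just the inner text is returned.

```perl
use strict;
use warnings;

my $string = 'do not match this <<but match this>> not this <<BUT NOT THIS <<this too>> IT HAS CHILDREN>> <<and <also> this>>';

# [^>]+ excludes only '>', so it happily runs across a nested '<<',
# and it gives up as soon as it meets the lone '>' in '<also>'.
my @naive = $string =~ /<<([^>]+)>>/g;

print "$_\n" for @naive;
# Prints:
#   but match this
#   BUT NOT THIS <<this too
```

The second result swallows the outer opening delimiter, and the third sequence is missed entirely.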

I've run out of things to try by trial and error. It seems to me this shouldn't be so complicated.


Answer 1:


# Each . is consumed only where neither << nor >> begins, so a match cannot span a nested pair.
my @matches = $string =~ /(<<(?:(?!<<|>>).)*>>)/g;

(?:(?!PAT).)* is to patterns as [^CHAR]* is to characters.
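As a sketch of that idiom applied to the question's test string, the variant below moves the capture group inside the delimiters so that only the inner text comes back, matching the expected output in the question. The /s flag and the names $string and @inner are assumptions added for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;

my $string = 'do not match this <<but match this>> not this <<BUT NOT THIS <<this too>> IT HAS CHILDREN>> <<and <also> this>>';

# (?!<<|>>) fails wherever either delimiter starts, so the repeated group
# stops before a nested << as well as before the closing >>.
# /s lets . match newlines in case a sequence spans several lines.
my @inner = $string =~ /<<((?:(?!<<|>>).)*)>>/gs;

print "$_\n" for @inner;
# Prints:
#   but match this
#   this too
#   and <also> this
```

An attempt that starts at the outer << of the nested case fails, because the group stops at the inner << and no >> follows there; the engine then moves on and matches the inner pair.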




Answer 2:


my $string = 'do not match this <<but match this>> not this <<BUT NOT THIS <<this too>> IT HAS CHILDREN>> <<and <also> this>>';
# Inside the delimiters, allow runs of non-bracket characters, a lone <, or a lone >.
my @matches = $string =~ /(<<(?:[^<>]+|<(?!<)|>(?!>))*>>)/g;
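The pattern above is dense, so here is the same regex laid out with the /x flag and commented branch by branch. This is only a reformatting sketch intended to behave identically; the layout and comments are additions, not part of the original answer.

```perl
use strict;
use warnings;

my $string = 'do not match this <<but match this>> not this <<BUT NOT THIS <<this too>> IT HAS CHILDREN>> <<and <also> this>>';

my @matches = $string =~ /
    (                 # capture the whole sequence, delimiters included
      <<
      (?:
          [^<>]+      # a run of characters that are neither angle bracket
        | <(?!<)      # a single opening bracket not followed by another
        | >(?!>)      # a single closing bracket not followed by another
      )*
      >>
    )
/gx;

print "$_\n" for @matches;
# Prints:
#   <<but match this>>
#   <<this too>>
#   <<and <also> this>>
```

Because none of the three branches can consume a doubled bracket, the body can never cross a nested << or an early >>.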



Answer 3:


Here's a way to use split for the job:

my $str = 'do not match this <<but match this>> not this <<BUT NOT THIS <<this too>> IT HAS CHILDREN>> <<and <also> this>>';
my @a = split /(?=<<)/, $str;
@a = map { split /(?<=>>)/, $_ } @a;

my @match = grep { /^<<.*?>>$/ } @a;

This keeps the delimiters in the results. If you want them removed, just do:

@match = map { s/^<<//; s/>>$//; $_ } @match;
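Putting the split steps together, a self-contained sketch might look like the following. It assumes Perl 5.14 or later for the non-destructive s///r form, which is swapped in here in place of the in-place substitution above; the names @pieces, @tagged and @inner are illustrative.

```perl
use strict;
use warnings;

my $str = 'do not match this <<but match this>> not this <<BUT NOT THIS <<this too>> IT HAS CHILDREN>> <<and <also> this>>';

# Break the string before every << and then after every >>.
my @pieces = map { split /(?<=>>)/, $_ } split /(?=<<)/, $str;

# Only a piece that both starts with << and ends with >> is an innermost sequence.
my @tagged = grep { /^<<.*>>\z/s } @pieces;

# s///r returns a modified copy, so @tagged keeps its delimiters.
my @inner = map { s/^<<|>>\z//gr } @tagged;

print "$_\n" for @inner;
# Prints:
#   but match this
#   this too
#   and <also> this
```

Pieces such as '<<BUT NOT THIS ' and ' IT HAS CHILDREN>>' carry only one of the two delimiters, which is why the grep discards them.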


Source: https://stackoverflow.com/questions/6990719/regex-to-match-only-innermost-delimited-sequence
