Preg_match string inside curly braces tags

Question

I'd like to grab a string between tags. My tags use curly braces.

{myTag}Here is the string{/myTag}

So far I have found this pattern: #<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s It matches tags with angle brackets <>, but I couldn't figure out how to make it look for curly braces instead.

Eventually I would like to parse a whole page, grab all matches, and build an array of the matched strings.

This is the code:

function everything_in_tags($string, $tagname)
{
    $pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

$var = everything_in_tags($string, $tagname);

Answer 1:

Replace every < and > in the pattern with { and }, and change preg_match() to preg_match_all() to catch multiple occurrences of text inside those tags.

function everything_in_tags($string, $tagname)
{
    $pattern = "#{\s*?$tagname\b[^}]*}(.*?){/$tagname\b[^}]*}#s";
    preg_match_all($pattern, $string, $matches);
    return $matches[1];
}


$string = '{myTag}Here is the string{/myTag} and {myTag}here is more{/myTag}';
$tagname = 'myTag';
$var = everything_in_tags($string, $tagname);

Note: the curly braces in this pattern do not need to be escaped. My earlier remark suggesting otherwise was mistaken.

Answer 2:

It looks like you are building a general-purpose helper function. For this reason, it is important that you escape any characters in the tag name that have special meaning to the regex engine. To do that, use preg_quote().
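To illustrate why the escaping matters, here is a minimal sketch. The tag name my.tag+ and the helper buildCurlyPattern() are hypothetical, chosen only to show what can happen when a tag name contains regex metacharacters.

```php
<?php
// Hypothetical helper: builds a curly-tag pattern, optionally quoting the tag name.
function buildCurlyPattern(string $tagname, bool $quote): string
{
    // preg_quote() escapes regex metacharacters; passing '/' also escapes the delimiter.
    $name = $quote ? preg_quote($tagname, '/') : $tagname;
    return '/\{' . $name . '}(.*?)\{\/' . $name . '}/s';
}

$text = '{myXtagg}wrong{/myXtagg} {my.tag+}right{/my.tag+}';

// Unquoted: "." matches any character and "+" repeats the "g",
// so the pattern also accepts the unrelated tag {myXtagg}.
preg_match(buildCurlyPattern('my.tag+', false), $text, $unsafe);

// Quoted: only the literal tag name {my.tag+} matches.
preg_match(buildCurlyPattern('my.tag+', true), $text, $safe);

// $unsafe[1] is 'wrong', $safe[1] is 'right'
var_export([$unsafe[1], $safe[1]]);
```

With the unquoted tag name the function silently returns content from the wrong tag, which is the kind of bug that is hard to spot in a shared helper.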

We don't know the quality of the text that you are searching through, nor do we know the variability of your tag names. In some cases it will be vital to use the u (unicode) pattern modifier so that multibyte characters are read correctly. The s pattern modifier tells the regex engine that the "any character" dot in the pattern should also match newline characters; by default the dot does not match newlines. If you need to accommodate tag names of unknown upper/lower casing, use the i pattern modifier.
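The effect of each modifier can be checked with a few small calls. This is only a sketch with made-up sample strings; it uses the u modifier (PCRE's UTF-8 mode) for multibyte text, s for newlines, and i for letter case.

```php
<?php
$text = "{MyTag}line one\nline two{/mytag}";

// Without s the dot cannot cross the newline, so there is no match.
$withoutS = preg_match('/\{mytag}(.*?)\{\/mytag}/i', $text);

// With s the dot matches the newline; with i "MyTag" and "mytag" both match.
$withS = preg_match('/\{mytag}(.*?)\{\/mytag}/si', $text, $m);

// Without i the differently cased tags do not match.
$withoutI = preg_match('/\{mytag}(.*?)\{\/mytag}/s', $text);

// u: subject and pattern are treated as UTF-8, so the dot consumes a whole character.
$utf8 = "{tag}\u{00E9}{/tag}"; // U+00E9 is two bytes in UTF-8
$byBytes = preg_match('/\{tag}(.)\{\/tag}/', $utf8);  // dot takes one byte only
$byChars = preg_match('/\{tag}(.)\{\/tag}/u', $utf8); // dot takes the full character

// Match counts in order: 0, 1, 0, 0, 1
var_export([$withoutS, $withS, $withoutI, $byBytes, $byChars]);
```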

If the contents of your curly tags are guaranteed not to include any opening curly braces, then you could change (.*?) to ([^{]*) to allow the regex to perform more efficiently.
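A short comparison of the two capture styles, as a sketch with invented sample text. It shows that both return the same result on simple input, and that the negated character class stops matching as soon as the content contains an opening curly brace.

```php
<?php
$lazy    = '/\{myTag}(.*?)\{\/myTag}/s';
$negated = '/\{myTag}([^{]*)\{\/myTag}/';

// Simple content: both patterns capture the same strings.
$plain = 'a {myTag}first{/myTag} b {myTag}second{/myTag}';
preg_match_all($lazy, $plain, $viaLazy);
preg_match_all($negated, $plain, $viaNegated);

// Content containing "{": only the lazy dot pattern still matches.
$nested = '{myTag}a {b} c{/myTag}';
$lazyHit    = preg_match($lazy, $nested, $lazyMatch);
$negatedHit = preg_match($negated, $nested);

var_export([$viaLazy[1], $viaNegated[1], $lazyHit, $negatedHit]);
```

Note that [^{] already matches newline characters, so the s modifier is not needed for the negated version.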

By capturing and referencing the tagname in its opening tag, you slightly reduce the step count of the pattern and reduce the total length of the pattern.

Code:

$text = <<<TEXT
some text {myTag}Here is the string
on two lines{/myTag} some more text
TEXT;

function curlyTagContents(string $string, string $tagname): string
{
    $pattern = '/\{(' . preg_quote($tagname, '/') . ')}(.*?)\{\/\1}/s';
    return preg_match($pattern, $string, $matches) ? $matches[2] : '';
}

var_export(
    curlyTagContents($text, 'myTag')
);

Output: (the single quotes are from var_export())

'Here is the string
on two lines'
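Since the question also asks for every match on a page, the same pattern can be passed to preg_match_all(). The following is a sketch rather than part of the original answer; the function name allCurlyTagContents() is hypothetical.

```php
<?php
// Hypothetical variant that returns the contents of every matching tag pair.
function allCurlyTagContents(string $string, string $tagname): array
{
    // Group 1 captures the quoted tag name, \1 reuses it in the closing tag,
    // and group 2 captures the text between the tags.
    $pattern = '/\{(' . preg_quote($tagname, '/') . ')}(.*?)\{\/\1}/s';
    preg_match_all($pattern, $string, $matches);

    // With the default PREG_PATTERN_ORDER, $matches[2] lists all group-2 captures
    // and is an empty array when nothing matched.
    return $matches[2];
}

$string = '{myTag}Here is the string{/myTag} and {myTag}here is more{/myTag}';
$all = allCurlyTagContents($string, 'myTag');
$none = allCurlyTagContents($string, 'otherTag');

var_export($all);
```

Here $all holds the two strings 'Here is the string' and 'here is more', and $none is an empty array.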

Source: https://stackoverflow.com/questions/33955525/preg-match-string-inside-curly-braces-tags

Tags

php

regex

preg-match