Question
What is the easiest way to highlight some text while excluding any text inside the OCCASIONAL tags "<...>"?
CLARIFICATION: I want the existing tags PRESERVED!
```php
$t = preg_replace(
    "/(markdown)/",
    "<strong>$1</strong>",
    "This is essentially plain text apart from a few html tags generated with some
simplified markdown rules: <a href=markdown.html>[see here]</a>");
```
Which should display as:
"This is essentially plain text apart from a few html tags generated with some simplified markdown rules: see here"
... BUT NOT MESS UP the text inside the anchor tag (i.e. <a href=markdown.html>).
I've heard the arguments against parsing HTML with regular expressions, but here we're talking about essentially plain text, apart from minimal parsing of some markdown code.
Answer 1:
Actually, this seems to work ok:
```php
<?php
$item = "markdown";
$t = "This is essentially plain text apart from a few html tags generated
with some simplified markdown rules: <a href=markdown.html>[see here]</a>";

//_____1. apply emphasis_____
$t = preg_replace("|($item)|", "<strong>$1</strong>", $t);
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules:
// <a href=<strong>markdown</strong>.html>[see here]</a>"

//_____2. remove emphasis if WITHIN opening and closing tag____
$t = preg_replace("|(<[^>]+?)(<strong>($item)</strong>)([^<]+?>)|", "$1$3$4", $t);
// This keeps the text before ($1) and after ($4) the match, and replaces
// the wrapped term <strong>..</strong> ($2) with the bare term ($3):
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=markdown.html>
// [see here]</a>"
?>
```
A search term like $item="odd|string" would cause problems, because | is both the pattern delimiter here and the regex alternation operator. I won't be using that kind of string anyway (it would probably need escaping with preg_quote($item, '|') or the like).
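A minimal sketch of the same two-pass idea as a reusable function, assuming the search term should always be matched literally. The helper name highlight_two_pass() is hypothetical; escaping the term with preg_quote() (passing '|' as the delimiter) is what lets a term such as "odd|string" work.

```php
<?php
// Hypothetical helper: the two passes from Answer 1, with the term escaped.
function highlight_two_pass(string $text, string $item): string
{
    // Escape regex metacharacters and the '|' delimiter, so the term is literal.
    $q = preg_quote($item, '|');

    // Pass 1: wrap every occurrence, including ones inside tags.
    $text = preg_replace("|($q)|", '<strong>$1</strong>', $text);

    // Pass 2: unwrap an occurrence that sits between a tag's '<' and its '>'.
    return preg_replace("|(<[^>]+?)(<strong>($q)</strong>)([^<]+?>)|", '$1$3$4', $text);
}

echo highlight_two_pass('Rules: <a href=markdown.html>[see markdown]</a>', 'markdown'), "\n";
// prints: Rules: <a href=markdown.html>[see <strong>markdown</strong>]</a>
```

Like the original, pass 2 only repairs a tag in which the term occurs once and is followed by at least one more character before the tag's closing '>'.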
Answer 2:
You could split the string into tag and non-tag parts using preg_split:
```php
$parts = preg_split('/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
```
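Because the tags are captured as delimiters, the parts alternate text, tag, text, tag, ..., text: the even-indexed parts are text and the odd-indexed parts are tags. Then you can iterate over the even-indexed parts only, skipping the tags, and apply your replacement to each: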
```php
for ($i = 0, $n = count($parts); $i < $n; $i += 2) {
    $parts[$i] = preg_replace("/(markdown)/", "<strong>$1</strong>", $parts[$i]);
}
```
At the end, put everything back together with implode:
```php
$str = implode('', $parts);
```
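Putting the three steps together, the whole approach might look like this as one function. This is a sketch: the name highlight_outside_tags() is hypothetical, and preg_quote() is added so the search word is matched literally.

```php
<?php
// Hypothetical helper combining Answer 2's split / replace / implode steps.
function highlight_outside_tags(string $str, string $word): string
{
    // Capture the tags, so $parts alternates: text, tag, text, tag, ..., text.
    $parts = preg_split('/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    $pattern = '/(' . preg_quote($word, '/') . ')/';
    // Even indices are text; odd indices are the captured tags and stay untouched.
    for ($i = 0, $n = count($parts); $i < $n; $i += 2) {
        $parts[$i] = preg_replace($pattern, '<strong>$1</strong>', $parts[$i]);
    }
    return implode('', $parts);
}

echo highlight_outside_tags('simplified markdown rules: <a href=markdown.html>[see here]</a>', 'markdown'), "\n";
// prints: simplified <strong>markdown</strong> rules: <a href=markdown.html>[see here]</a>
```

PREG_SPLIT_NO_EMPTY must not be added here: dropping empty text parts (for example before a leading tag) would break the even/odd alternation between text and tags.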
But note that this is really not the best solution. You would do better to use a proper HTML parser, such as PHP's DOM extension. See, for example, these related questions:
- Highlight keywords in a paragraph
- Regex / DOMDocument - match and replace text not in a link
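Following that advice, here is a rough sketch using DOMDocument and DOMXPath. It assumes the dom extension is installed and that the input is an ASCII HTML fragment (loadHTML() reads input without a charset declaration as ISO-8859-1). highlight_with_dom() is a hypothetical helper, not taken from the linked questions. Because only text nodes are edited, attribute values such as href cannot be changed.

```php
<?php
// Hypothetical helper: highlight $word in text nodes only (needs ext-dom).
function highlight_with_dom(string $html, string $word): string
{
    $doc = new DOMDocument();
    // Wrap in a <div> so the fragment has a single root element, and keep
    // libxml's warnings about sloppy markup from being printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $pattern = '/(' . preg_quote($word, '/') . ')/';
    $xpath = new DOMXPath($doc);
    // Snapshot the text nodes first, because the loop modifies the tree.
    foreach (iterator_to_array($xpath->query('//text()')) as $textNode) {
        // With DELIM_CAPTURE, odd indices hold the matched word.
        $pieces = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($pieces) === 1) {
            continue; // no match in this text node
        }
        $parent = $textNode->parentNode;
        foreach ($pieces as $i => $piece) {
            if ($i % 2 === 1) {
                $strong = $doc->createElement('strong');
                $strong->appendChild($doc->createTextNode($piece));
                $parent->insertBefore($strong, $textNode);
            } elseif ($piece !== '') {
                $parent->insertBefore($doc->createTextNode($piece), $textNode);
            }
        }
        $parent->removeChild($textNode);
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlight_with_dom('simplified markdown rules: <a href=markdown.html>[see markdown]</a>', 'markdown'), "\n";
```

saveHTML() re-serializes the markup, so the result is not byte-for-byte identical to the input; for example, unquoted attribute values come back quoted.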
Answer 3:
You could split your string into an array just before every '<' and just after every '>' using preg_split(), so that each entry is either a complete tag or a run of text. Then loop through that array and replace only in the entries that do not begin with '<'. Afterwards, combine the array back into a string using implode().
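A minimal sketch of this approach, assuming the split points are zero-width lookarounds, so that no '<' or '>' is consumed and every tag stays in one piece. highlight_by_split() is a hypothetical name. A literal '<' in the text (e.g. "a < b") would be mistaken for the start of a tag, so this only suits text like the question's.

```php
<?php
// Hypothetical helper: split before every '<' and after every '>'.
function highlight_by_split(string $str, string $word): string
{
    // Zero-width split points; NO_EMPTY drops the empty edge pieces.
    $pieces = preg_split('/(?=<)|(?<=>)/', $str, -1, PREG_SPLIT_NO_EMPTY);
    $pattern = '/(' . preg_quote($word, '/') . ')/';
    foreach ($pieces as $i => $piece) {
        // Pieces starting with '<' are tags; replace only in the others.
        if ($piece[0] !== '<') {
            $pieces[$i] = preg_replace($pattern, '<strong>$1</strong>', $piece);
        }
    }
    return implode('', $pieces);
}

echo highlight_by_split('simplified markdown rules: <a href=markdown.html>[see here]</a>', 'markdown'), "\n";
// prints: simplified <strong>markdown</strong> rules: <a href=markdown.html>[see here]</a>
```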
Answer 4:
This regex should strip all HTML opening and closing tags: /(<.*?>)+/
You can use it with preg_replace like this:
```php
$test = "Hello <strong>World!</strong>";
$regex = "/(<.*?>)+/";
$result = preg_replace($regex, "", $test);
// $result is "Hello World!"
```
Answer 5:
Actually, this is not very efficient, but it worked for me:
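```php
$your_string = '...';
$search = 'markdown';
$left = '<strong>';
$right = '</strong>';
// Escape everything that goes into the patterns ('#' is the delimiter).
$left_Q = preg_quote($left, '#');
$right_Q = preg_quote($right, '#');
$search_Q = preg_quote($search, '#');
// Repeat while some occurrence sits in text (after '>' or the start, before
// '<' or the end) and is not already wrapped in $left ... $right.
while (preg_match('#(>|^)[^<]*(?<!'.$left_Q.')'.$search_Q.'(?!'.$right_Q.')[^>]*(<|$)#isU', $your_string)) {
    // Wraps at most one occurrence per text run per pass (U makes quantifiers lazy).
    $your_string = preg_replace('#(^[^<]*|>[^<]*)(?<!'.$left_Q.')('.$search_Q.')(?!'.$right_Q.')([^>]*<|[^>]*$)#isU', '${1}'.$left.'${2}'.$right.'${3}', $your_string);
}
echo $your_string;
```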
Source: https://stackoverflow.com/questions/4603780/preg-replace-only-outside-tags-were-not-talking-full-html-parsing-jus