preg_replace only OUTSIDE tags ? (… we're not talking full 'html parsing', just a bit of markdown)

社会主义新天地 提交于 2020-01-01 06:46:10

问题


What is the easiest way of applying highlighting of some text excluding text within OCCASIONAL tags "<...>"?

CLARIFICATION: I want the existing tags PRESERVED!

$t = 
preg_replace(
  "/(markdown)/",
  "<strong>$1</strong>",
"This is essentially plain text apart from a few html tags generated with some
simplified markdown rules: <a href=markdown.html>[see here]</a>");

Which should display as:

"This is essentially plain text apart from a few html tags generated with some simplified markdown rules: see here"

... BUT NOT MESS UP the text inside the anchor tag (i.e. <a href=markdown.html> ).

I've heard the arguments of not parsing html with regular expressions, but here we're talking essentially about plain text except for minimal parsing of some markdown code.


回答1:


Actually, this seems to work ok:

<?php
$item="markdown";
$t="This is essentially plain text apart from a few html tags generated 
with some simplified markdown rules: <a href=markdown.html>[see here]</a>";

//_____1. apply emphasis_____
$t = preg_replace("|($item)|","<strong>$1</strong>",$t);

// "This is essentially plain text apart from a few html tags generated 
// with some simplified <strong>markdown</strong> rules: <a href=
// <strong>markdown</strong>.html>[see here]</a>"

//_____2. remove emphasis if WITHIN opening and closing tag____
$t = preg_replace("|(<[^>]+?)(<strong>($item)</strong>)([^<]+?>)|","$1$3$4",$t);

// this preserves the text before ($1), after ($4) 
// and inside <strong>..</strong> ($2), but without the tags ($3)

// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=markdown.html>
// [see here]</a>"

?>

A string like $item="odd|string" would cause some problems, but I won't be using that kind of string anyway... (probably needs htmlentities(...) or the like...)




回答2:


You could split the string into tag‍/‍no-tag parts using preg_split:

$parts = preg_split('/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

Then you can iterate the parts while skipping every even part (i.e. the tag parts) and apply your replacement on it:

for ($i=0, $n=count($parts); $i<$n; $i+=2) {
    $parts[$i] = preg_replace("/(markdown)/", "<strong>$1</strong>", $parts[$i]);
}

At the end put everything back together with implode:

$str = implode('', $parts);

But note that this is really not the best solution. You should better use a proper HTML parser like PHP’s DOM library. See for example these related questions:

  • Highlight keywords in a paragraph
  • Regex / DOMDocument - match and replace text not in a link



回答3:


You could split your string into an array at every '<' or '>' using preg_split(), then loop through that array and replace only in entries not beginning with an '>'. Afterwards you combine your array to an string using implode().




回答4:


This regex should strip all HTML opening and closing tags: /(<[.*?]>)+/

You can use it with preg_replace like this:

$test = "Hello <strong>World!</strong>";
$regex = "/(<.*?>)+/";


$result = preg_replace($regex,"",$test);



回答5:


actually this is not very efficient, but it worked for me

$your_string = '...';

$search = 'markdown';
$left = '<strong>';
$right = '</strong>';

$left_Q = preg_quote($left, '#');
$right_Q = preg_quote($right, '#');
$search_Q = preg_quote($search, '#');
while(preg_match('#(>|^)[^<]*(?<!'.$left_Q.')'.$search_Q.'(?!'.$right_Q.')[^>]*(<|$)#isU', $your_string))
  $your_string = preg_replace('#(^[^<]*|>[^<]*)(?<!'.$left_Q.')('.$search_Q.')(?!'.$right_Q.')([^>]*<|[^>]*$)#isU', '${1}'.$left.'${2}'.$right.'${3}', $your_string);

echo $your_string;


来源:https://stackoverflow.com/questions/4603780/preg-replace-only-outside-tags-were-not-talking-full-html-parsing-jus

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!