$str = 'some text <MY_TAG>tag contents</MY_TAG> more text';
My questions are:
How can I retrieve the content tag contents that sits between <MY_TAG> and </MY_TAG>?
You do not want to use regular expressions for this. A much better solution would be to load your contents into a DOMDocument and work on it using the DOM tree and standard DOM methods:
$document = new DOMDocument();
$document->loadXML('<root/>');
// createDocumentFragment() takes no arguments; appendXML() parses the markup.
$fragment = $document->createDocumentFragment();
$fragment->appendXML($myTextWithTags);
$document->documentElement->appendChild($fragment);
$MY_TAGs = $document->getElementsByTagName('MY_TAG');
foreach ($MY_TAGs as $MY_TAG)
{
    $xmlContent = $document->saveXML($MY_TAG);
    /* work on $xmlContent here */
    /* as a further example: */
    $ems = $MY_TAG->getElementsByTagName('em');
    foreach ($ems as $em)
    {
        $emphasizedText = $em->nodeValue;
        /* do your operations here */
    }
}
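To make the DOM approach concrete, here is a minimal sketch that both collects the text inside each tag and returns the string with those tags removed. The helper name extractAndRemoveTag is hypothetical, and the sketch assumes the input is a well-formed XML fragment (appendXML() rejects anything else) and that the dom extension is available.

```php
<?php
// Hypothetical helper: returns [contents of each matching tag, text with those tags removed].
function extractAndRemoveTag(string $text, string $tagName): array
{
    $document = new DOMDocument();
    $document->loadXML('<root/>');

    $fragment = $document->createDocumentFragment();
    // appendXML() returns false when $text is not well-formed XML.
    if (!@$fragment->appendXML($text)) {
        throw new InvalidArgumentException('Input is not a well-formed XML fragment');
    }
    $document->documentElement->appendChild($fragment);

    $contents = [];
    // Copy the live node list first; removing nodes while iterating it skips elements.
    foreach (iterator_to_array($document->getElementsByTagName($tagName)) as $node) {
        $contents[] = $node->textContent;
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize whatever is left under the temporary <root> wrapper.
    $remaining = '';
    foreach ($document->documentElement->childNodes as $child) {
        $remaining .= $document->saveXML($child);
    }

    return [$contents, $remaining];
}

[$contents, $remaining] = extractAndRemoveTag(
    'some text <MY_TAG>tag contents</MY_TAG> more text',
    'MY_TAG'
);
```

Note that getElementsByTagName() is case-sensitive for XML documents, so 'my_tag' would not match here.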
For removal I ended up just using this:
$str = preg_replace('~<MY_TAG(.*?)</MY_TAG>~si', "", $str);
Using ~ instead of / as the delimiter avoids the errors caused by the forward slash in the end tag, which seemed to be an issue even with escaping. Leaving > out of the opening tag allows for attributes or other characters and still matches the tag and all of its contents.
This only works where nesting is not a concern.
The si modifiers mean s = the dot also matches line breaks, i = case insensitive. The lazy quantifier .*? already stops at the first closing tag. Do not add the U (ungreedy) modifier as well: U inverts greediness, so it would turn .*? back into a greedy match that runs to the last closing tag.
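One detail worth checking when you test this on a string with more than one tag: in PCRE, the U modifier inverts greediness, so a quantifier followed by ? becomes greedy again under U. The sketch below compares the two variants; the sample string and variable names are made up for illustration.

```php
<?php
// Illustrative input with two separate tags, one of them carrying an attribute.
$str = 'a <MY_TAG>one</MY_TAG> b <MY_TAG id="x">two</MY_TAG> c';

// Lazy quantifier, no U: each tag is matched and removed on its own.
$lazy = preg_replace('~<MY_TAG(.*?)</MY_TAG>~si', '', $str);

// U plus .*? is greedy: one match runs from the first opening tag
// to the last closing tag, so the " b " in between is removed too.
$withU = preg_replace('~<MY_TAG(.*?)</MY_TAG>~Usi', '', $str);

var_dump($lazy, $withU);
```

If you prefer to keep U, write the quantifier as plain .* so that it stays ungreedy.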
If MY_TAG cannot be nested, try this to get the matches:
preg_match_all('/<MY_TAG>(.*?)<\/MY_TAG>/s', $str, $matches);
And to remove them, use preg_replace instead.
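As a short sketch of both steps, assuming a sample string with two non-nested tags (the input and the variable names $contents and $stripped are illustrative): index 0 of $matches holds the full matches including the tags, and index 1 holds the captured contents.

```php
<?php
$str = 'some text <MY_TAG>tag contents</MY_TAG> more text <MY_TAG>second</MY_TAG>';

// Retrieve: $matches[1] contains the text captured between the tags.
preg_match_all('/<MY_TAG>(.*?)<\/MY_TAG>/s', $str, $matches);
$contents = $matches[1];

// Remove: the same pattern with preg_replace and an empty replacement.
$stripped = preg_replace('/<MY_TAG>.*?<\/MY_TAG>/s', '', $str);

print_r($contents);
var_dump($stripped);
```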
Although the only fully correct way to do this is not to use regular expressions, you can get what you want if you accept it won't handle all special cases:
preg_match('~<em[^>]*?>.*?</em>~i', $str, $match);
// Use this only if you aren't worried about nested tags.
// It will handle tags with attributes
And
$str = preg_replace('~<MY_TAG[^>]*?>.*?</MY_TAG>~i', '', $str);
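To see why nesting is the caveat here, the following sketch runs the same kind of pattern on a tag nested inside itself. The input string is invented for illustration, and the s modifier is added only so the dot would also cross line breaks.

```php
<?php
// A MY_TAG nested inside another MY_TAG.
$nested = 'x <MY_TAG>a <MY_TAG>b</MY_TAG> c</MY_TAG> y';

// The lazy match ends at the FIRST closing tag, which belongs to the inner
// element, so the tail of the outer element is left behind.
$result = preg_replace('~<MY_TAG[^>]*>.*?</MY_TAG>~is', '', $nested);

var_dump($result);
```

The leftover closing tag in the result is the reason a DOM parser is the safer choice whenever the same tag can nest.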
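I tested this function; it also works when tags are nested inside other, different tags (a tag nested inside the same tag is still cut at the first closing tag). Pass the tags as the second argument: with the third argument FALSE (the default) everything except those tags is removed along with its contents, and with TRUE only those tags are removed. Found here: https://www.php.net/manual/en/function.strip-tags.php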
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {
    preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
    $tags = array_unique($tags[1]);
    if(is_array($tags) AND count($tags) > 0) {
        if($invert == FALSE) {
            return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        else {
            return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
        }
    }
    elseif($invert == FALSE) {
        return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    return $text;
}
// Sample text:
$text = '<b>sample</b> text with <div>tags</div>';
// Result for:
echo strip_tags_content($text);
// text with
// Result for:
echo strip_tags_content($text, '<b>');
// <b>sample</b> text with
// Result for:
echo strip_tags_content($text, '<b>', TRUE);
// text with <div>tags</div>