Say I have a 200-character string that contains HTML markup. I want to show a preview of just the first 50 characters without 'splitting up' the tags. In other words, the frag
Short answer: load it into a DOM with DOMDocument::loadHTML($string), then walk the tree counting the characters in the text nodes. When you hit your limit, replace the rest of that text node with '...' (or the empty string), and call $node->parentNode->removeChild($node) on every subsequent node.
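A minimal sketch of that short answer, assuming UTF-8 input. Instead of a recursive walk it uses DOMXPath to visit the text nodes in document order, and once the limit is crossed it removes the following siblings at every level up to the root, which drops all subsequent nodes (including text-less ones such as <img>). The name truncate_html() and its $ellipsis parameter are hypothetical, not part of any library.

```php
<?php
// Hypothetical helper: truncate an HTML fragment to $limit visible characters,
// append $ellipsis at the cut, and remove every node that follows the cut.
function truncate_html(string $html, int $limit, string $ellipsis = '...'): string
{
    $dom = new DOMDocument();
    // Wrap in a <div> to get a single root; encode non-ASCII characters as
    // numeric entities so libxml does not misread UTF-8 input.
    $wrapped = '<div>' . mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8') . '</div>';
    $previous = libxml_use_internal_errors(true); // silence warnings about unknown tags
    $dom->loadHTML($wrapped, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $dom->documentElement;
    $xpath = new DOMXPath($dom);
    $remaining = $limit;

    // './/text()' yields the text nodes under $root in document order.
    foreach ($xpath->query('.//text()', $root) as $text) {
        $length = mb_strlen($text->nodeValue, 'UTF-8');
        if ($length <= $remaining) {
            $remaining -= $length; // the whole node still fits
            continue;
        }
        // This node crosses the limit: keep the part that fits and mark the cut.
        $text->nodeValue = mb_substr($text->nodeValue, 0, $remaining, 'UTF-8') . $ellipsis;
        // Delete everything after the cut: at each level from here up to
        // the root, remove all following siblings.
        for ($node = $text; !$node->isSameNode($root); $node = $node->parentNode) {
            while ($node->nextSibling !== null) {
                $node->parentNode->removeChild($node->nextSibling);
            }
        }
        break;
    }

    // Serialize only the children of the wrapper <div>, i.e. the fragment itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

For example, truncate_html('<p>Hello <b>world</b> again</p>', 8) keeps "Hello " plus "wo", closes the open tags, and returns <p>Hello <b>wo...</b></p>. Serializing the wrapper's children avoids having to slice the "<div>" and "</div>" off the output by string offsets.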
A simpler approach, if the preview doesn't need to keep any markup, is to call strip_tags() first and then take the excerpt from the plain text.
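A minimal sketch of the strip_tags() route, assuming UTF-8 input; plain_excerpt() is a hypothetical name. Entities are decoded after stripping so that "&amp;" counts as one character, and the mb_* functions count characters rather than bytes.

```php
<?php
// Hypothetical helper: plain-text excerpt with all markup removed.
function plain_excerpt(string $html, int $limit, string $ellipsis = '...'): string
{
    // strip_tags() drops the markup; decoding entities afterwards makes
    // "&amp;" count as one character instead of five.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }
    return mb_substr($text, 0, $limit, 'UTF-8') . $ellipsis;
}
```

The result is plain text, so pass it through htmlspecialchars() before echoing it into a page. Note that strip_tags() inserts no whitespace between adjacent block elements: '<p>a</p><p>b</p>' becomes 'ab'.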
You should check out HTML Tidy. Just cut the string after the first 50 non-tag characters, then run it through Tidy to close any tags left open by the cut.
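Counting "non-tag characters" needs a small scanner that skips over tags. A minimal sketch, assuming tags never contain a literal '>' inside attribute values and treating each entity such as &amp; as a single character; cut_text_chars() and tidy_excerpt() are hypothetical names, and tidy_repair_string() is only available when PHP's optional tidy extension is installed.

```php
<?php
// Hypothetical helper: return the prefix of $html that contains $limit
// characters of text, skipping over tags without counting them.
function cut_text_chars(string $html, int $limit): string
{
    $len = strlen($html);
    $count = 0;
    $i = 0;
    while ($i < $len && $count < $limit) {
        if ($html[$i] === '<') {
            // Skip the whole tag (assumes no '>' inside attribute values).
            $end = strpos($html, '>', $i);
            $i = ($end === false) ? $len : $end + 1;
            continue;
        }
        if ($html[$i] === '&' && preg_match('/\G&#?[a-zA-Z0-9]+;/', $html, $m, 0, $i)) {
            $i += strlen($m[0]); // a whole entity counts as one character
        } else {
            // Advance one UTF-8 character: the lead byte plus any continuation bytes.
            $i++;
            while ($i < $len && (ord($html[$i]) & 0xC0) === 0x80) {
                $i++;
            }
        }
        $count++;
    }
    return substr($html, 0, $i);
}

// Hypothetical wrapper: cut, then let Tidy close the tags left open.
function tidy_excerpt(string $html, int $limit): string
{
    if (!function_exists('tidy_repair_string')) {
        throw new RuntimeException('The tidy extension is not installed.');
    }
    // show-body-only: return just the repaired fragment, not a full document.
    $repaired = tidy_repair_string(cut_text_chars($html, $limit), ['show-body-only' => true], 'utf8');
    if ($repaired === false) {
        throw new RuntimeException('Tidy could not repair the excerpt.');
    }
    return $repaired;
}
```

Here cut_text_chars('<p>Hello <b>world</b></p>', 8) stops after "wo" and yields '<p>Hello <b>wo'; Tidy's job is then to add the missing </b></p>. Tidy may also reformat whitespace or drop elements it considers empty, so check its output against your markup.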
Here's a fast and reliable solution using DOMDocument, which is part of standard PHP (the bundled DOM extension):
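function cut_html($html, $limit) {
    $dom = new DOMDocument();
    // Wrap the fragment in a <div> so it has a single root element, and encode
    // non-ASCII characters as numeric entities so libxml does not misread UTF-8.
    // (mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8') does the same job but
    // is deprecated as of PHP 8.2.)
    $wrapped = mb_encode_numericentity("<div>{$html}</div>", [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    $dom->loadHTML($wrapped, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    cut_html_recursive($dom->documentElement, $limit);
    // Strip the "<div>" (5 chars) and "</div>" (6 chars) wrapper again.
    return substr($dom->saveHTML($dom->documentElement), 5, -6);
}

// Returns how many characters are still allowed after processing $element.
function cut_html_recursive($element, $limit) {
    if ($limit > 0) {
        if ($element->nodeType === XML_TEXT_NODE) {
            // nodeValue is UTF-8, so count characters, not bytes.
            $length = mb_strlen($element->nodeValue, 'UTF-8');
            $limit -= $length;
            if ($limit < 0) {
                // Keep only the part of this text node that still fits.
                $element->nodeValue = mb_substr($element->nodeValue, 0, $length + $limit, 'UTF-8');
            }
        }
        else {
            for ($i = 0; $i < $element->childNodes->length; $i++) {
                if ($limit > 0) {
                    $limit = cut_html_recursive($element->childNodes->item($i), $limit);
                }
                else {
                    // Budget used up: drop the remaining siblings. childNodes is
                    // live, so step $i back to stay on the next node.
                    $element->removeChild($element->childNodes->item($i));
                    $i--;
                }
            }
        }
    }
    return $limit;
}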