I am trying to cut off text after 236 characters without cutting words in half, while preserving HTML tags. This is what I am using right now:
$shortdesc = $_helper-
I did this in JS; hopefully the same logic will help in PHP too.
    splitText: function (content, count) {
        var originalContent = content;
        content = content.substring(0, count);
        // If the breaking point falls inside an HTML tag (a "<" with no matching ">" yet).
        if (content.lastIndexOf("<") > content.lastIndexOf(">")) {
            // Drop the partial tag.
            content = content.substring(0, content.lastIndexOf('<'));
            count = content.length;
            if (originalContent.indexOf("</", count) !== -1) {
                // Extend up to the end of the next closing tag.
                content += originalContent.substring(count, originalContent.indexOf('>', originalContent.indexOf("</", count)) + 1);
            } else {
                // No closing tag follows: extend up to the end of the tag that was cut.
                content += originalContent.substring(count, originalContent.indexOf('>', count) + 1);
            }
        // If the breaking point falls between an opening tag and its closing tag.
        } else if (content.lastIndexOf("<") !== content.lastIndexOf("</")) {
            content = originalContent.substring(0, originalContent.indexOf('>', count) + 1);
        }
        return content;
    },
Hope this logic helps someone.
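Note that the function above deals with tags but not with the "without cutting words in half" part of the question. A minimal sketch of that part, for plain text only, could back up to the last whitespace before the limit. The name `cutAtWordBoundary` is made up for this example, and falling back to a hard cut for a single over-long word is an assumption about the desired behaviour.

```javascript
// Hypothetical helper (not part of the answer above): cut plain text at or
// before `limit` characters without splitting a word.
function cutAtWordBoundary(text, limit) {
  if (text.length <= limit) {
    return text;
  }
  // If the character right after the cut is whitespace, the cut already
  // falls on a word boundary.
  if (/\s/.test(text.charAt(limit))) {
    return text.substring(0, limit).trimEnd();
  }
  const head = text.substring(0, limit);
  // Index of the last whitespace character in `head`, or -1 if there is none.
  const lastSpace = head.search(/\s\S*$/);
  if (lastSpace === -1) {
    // Assumption: a single word longer than the limit is cut hard.
    return head;
  }
  return head.substring(0, lastSpace).trimEnd();
}

console.log(cutAtWordBoundary("The quick brown fox", 12)); // "The quick"
```

This only works on text without markup; it would have to be applied to the text portion of the content, not to the raw HTML string.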
This should do it:
    class Html
    {
        protected
            $reachedLimit = false,
            $totalLen = 0,
            $maxLen = 25,
            $toRemove = array();

        public static function trim($html, $maxLen = 25)
        {
            $dom = new DOMDocument();

            if (version_compare(PHP_VERSION, '5.4.0') < 0) {
                $dom->loadHTML($html);
            } else {
                $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
            }

            $instance = new static();
            $toRemove = $instance->walk($dom, $maxLen);

            // remove any nodes that exceed the limit
            foreach ($toRemove as $child) {
                $child->parentNode->removeChild($child);
            }

            // remove wrapper tags added by DOMDocument (doctype, html...)
            if (version_compare(PHP_VERSION, '5.4.0') < 0) {
                // http://stackoverflow.com/a/6953808/1058140
                $dom->removeChild($dom->firstChild);
                $dom->replaceChild($dom->firstChild->firstChild->firstChild, $dom->firstChild);
            }

            return $dom->saveHTML();
        }

        protected function walk(DOMNode $node, $maxLen)
        {
            if ($this->reachedLimit) {
                // everything after the cut-off point gets removed
                $this->toRemove[] = $node;
            } else {
                // only text nodes should have text,
                // so do the splitting here
                if ($node instanceof DOMText) {
                    // strlen / substr count bytes;
                    // use mb_strlen / mb_substr for UTF-8 support
                    $this->totalLen += $nodeLen = strlen($node->nodeValue);

                    if ($this->totalLen > $maxLen) {
                        $node->nodeValue = substr($node->nodeValue, 0, $nodeLen - ($this->totalLen - $maxLen)) . '...';
                        $this->reachedLimit = true;
                    }
                }

                // if node has children, walk its child elements
                if (isset($node->childNodes)) {
                    foreach ($node->childNodes as $child) {
                        $this->walk($child, $maxLen);
                    }
                }
            }

            return $this->toRemove;
        }
    }
Use it like this: $str = Html::trim($str, 236);
Performance-wise, there's very little difference, and at very large string sizes DomDocument is actually faster. In my opinion, reliability is more important than saving a few microseconds.
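For anyone following along in JavaScript, the core idea of the class above (count only the characters of text, leave tags intact, stop once the limit is reached) can be roughly sketched without a DOM parser. This is not a port of the PHP code. It is a simplified tokenizer-based sketch, and `truncateHtml` and `VOID_TAGS` are names made up for the example. It assumes well-formed input with no comments, no script or CDATA sections, and no ">" inside attribute values, and the void-tag list is deliberately partial. A real parser is still the more reliable option.

```javascript
// Hypothetical sketch: keep at most `maxLen` characters of visible text,
// copy tags through untouched, and close whatever is still open at the cut.
// Partial list of elements that never have a closing tag (assumption).
const VOID_TAGS = new Set(["br", "hr", "img", "input", "meta", "link"]);

function truncateHtml(html, maxLen) {
  // Matches either one tag (capturing its name) or one run of text.
  const tokenRe = /<\/?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+/g;
  const openTags = []; // stack of currently open element names
  let out = "";
  let textLen = 0;
  let match;

  while ((match = tokenRe.exec(html)) !== null) {
    const token = match[0];
    const tagName = match[1];

    if (tagName === undefined) {
      // Text token: only these characters count toward the limit.
      const remaining = maxLen - textLen;
      if (token.length > remaining) {
        out += token.substring(0, remaining) + "...";
        break;
      }
      out += token;
      textLen += token.length;
    } else {
      if (token.startsWith("</")) {
        openTags.pop();
      } else if (!VOID_TAGS.has(tagName.toLowerCase()) && !token.endsWith("/>")) {
        openTags.push(tagName.toLowerCase());
      }
      out += token;
    }
  }

  // Close any elements that were still open when the limit was hit.
  while (openTags.length > 0) {
    out += "</" + openTags.pop() + ">";
  }
  return out;
}

console.log(truncateHtml("<p>Hello <b>world</b> and more</p>", 8));
// <p>Hello <b>wo...</b></p>
```

Like the PHP version, this cuts at an exact character count, so it can still split a word.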