Shorten text without splitting words or breaking HTML tags

粉色の甜心 2020-12-28 17:06

I am trying to cut off text after 236 characters without cutting words in half, while preserving HTML tags. This is what I am using right now:

$shortdesc = $_helper-         


        
8 Answers
  • 2020-12-28 17:52

    I did this in JS; hopefully the same logic will help in PHP too.

    // Cut `content` at `count` characters without leaving a broken HTML tag.
    // Note: this does not check whether the cut lands in the middle of a word.
    function splitText(content, count) {
        var originalContent = content;
        content = content.substring(0, count);
        if (content.lastIndexOf("<") > content.lastIndexOf(">")) {
            // The cut landed inside a tag: drop the partial tag...
            content = content.substring(0, content.lastIndexOf("<"));
            count = content.length;
            var closeStart = originalContent.indexOf("</", count);
            if (closeStart != -1) {
                // ...then copy the original up to the end of the next closing tag.
                content += originalContent.substring(count, originalContent.indexOf(">", closeStart) + 1);
            } else {
                // No closing tag follows: copy up to the end of the tag that was cut.
                content += originalContent.substring(count, originalContent.indexOf(">", count) + 1);
            }
        } else if (content.lastIndexOf("<") != content.lastIndexOf("</")) {
            // The cut landed after an opening tag: extend to the end of the next tag, if any.
            var tagEnd = originalContent.indexOf(">", count);
            if (tagEnd != -1) {
                content = originalContent.substring(0, tagEnd + 1);
            }
        }
        return content;
    }
    

    Hope this logic helps someone.
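
    The function above keeps tags intact but can still stop in the middle of a word, which the question also wants to avoid. Below is a minimal sketch of one way to combine both checks. It is not the answerer's code: `truncateHtml` and `VOID_TAGS` are hypothetical names, and the sketch assumes simple, well-formed markup (no `<` or `>` inside attribute values, no comments or scripts). It counts only text characters against the limit, drops a partial word at the cut, and closes any tags that are still open.

```javascript
// Hypothetical helper for illustration; not part of the answer above.
// Assumes simple, well-formed markup: no "<" or ">" inside attribute values,
// and no comments, scripts, or entities that need special handling.
var VOID_TAGS = ["br", "hr", "img", "input", "link", "meta"];

function truncateHtml(html, limit) {
    // Split into tags and text runs, keeping their order.
    var tokens = html.match(/<[^>]+>|[^<]+/g) || [];
    var open = [];  // names of tags opened but not yet closed
    var out = "";
    var used = 0;   // text characters emitted so far (tags are not counted)

    for (var i = 0; i < tokens.length; i++) {
        var token = tokens[i];

        if (token.charAt(0) === "<") {
            var m = /^<(\/?)([a-zA-Z][a-zA-Z0-9]*)/.exec(token);
            if (m && m[1]) {
                open.pop();             // closing tag
            } else if (m && VOID_TAGS.indexOf(m[2].toLowerCase()) === -1 && !/\/>$/.test(token)) {
                open.push(m[2]);        // opening tag that expects a close
            }
            out += token;
            continue;
        }

        if (used + token.length <= limit) {
            out += token;
            used += token.length;
            continue;
        }

        // This text run crosses the limit.
        var room = limit - used;
        var piece = token.substring(0, room);
        if (/\S/.test(token.charAt(room))) {
            // The cut falls inside a word: drop the partial word.
            piece = piece.replace(/\S+$/, "");
        }
        out += piece.replace(/\s+$/, "");

        // Close whatever is still open, innermost first.
        for (var j = open.length - 1; j >= 0; j--) {
            out += "</" + open[j] + ">";
        }
        return out;
    }
    return out;
}
```

    For example, `truncateHtml("<p>Hello <b>wonderful world</b> again</p>", 17)` would cut inside "world", so the partial word is dropped and both open tags are closed. For real-world HTML, a parser-based approach is likely more reliable than this regex split.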

  • 2020-12-28 17:56

    This should do it:

    class Html
    {
        protected
            $reachedLimit = false,
            $totalLen = 0,
            $maxLen = 25,
            $toRemove = array();
    
        public static function trim($html, $maxLen = 25)
        {
    
            $dom = new DomDocument();
    
            if (version_compare(PHP_VERSION, '5.4.0') < 0) {
                $dom->loadHTML($html);
            } else {
                $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
            }
    
            $instance = new static();
            $toRemove = $instance->walk($dom, $maxLen);
    
            // remove any nodes that exceed limit
            foreach ($toRemove as $child) {
                $child->parentNode->removeChild($child);
            }
    
            // remove wrapper tags added by DOMDocument (doctype, html...)
            if (version_compare(PHP_VERSION, '5.4.0') < 0) {
                // http://stackoverflow.com/a/6953808/1058140
                $dom->removeChild($dom->firstChild);
                $dom->replaceChild($dom->firstChild->firstChild->firstChild, $dom->firstChild);
    
                return $dom->saveHTML();
            }
    
            return $dom->saveHTML();
        }
    
        protected function walk(DomNode $node, $maxLen)
        {
    
            if ($this->reachedLimit) {
                $this->toRemove[] = $node;
            } else {
                // only text nodes should have text,
                // so do the splitting here
                if ($node instanceof DomText) {
                    $this->totalLen += $nodeLen = strlen($node->nodeValue);
    
                    // use mb_strlen / mb_substr for UTF-8 support
                    if ($this->totalLen > $maxLen) {
                        $node->nodeValue = substr($node->nodeValue, 0, $nodeLen - ($this->totalLen - $maxLen)) . '...';
                        $this->reachedLimit = true;
                    }
                }
    
                // if node has children, walk its child elements
                if (isset($node->childNodes)) {
                    foreach ($node->childNodes as $child) {
                        $this->walk($child, $maxLen);
                    }
                }
            }
    
            return $this->toRemove;
        }
    }
    

    Use it like this: $str = Html::trim($str, 236);



    Some performance comparisons between this and CakePHP's regex solution:

    [Image: performance comparison chart]

    There is very little difference, and at very large string sizes DomDocument is actually faster. In my opinion, reliability is more important than saving a few microseconds.
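
    The walk in the class above boils down to this: add up text-node lengths in document order, cut the text node that crosses the limit, and drop every node that comes after it. Here is a rough JavaScript sketch of that same idea on a plain object tree instead of a real DOM. The node shapes (`{ text }` and `{ tag, children }`) and the names `trimTree` and `render` are assumptions made for illustration only. Like the PHP version, it may cut in the middle of a word.

```javascript
// Hypothetical node shapes: { text: "..." } or { tag: "p", children: [...] }.
function trimTree(node, maxLen, state) {
    state = state || { total: 0, reachedLimit: false };

    if (typeof node.text === "string") {
        // Only text nodes count toward the limit.
        state.total += node.text.length;
        if (state.total > maxLen) {
            var keep = node.text.length - (state.total - maxLen);
            node.text = node.text.substring(0, keep) + "...";
            state.reachedLimit = true;
        }
        return node;
    }

    // Element node: keep children until the limit is reached, drop the rest.
    var kept = [];
    for (var i = 0; i < node.children.length; i++) {
        if (state.reachedLimit) {
            break;
        }
        kept.push(trimTree(node.children[i], maxLen, state));
    }
    node.children = kept;
    return node;
}

// Serialize the tree back to markup (attributes are omitted in this sketch).
function render(node) {
    if (typeof node.text === "string") {
        return node.text;
    }
    return "<" + node.tag + ">" + node.children.map(render).join("") + "</" + node.tag + ">";
}
```

    Because the cut happens on the tree rather than on the string, the output is always well-formed: every element that survives is serialized with both its opening and closing tag.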
