Shorten text without splitting words or breaking HTML tags

粉色の甜心 2020-12-28 17:06

I am trying to cut off text after 236 characters without cutting words in half, while preserving HTML tags. This is what I am using right now:

$shortdesc = $_helper-         


        
8 answers
  • 2020-12-28 17:30

    The best solution I have come across for this is the truncate method from the CakePHP framework's TextHelper class.

    Here is the method:

    /**
    * Truncates text.
    *
    * Cuts a string to the length of $length and replaces the last characters
    * with the ending if the text is longer than length.
    *
    * ### Options:
    *
    * - `ending` Will be used as Ending and appended to the trimmed string
    * - `exact` If false, $text will not be cut mid-word
    * - `html` If true, HTML tags would be handled correctly
    *
    * @param string  $text String to truncate.
    * @param integer $length Length of returned string, including ellipsis.
    * @param array $options An array of html attributes and options.
    * @return string Trimmed string.
    * @access public
    * @link http://book.cakephp.org/view/1469/Text#truncate-1625
    */
    function truncate($text, $length = 100, $options = array()) {
        $default = array(
            'ending' => '...', 'exact' => true, 'html' => false
        );
        $options = array_merge($default, $options);
        extract($options);
    
        if ($html) {
            if (mb_strlen(preg_replace('/<.*?>/', '', $text)) <= $length) {
                return $text;
            }
            $totalLength = mb_strlen(strip_tags($ending));
            $openTags = array();
            $truncate = '';
    
            preg_match_all('/(<\/?([\w+]+)[^>]*>)?([^<>]*)/', $text, $tags, PREG_SET_ORDER);
            foreach ($tags as $tag) {
                // Anchor the pattern so that e.g. "abbr" is not mistaken for the void tag "br"
                if (!preg_match('/^(?:img|br|input|hr|area|base|basefont|col|frame|isindex|link|meta|param)$/s', $tag[2])) {
                    if (preg_match('/<[\w]+[^>]*>/s', $tag[0])) {
                        array_unshift($openTags, $tag[2]);
                    } else if (preg_match('/<\/([\w]+)[^>]*>/s', $tag[0], $closeTag)) {
                        $pos = array_search($closeTag[1], $openTags);
                        if ($pos !== false) {
                            array_splice($openTags, $pos, 1);
                        }
                    }
                }
                $truncate .= $tag[1];
    
                $contentLength = mb_strlen(preg_replace('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', ' ', $tag[3]));
                if ($contentLength + $totalLength > $length) {
                    $left = $length - $totalLength;
                    $entitiesLength = 0;
                    if (preg_match_all('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', $tag[3], $entities, PREG_OFFSET_CAPTURE)) {
                        foreach ($entities[0] as $entity) {
                            if ($entity[1] + 1 - $entitiesLength <= $left) {
                                $left--;
                                $entitiesLength += mb_strlen($entity[0]);
                            } else {
                                break;
                            }
                        }
                    }
    
                    $truncate .= mb_substr($tag[3], 0 , $left + $entitiesLength);
                    break;
                } else {
                    $truncate .= $tag[3];
                    $totalLength += $contentLength;
                }
                if ($totalLength >= $length) {
                    break;
                }
            }
        } else {
            if (mb_strlen($text) <= $length) {
                return $text;
            } else {
                $truncate = mb_substr($text, 0, $length - mb_strlen($ending));
            }
        }
        if (!$exact) {
            $spacepos = mb_strrpos($truncate, ' ');
            // mb_strrpos() returns false when no space is found; isset() would still be true
            if ($spacepos !== false) {
                if ($html) {
                    $bits = mb_substr($truncate, $spacepos);
                    preg_match_all('/<\/([a-z]+)>/', $bits, $droppedTags, PREG_SET_ORDER);
                    if (!empty($droppedTags)) {
                        foreach ($droppedTags as $closingTag) {
                            if (!in_array($closingTag[1], $openTags)) {
                                array_unshift($openTags, $closingTag[1]);
                            }
                        }
                    }
                }
                $truncate = mb_substr($truncate, 0, $spacepos);
            }
        }
        $truncate .= $ending;
    
        if ($html) {
            foreach ($openTags as $tag) {
                $truncate .= '</'.$tag.'>';
            }
        }
    
        return $truncate;
    }
    

    Other frameworks may have similar (or different) solutions to this problem, so you could take a look at them too. I linked to Cake's solution because it is the framework I am most familiar with.

    Edit:

    I just tested this method, in an app I'm working on, with the OP's text:

    <?php 
    echo truncate(
    'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. <strong>Stet clita kasd gubergren</strong>', 
    236, 
    array('html' => true, 'ending' => '')); 
    ?>
    

    Output:

    Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. <strong>Stet clita kasd gubergr</strong>
    

    Notice that the output stops just short of completing the last word but includes the complete strong tags. To avoid cutting mid-word, pass 'exact' => false in the options.

  • 2020-12-28 17:31

    Can I offer a thought?

    Sample text:

    Lorem ipsum dolor sit amet, <i class="red">magna aliquyam erat</i>, duo dolores et ea rebum. <strong>Stet clita kasd gubergren</strong> hello
    

    First, parse it into:

    array(
        '0' => array(
            'tag' => '',
            'text' => 'Lorem ipsum dolor sit amet, '
        ),
        '1' => array(
            'tag' => '<i class="red">',
            'text' => 'magna aliquyam erat',
        ),
        '2' => ......
        '3' => ......
    )
    

    Then cut the text of each entry in turn until the character limit is used up, wrap each cut piece in its tag again, and join the pieces back together.
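
    The steps above could be sketched roughly as follows. This is only an illustration: the function names parseSegments, cutAtWord and truncateSegments are made up for this example, and the code assumes well-formed, non-nested markup without void tags (as in the sample text), so it is not a general HTML parser.

```php
<?php
// Sketch only: all three function names are hypothetical.
// Assumes well-formed, non-nested markup like the sample text above.

/** Split HTML into ['tag' => opening tag or '', 'text' => text that follows it]. */
function parseSegments(string $html): array
{
    $tokens = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $segments = [];
    $currentTag = '';
    foreach ($tokens as $token) {
        if ($token[0] !== '<') {
            $segments[] = ['tag' => $currentTag, 'text' => $token];
        } elseif ($token[1] === '/') {
            $currentTag = '';      // closing tag: the following text is untagged
        } else {
            $currentTag = $token;  // opening tag: applies to the next text
        }
    }
    return $segments;
}

/** Cut $text to at most $max characters without splitting a word. */
function cutAtWord(string $text, int $max): string
{
    if (mb_strlen($text) <= $max) {
        return $text;
    }
    $cut = mb_substr($text, 0, $max);
    // If the character after the cut is not a space, the cut is mid-word:
    // back up to the last space (or drop the text if there is none).
    if (mb_substr($text, $max, 1) !== ' ') {
        $space = mb_strrpos($cut, ' ');
        $cut = $space === false ? '' : mb_substr($cut, 0, $space);
    }
    return rtrim($cut);
}

/** Cut each segment in turn, wrap it in its tag again, and join the results. */
function truncateSegments(string $html, int $limit): string
{
    $out = '';
    $used = 0;
    foreach (parseSegments($html) as $segment) {
        $text = cutAtWord($segment['text'], $limit - $used);
        if ($text !== '') {
            if ($segment['tag'] !== '' && preg_match('/^<(\w+)/', $segment['tag'], $m)) {
                $out .= $segment['tag'] . $text . '</' . $m[1] . '>';
            } else {
                $out .= $text;
            }
        }
        if (mb_strlen($text) < mb_strlen($segment['text'])) {
            break; // this segment was cut, so the limit has been reached
        }
        $used += mb_strlen($text);
    }
    return $out;
}
```

    With the sample text and a limit of 40, the first entry fits whole and the second is cut back to its last complete word before being wrapped in its i tag again.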

  • 2020-12-28 17:35

    This works with Unicode (adapted from @nice ass's answer):

    class Html
    {
        protected
            $reachedLimit = false,
            $totalLen = 0,
            $maxLen = 25,
            $toRemove = [];
    
        public static function trim($html, $maxLen = 25)
        {
    
            $dom = new \DOMDocument();
            $dom->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    
            $instance = new static();
            $toRemove = $instance->walk($dom, $maxLen);
    
            // remove any nodes that exceed limit
            foreach ($toRemove as $child) {
                $child->parentNode->removeChild($child);
            }
    
            return $dom->saveHTML();
        }
    
        protected function walk(\DOMNode $node, $maxLen)
        {
    
            if ($this->reachedLimit) {
                $this->toRemove[] = $node;
            } else {
                // only text nodes should have text,
                // so do the splitting here
                if ($node instanceof \DOMText) {
                    $this->totalLen += $nodeLen = mb_strlen($node->nodeValue);
    
                    // use mb_strlen / mb_substr for UTF-8 support
                    if ($this->totalLen > $maxLen) {
                        $node->nodeValue = mb_substr($node->nodeValue, 0, $nodeLen - ($this->totalLen - $maxLen)) . '...';
                        $this->reachedLimit = true;
                    }
                }
    
                // if node has children, walk its child elements
                if (isset($node->childNodes)) {
                    foreach ($node->childNodes as $child) {
                        $this->walk($child, $maxLen);
                    }
                }
            }
    
            return $this->toRemove;
        }
    }
    
  • 2020-12-28 17:35
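    // $skip_html gets a default because a required parameter after optional ones is deprecated in PHP 8
    function limitStrlen($input, $length, $ellipses = true, $strip_html = true, $skip_html = false) 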
    {
        // strip tags, if desired
        if ($strip_html || !$skip_html) 
        {
            $input = strip_tags($input);
    
            // no need to trim, already shorter than trim length
            if (strlen($input) <= $length) 
            {
                return $input;
            }
    
            //find last space within length
            $last_space = strrpos(substr($input, 0, $length), ' ');
            if($last_space !== false) 
            {
                $trimmed_text = substr($input, 0, $last_space);
            } 
            else 
            {
                $trimmed_text = substr($input, 0, $length);
            }
        } 
        else 
        {
            if (strlen(strip_tags($input)) <= $length) 
            {
                return $input;
            }
    
            $trimmed_text = $input;
    
            $last_space = $length + 1;
    
            while(true)
            {
                $last_space = strrpos($trimmed_text, ' ');
    
                if($last_space !== false) 
                {
                    $trimmed_text = substr($trimmed_text, 0, $last_space);
    
                    if (strlen(strip_tags($trimmed_text)) <= $length) 
                    {
                        break;
                    }
                } 
                else 
                {
                    $trimmed_text = substr($trimmed_text, 0, $length);
                    break;
                }
            }
    
            // close unclosed tags.
            $doc = new DOMDocument();
            $doc->loadHTML($trimmed_text);
            $trimmed_text = $doc->saveHTML();
        }
    
        // add ellipses (...)
        if ($ellipses) 
        {
            $trimmed_text .= '...';
        }
    
        return $trimmed_text;
    }
    
    $str = "<h1><strong><span>Lorem</span></strong> <i>ipsum</i> <p class='some-class'>dolor</p> sit amet, consetetur.</h1>";
    
    // view the HTML
    echo htmlentities(limitStrlen($str, 22, false, false, true), ENT_COMPAT, 'UTF-8');
    
    // view the result
    echo limitStrlen($str, 22, false, false, true);
    

    Note: There may be a better way to close tags than using DOMDocument. For example, the sample above puts a p tag inside an h1 tag and it still works, but the parser closes the heading before the p tag because a p element is not allowed inside a heading. So be careful about HTML's strict nesting rules.
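
    One possible alternative to DOMDocument for the closing step is to track open tags on a stack and append whatever is still open. The sketch below is only an assumption about how that might look: closeOpenTags is a hypothetical helper, not part of the function above, and it does not validate nesting rules at all.

```php
<?php
// Sketch only: closeOpenTags is a hypothetical helper name.
function closeOpenTags(string $html): string
{
    // Void elements never take a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'];
    $open = [];

    // Drop a trailing partial tag such as "<stro", left by a cut inside a tag.
    $html = preg_replace('/<[^>]*$/', '', $html);

    preg_match_all('/<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if (in_array($name, $void, true) || substr($tag[0], -2) === '/>') {
            continue; // void or self-closing: nothing to track
        }
        if ($tag[1] === '/') {
            // Closing tag: remove the most recent matching opener, if any.
            for ($i = count($open) - 1; $i >= 0; $i--) {
                if ($open[$i] === $name) {
                    array_splice($open, $i, 1);
                    break;
                }
            }
        } else {
            $open[] = $name;
        }
    }

    // Close whatever is still open, innermost first.
    foreach (array_reverse($open) as $name) {
        $html .= '</' . $name . '>';
    }
    return $html;
}
```

    Unlike loadHTML() followed by saveHTML(), this leaves the existing markup untouched and only appends closing tags, so a p inside an h1 stays where it was.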

  • 2020-12-28 17:36

    Here is a JS solution: trim-html

    The idea is to split the HTML string into an array whose elements are each either an HTML tag (opening or closing) or a plain string.

    var arr = html.replace(/</g, "\n<")
                  .replace(/>/g, ">\n")
                  .replace(/\n\n/g, "\n")
                  .replace(/^\n/g, "")
                  .replace(/\n$/g, "")
                  .split("\n");
    

    Then we can iterate through the array and count characters.

  • 2020-12-28 17:43

    You can take an XML approach and push elements onto a string variable until the length of the string exceeds 236.

    Example pseudocode:

    for each node // text or tag
      push to the string var
    
      if string length > 236
        break
    
    endfor
    

    For parsing HTML in PHP, see http://simplehtmldom.sourceforge.net/
