How to Truncate a string in PHP to the word closest to a certain number of characters?

猫巷女王i 2020-11-22 07:28

I have a code snippet written in PHP that pulls a block of text from a database and sends it out to a widget on a webpage. The original block of text can be a lengthy artic

27 answers
  • 2020-11-22 07:57

    I would use the preg_match function to do this, as what you want is a pretty simple expression.

    $matches = array();
    $result = preg_match("/^(.{1,199})[\s]/i", $text, $matches);
    

    The expression means "match a substring of 1 to 199 characters, starting at the beginning, that is followed by a whitespace character," so the whole match is at most 200 characters long. preg_match returns 1 in $result when the pattern matches (0 otherwise), and the truncated text is in $matches[1]. That takes care of your original question, which is specifically about ending on whitespace. If you want it to end on a newline instead, change the regular expression to:

    $result = preg_match("/^(.{1,199})[\n]/i", $text, $matches);
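
    The pattern above does not match at all when the text has no whitespace within the limit, and "." does not match newlines by default. A minimal sketch of a wrapper that handles both cases; the name truncateAtWhitespace, the $limit parameter, and the hard-cut fallback are assumptions, not part of the original answer:

    ```php
    <?php
    // Hypothetical helper; the name and the fallback behaviour are assumptions.
    function truncateAtWhitespace($text, $limit = 200) {
        // Short enough already: nothing to cut.
        if (strlen($text) <= $limit) {
            return $text;
        }
        // Up to $limit - 1 characters followed by one whitespace character.
        // The "s" modifier lets "." match newlines as well.
        $pattern = '/^(.{1,' . ($limit - 1) . '})\s/s';
        if (preg_match($pattern, $text, $matches) === 1) {
            return $matches[1];
        }
        // No whitespace in range (one very long word): fall back to a hard cut.
        return substr($text, 0, $limit);
    }
    ```

    Whether a hard cut is the right fallback depends on the widget; returning an empty string would be another reasonable choice.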
    
  • 2020-11-22 07:59

    I created a function that works more like substr, using @Dave's idea.

    function substr_full_word($str, $start, $end){
        $pos_ini = ($start == 0) ? $start : stripos(substr($str, $start, $end), ' ') + $start;
        if(strlen($str) > $end){ $pos_end = strrpos(substr($str, 0, ($end + 1)), ' '); } // ONLY LOOK FOR A CUT POINT IF THE STRING IS LONGER THAN END
        if(empty($pos_end)){ $pos_end = $end; } // FALLBACK: NO SPACE FOUND OR STRING SHORT ENOUGH
        return substr($str, $pos_ini, $pos_end);
    }
    

    P.S.: The returned string may be shorter than what substr would return for the same arguments.

  • 2020-11-22 08:00

    This is a small fix for mattmac's answer:

    preg_replace('/\s+?(\S+)?$/', '', substr($string . ' ', 0, 201));
    

    The only difference is the space appended to $string. This ensures the last word isn't cut off when $string is already 200 characters or shorter, as per ReX357's comment.

    I don't have enough rep points to add this as a comment.
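
    To make the effect of the appended space easier to try out, here is a minimal sketch that wraps the one-liner in a function. The name truncateToWord and the $limit parameter are assumptions for illustration; the regex is the one from the answer:

    ```php
    <?php
    // Hypothetical wrapper around the one-liner above; name and $limit are assumptions.
    function truncateToWord($string, $limit = 200) {
        // Append a space so a string that already fits keeps its last word,
        // then drop the trailing whitespace plus any partial word after it.
        return preg_replace('/\s+?(\S+)?$/', '', substr($string . ' ', 0, $limit + 1));
    }
    ```

    With the default limit, "The quick brown fox" comes back unchanged. Without the appended space, the same regex would strip " fox" and return "The quick brown".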

  • 2020-11-22 08:02

    I have a function that does almost what you want; with a few edits it will fit exactly:

    <?php
    function stripByWords($string,$length,$delimiter = '<br>') {
        $words_array = explode(" ",$string);
        $strlen = 0;
        $return = '';
        foreach($words_array as $word) {
            $strlen += mb_strlen($word,'utf8');
            $return .= $word." ";
            if($strlen >= $length) {
                $strlen = 0;
                $return .= $delimiter;
            }
        }
        return $return;
    }
    ?>
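
    One possible set of edits, as a rough sketch: instead of inserting a delimiter every $length characters, stop as soon as the next word would exceed the limit. The name truncateByWords is hypothetical, and like the function above it assumes the mbstring extension is available:

    ```php
    <?php
    // Hypothetical adaptation of stripByWords(): truncate instead of inserting a delimiter.
    function truncateByWords($string, $length) {
        $kept = array();
        $strlen = 0;
        foreach (explode(" ", $string) as $word) {
            // Count the word plus the space that joins it to the previous word.
            $extra = mb_strlen($word, 'UTF-8') + (count($kept) > 0 ? 1 : 0);
            if ($strlen + $extra > $length) {
                break;
            }
            $kept[] = $word;
            $strlen += $extra;
        }
        return implode(" ", $kept);
    }
    ```

    Note that this sketch returns an empty string when the first word alone is longer than $length.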
    
  • 2020-11-22 08:03

    Use the wordwrap function. It splits the text into multiple lines such that the maximum width is the one you specified, breaking at word boundaries. After splitting, you simply take the first line:

    substr($string, 0, strpos(wordwrap($string, $your_desired_width), "\n"));
    

    One thing this one-liner doesn't handle is the case where the text itself is shorter than the desired width. To handle this edge case, do something like:

    if (strlen($string) > $your_desired_width) 
    {
        $string = wordwrap($string, $your_desired_width);
        $string = substr($string, 0, strpos($string, "\n"));
    }
    

    The above solution has the problem of cutting the text prematurely if it contains a newline before the actual cut point. Here is a version that solves this problem:

    function tokenTruncate($string, $your_desired_width) {
      $parts = preg_split('/([\s\n\r]+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
      $parts_count = count($parts);
    
      $length = 0;
      $last_part = 0;
      for (; $last_part < $parts_count; ++$last_part) {
        $length += strlen($parts[$last_part]);
        if ($length > $your_desired_width) { break; }
      }
    
      return implode(array_slice($parts, 0, $last_part));
    }
    

    Also, here is the PHPUnit test class used to test the implementation:

    class TokenTruncateTest extends PHPUnit_Framework_TestCase {
      public function testBasic() {
        $this->assertEquals("1 3 5 7 9 ",
          tokenTruncate("1 3 5 7 9 11 14", 10));
      }
    
      public function testEmptyString() {
        $this->assertEquals("",
          tokenTruncate("", 10));
      }
    
      public function testShortString() {
        $this->assertEquals("1 3",
          tokenTruncate("1 3", 10));
      }
    
      public function testStringTooLong() {
        $this->assertEquals("",
          tokenTruncate("toooooooooooolooooong", 10));
      }
    
      public function testContainingNewline() {
        $this->assertEquals("1 3\n5 7 9 ",
          tokenTruncate("1 3\n5 7 9 11 14", 10));
      }
    }
    

    EDIT:

    Special UTF-8 characters such as 'à' are not handled. Add the 'u' modifier at the end of the regex to handle them:

    $parts = preg_split('/([\s\n\r]+)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
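
    The 'u' modifier only changes how the regex splits the string; strlen() still counts bytes, so a multibyte character uses up more than one unit of the width. A minimal sketch of a variant that counts characters instead, assuming the mbstring extension is available and the input is valid UTF-8 (the name tokenTruncateMb is hypothetical):

    ```php
    <?php
    // Hypothetical multibyte-aware variant of tokenTruncate(); requires mbstring.
    function tokenTruncateMb($string, $width) {
        // Split on whitespace runs and keep the delimiters so the text can be rejoined as-is.
        $parts = preg_split('/(\s+)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

        $length = 0;
        $kept = array();
        foreach ($parts as $part) {
            // Count characters, not bytes.
            $length += mb_strlen($part, 'UTF-8');
            if ($length > $width) {
                break;
            }
            $kept[] = $part;
        }
        return implode('', $kept);
    }
    ```

    For the input "à la carte menu" and a width of 10, this keeps "à la carte", whereas a byte-counting loop stops one word earlier because 'à' occupies two bytes in UTF-8.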

  • 2020-11-22 08:04

    I find this works:

    function abbreviate_string_to_whole_word($string,$max_length,$buffer) {
        if (strlen($string)>$max_length) {
            $string_cropped=substr($string,0,$max_length-$buffer);
            $last_space=strrpos($string_cropped, " ");
            if ($last_space>0) {
                $string_cropped=substr($string_cropped,0,$last_space);
            }
            $abbreviated_string=$string_cropped."&nbsp;...";
        }
        else {
            $abbreviated_string=$string;
        }

        return $abbreviated_string;
    }

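    The $buffer argument lets you adjust the length of the returned string: the text is cropped to $max_length - $buffer characters before it is cut back to the last whole word and the ellipsis is appended.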
