I have a long text like below:
$postText="It is a long established fact that a reader will be distracted by the readable content of a page when looking at its l
Why is preg_split() returning an empty string for the first element?
That is because the pattern that you feed the function dictates where it should explode/break. The matched characters are treated as a "delimiter" and are, in fact, discarded using the function's default behavior.
When your input string has at least 170 characters, then optional non-whitespace characters, then a whitespace character -- all of these matched characters become the delimiter. When preg_split() splits a string, it will potentially generate zero-length elements depending on the location of the delimiter.
For instance, if you have a string aa and split it on a, the function will return 3 empty elements -- one before the first a, one between the a's, and one after the second a.
Code: (Demo)
$string = "aa";
var_export(preg_split('/a/', $string));
// output: array ( 0 => '', 1 => '', 2 => '', )
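The same effect can be sketched with the kind of pattern described above. The pattern /.{170}\S*\s/ and the sample string are assumptions made for illustration, since the original preg_split() call is not reproduced here:

```php
<?php
// Hypothetical input: 40 repetitions of "word " (200 characters).
$text = str_repeat('word ', 40);

// Assumed pattern: 170 characters, the rest of the current word,
// then one whitespace character. Everything it matches is treated
// as the delimiter and is discarded.
$parts = preg_split('/.{170}\S*\s/', $text);

var_export($parts);
// The first element is '' because the delimiter starts at offset 0;
// the second element is whatever follows the delimiter.
```

Because the delimiter begins at the very first character, there is nothing in front of it, and that "nothing" is returned as the empty first element.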
To ensure that no empty strings are generated, you can set the fourth parameter of the function to PREG_SPLIT_NO_EMPTY (the 3rd parameter must be declared for the 4th parameter to be recognized).
var_export(preg_split('/a/', $string, -1, PREG_SPLIT_NO_EMPTY));
// output: array ( )
You could add the PREG_SPLIT_NO_EMPTY flag to your function call to remove the empty string, but because the substring that you want to keep is used as the delimiter, it is lost in the process.
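A minimal sketch of that loss, again assuming the pattern /.{170}\S*\s/ and a made-up input string:

```php
<?php
// Hypothetical input: 40 repetitions of "word " (200 characters).
$text = str_repeat('word ', 40);

// The empty leading element is dropped, but the leading ~170 characters
// were the delimiter, so only the trailing remainder survives.
$parts = preg_split('/.{170}\S*\s/', $text, -1, PREG_SPLIT_NO_EMPTY);

var_export($parts);
```

The only element left is the tail of the string, which is the part you wanted to throw away.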
A greater matter of importance is the fact that preg_split() is not the best tool for this job.
Your posted snippet overwrites $postText with the element containing the leading portion and concatenates the ellipsis hyperlink. preg_replace() can do the same job in a single call.
Code: (Demo)
$postText = "It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";
$ellipsis = "...read more";
echo preg_replace('/.{170}\S*\s\K.+/', $ellipsis, $postText);
The beauty in this call is that if the $postText doesn't qualify for truncation because it doesn't have 170 characters, optionally followed by non-whitespace characters, followed by a whitespace character, then nothing happens -- the string remains whole.
The \K in the pattern tells the regex engine to release/forget the characters matched so far (the first ~170 characters), so they are not part of the text that gets replaced. Then the .+ means match one or more of any character (as much as possible). By this pattern logic, there will only be one replacement executed. preg_replace() returns the truncated string without any concatenation syntax.
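Both behaviors can be checked with a small sketch. The two input strings are made up for illustration; the pattern and the ellipsis are the ones used above:

```php
<?php
$ellipsis = '...read more';
$pattern  = '/.{170}\S*\s\K.+/';

// Hypothetical inputs: one too short to qualify, one of 200 characters.
$short = 'A short post that never reaches 170 characters.';
$long  = str_repeat('word ', 40);

// No match, so the string comes back unchanged.
$shortResult = preg_replace($pattern, $ellipsis, $short);

// The text before \K is kept; only the text matched by .+ is replaced.
$longResult = preg_replace($pattern, $ellipsis, $long);

echo $shortResult, "\n", $longResult, "\n";
```

The short string is returned as-is, and the long one keeps its leading portion (up to and including the whitespace character) followed by the ellipsis.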
*note, if your input string may contain newline characters, you should add the s pattern modifier so that the . will match any character including newline characters. Pattern: /.{170}\S*\s\K.+/s
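A small sketch of the difference, using a made-up two-line string in which neither line reaches 170 characters on its own:

```php
<?php
$ellipsis = '...read more';

// Hypothetical input: two 100-character lines separated by a newline.
$text = str_repeat('word ', 20) . "\n" . str_repeat('word ', 20);

// Without the s modifier, . cannot cross the newline,
// so .{170} never matches and the string is left untouched.
$withoutS = preg_replace('/.{170}\S*\s\K.+/', $ellipsis, $text);

// With the s modifier, . also matches the newline,
// so the 170 characters may span both lines.
$withS = preg_replace('/.{170}\S*\s\K.+/s', $ellipsis, $text);

var_export($withoutS === $text);
echo "\n", $withS, "\n";
```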
*if you want to truncate your input string at the end of the word beyond the 170th character, you can use this pattern: /.{170}\S*\K.+/ and you could add a space at the start of the replacement/ellipsis string to provide some separation.
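As a sketch of that variant (the input string is made up; note the leading space in the ellipsis):

```php
<?php
// Leading space separates the last kept word from the ellipsis.
$ellipsis = ' ...read more';

// Hypothetical input: 40 repetitions of "word " (200 characters).
$text = str_repeat('word ', 40);

// Without \s before \K, the kept portion ends at the end of the word,
// and the whitespace after it is part of what gets replaced.
$result = preg_replace('/.{170}\S*\K.+/', $ellipsis, $text);

echo $result;
```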
Using a non-regex approach is a bit more clunky and requires a conditional statement to maintain the same level of accuracy (so I don't recommend it, but I'll display the technique anyhow).
Using substr_replace(), you need to check if there is enough length in the string to offer a valid offset for strpos(). If so, you can replace.
Code: (Demo)
$postText = "It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";
$ellipsis = "...read more";
if (($len = strlen($postText)) > 170 && ($pos = strpos($postText, ' ', 170)) && ++$pos < $len) {
    $postText = substr_replace($postText, $ellipsis, $pos);
}
echo $postText;
The above snippet assumes there are only spaces in the input string (versus tabs and newline characters, which you may also want to split on).
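If tabs and newlines should count as well, one possible non-regex sketch swaps strpos() for strcspn(), which accepts a list of characters to stop at. The function name truncateAfterWord() and its parameters are hypothetical, introduced only for this illustration:

```php
<?php
// Hypothetical helper: keeps at least $min bytes, the rest of the current
// word, and the whitespace character that follows it.
function truncateAfterWord(string $text, string $ellipsis, int $min = 170): string
{
    $len = strlen($text);
    if ($len <= $min) {
        return $text; // too short to truncate
    }

    // strcspn() counts the bytes from offset $min up to the next
    // space, tab, carriage return, or newline.
    $pos = $min + strcspn($text, " \t\r\n", $min);

    // No whitespace found, or nothing follows it: leave the string whole.
    if ($pos + 1 >= $len) {
        return $text;
    }

    // Keep the whitespace character and replace everything after it.
    return substr_replace($text, $ellipsis, $pos + 1);
}

echo truncateAfterWord(str_repeat("word\t", 40), '...read more');
```

Like the strpos() version, this works on bytes rather than characters, so multibyte text would need extra care.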