I have a long text like below:
$postText="It is a long established fact that a reader will be distracted by the readable content of a page when looking at its l
The split array always returns the first index as null.
It doesn't return NULL; it returns an empty string (''). These are completely different values with different semantics.
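The difference is easy to check with strict comparison. This is a minimal sketch, assuming a scaled-down pattern (5 characters instead of 170) and a made-up input string:

```php
<?php
// Scaled-down stand-in for the question's call: 5 characters instead of 170.
// The input string is invented for illustration.
$parts = preg_split('/.{5}\S*\s/', 'hello world');

var_dump($parts[0] === null); // bool(false)
var_dump($parts[0] === '');   // bool(true)
var_dump(count($parts));      // int(2): the empty string and 'world'
```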
The reason why the first element of the returned array is an empty string is clearly documented in the manual page of preg_split():
Return Values:
Returns an array containing substrings of subject split along boundaries matched by pattern, or FALSE on failure.
The regex you provide as the first argument to preg_split() is used to match the delimiter, not the pieces. The function you need is preg_match():
$postText = "It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";
preg_match('/^.{170}\S*/', $postText, $matches);
$postText = $matches[0] . " ...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";
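If preg_match() returns 1, $matches[0] contains the string you need. It returns 0 when the pattern does not match, for example when the text is shorter than 170 characters.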
There are situations when preg_match() fails with your original regex. For example, if your input string has exactly 170 characters, the \s won't match. This is why I removed the \s from the regex and added a white space in front of the string appended after the match.
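The exact-length case can be reproduced with a synthetic string. This is a sketch that assumes an input of exactly 170 non-whitespace characters built with str_repeat(); the variable names are made up:

```php
<?php
// Hypothetical input: exactly 170 characters, no whitespace anywhere.
$exact = str_repeat('a', 170);

// With the trailing \s there is nothing left for \s to match.
$withSpace = preg_match('/^.{170}\S*\s/', $exact);

// Without the \s the first 170 characters (plus the rest of the word) match.
$withoutSpace = preg_match('/^.{170}\S*/', $exact, $matches);

var_dump($withSpace);           // int(0)
var_dump($withoutSpace);        // int(1)
var_dump(strlen($matches[0]));  // int(170)
```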
Why is preg_split() returning an empty string for the first element?
That is because the pattern that you feed the function dictates where it should explode/break. The matched characters are treated as a "delimiter" and are, in fact, discarded using the function's default behavior.
When your input string has at least 170 characters, then optional non-whitespace characters, then a whitespace character -- all of these matched characters become the delimiter. When preg_split() splits a string, it will potentially generate zero-length elements depending on the location of the delimiter.
For instance, if you have a string aa and split it on a, the function will return 3 empty elements -- one before the first a, one between the a's, and one after the second a.
Code: (Demo)
$string = "aa";
var_export(preg_split('/a/', $string));
// output: array ( 0 => '', 1 => '', 2 => '', )
To ensure that no empty strings are generated, you can set the fourth parameter of the function to PREG_SPLIT_NO_EMPTY (the 3rd parameter must be declared for the 4th parameter to be recognized).
var_export(preg_split('/a/', $string, -1, PREG_SPLIT_NO_EMPTY));
// output: array ( )
You could add the PREG_SPLIT_NO_EMPTY parameter to your function call to remove the empty string, but because the substring that you want to keep is used as the delimiter, it is lost in the process.
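To see the leading portion disappear, here is a small sketch using a scaled-down version of the pattern (5 characters instead of 170) on a made-up string:

```php
<?php
// Scaled-down stand-in for the question's pattern; the input is invented.
$text = 'hello world';

// Default behavior: the matched "hello " is the delimiter and is discarded,
// leaving an empty string in front of it.
$default = preg_split('/.{5}\S*\s/', $text);

// PREG_SPLIT_NO_EMPTY removes the empty string, but "hello " is still gone.
$noEmpty = preg_split('/.{5}\S*\s/', $text, -1, PREG_SPLIT_NO_EMPTY);

var_export($default); // two elements: '' and 'world'
var_export($noEmpty); // one element: 'world'
```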
More importantly, preg_split() is not the best tool for this job.
Your posted snippet overwrites $postText with the element containing the leading portion and concatenates the ellipsis hyperlink. preg_replace() can do this in a single call.
Code: (Demo)
$postText = "It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";
$ellipsis = "...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";
echo preg_replace('/.{170}\S*\s\K.+/', $ellipsis, $postText);
The beauty in this call is that if the $postText doesn't qualify for truncation because it doesn't have 170 characters, optionally followed by non-whitespace characters, followed by a whitespace character, then nothing happens -- the string remains whole.
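Both outcomes can be checked with a small wrapper. This is only a sketch: truncateWithLink() and its $limit parameter are hypothetical names introduced here so the behavior can be shown on short strings, and the fallback on a null return is an added safety net, not part of the original call.

```php
<?php
// Hypothetical helper, not from the original answer: wraps the preg_replace()
// call and makes the character count configurable.
function truncateWithLink(string $text, string $ellipsis, int $limit = 170): string
{
    $pattern = sprintf('/.{%d}\S*\s\K.+/', $limit);

    // preg_replace() returns null on error; fall back to the untouched text.
    // Note: "$1" or "\1" inside $ellipsis would be read as backreferences.
    return preg_replace($pattern, $ellipsis, $text) ?? $text;
}

$ellipsis = "...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";

// Too short to qualify: the string comes back unchanged.
$short = 'A short post that needs no truncation.';
var_dump(truncateWithLink($short, $ellipsis) === $short); // bool(true)

// Scaled-down limit of 5 to show the truncation itself.
echo truncateWithLink('hello world foo bar', '...', 5); // hello ...
```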
The \K in the pattern tells the regex engine to release/forget/discard the first ~170 characters as matched characters, so they are kept rather than replaced. Then the .+ means match one or more of any character (as much as possible). By this pattern logic, there will only be one replacement executed. preg_replace() returns the truncated string without any concatenation syntax.
*note, if your input string may contain newline characters, you should add the s pattern modifier so that the . will match any character including newline characters. Pattern: /.{170}\S*\s\K.+/s
*if you want to truncate your input string at the end of the word beyond the 170th character, you can use this pattern: /.{170}\S*\K.+/ and you could add a space at the start of the replacement/ellipsis string to provide some separation.
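The two notes above can be tried on short strings. The sketch below assumes scaled-down limits (5 and 7 characters instead of 170) and invented inputs. The last part shows an edge case worth checking: when the text ends exactly at the end of that word, \S* can backtrack by one character so that .+ still matches. The possessive \S*+ is a suggested hardening, not part of the original patterns.

```php
<?php
$ellipsis = '...';

// 1) The s modifier: without it, "." stops at the first newline in the tail.
$multiline = "hello world foo\nbar baz";
$withoutS  = preg_replace('/.{5}\S*\s\K.+/',  $ellipsis, $multiline);
$withS     = preg_replace('/.{5}\S*\s\K.+/s', $ellipsis, $multiline);
var_export($withoutS); // 'hello ...' followed by the untouched second line
var_export($withS);    // 'hello ...'

// 2) Cut at the end of the word: no \s in the pattern, and the space is
//    moved to the start of the replacement.
$wordEnd = preg_replace('/.{7}\S*\K.+/s', ' ' . $ellipsis, 'hello world foo bar');
var_export($wordEnd); // 'hello world ...'

// 3) Edge case: the text ends exactly at the end of the word.
//    \S* gives back the "d" so that .+ has something to match.
$edge = preg_replace('/.{7}\S*\K.+/s', ' ' . $ellipsis, 'hello world');
var_export($edge); // 'hello worl ...'

//    A possessive \S*+ cannot give characters back, so nothing is replaced.
$edgeFixed = preg_replace('/.{7}\S*+\K.+/s', ' ' . $ellipsis, 'hello world');
var_export($edgeFixed); // 'hello world'
```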
Using a non-regex approach is a bit more clunky and requires a conditional statement to maintain the same level of accuracy (so I don't recommend it, but I'll display the technique anyhow).
Using substr_replace(), you need to check if there is enough length in the string to offer a valid offset for strpos(). If so, you can replace.
Code: (Demo)
$postText = "It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";
$ellipsis = "...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";
// Only truncate when a space exists at or after offset 170 and text follows it
if (($len = strlen($postText)) > 170 && ($pos = strpos($postText, ' ', 170)) && ++$pos < $len) {
    $postText = substr_replace($postText, $ellipsis, $pos);
}
echo $postText;
The above snippet assumes that the only whitespace characters in the input string are spaces (versus tabs and newline characters, which you may also want to split on).
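If tabs and newlines should count as well, one possible variation swaps strpos() for strcspn(), which accepts a set of characters. This is a sketch under that assumption; truncateAtWhitespace() and its parameters are hypothetical names, and strcspn() is a substitution made here, not something the original snippet uses.

```php
<?php
// Hypothetical helper: non-regex truncation that treats space, tab, CR and LF
// as valid break points.
function truncateAtWhitespace(string $text, string $ellipsis, int $limit = 170): string
{
    $len = strlen($text);
    if ($len <= $limit) {
        return $text; // too short to truncate
    }

    // Index of the first whitespace character at or after $limit,
    // or $len when there is none.
    $pos = $limit + strcspn($text, " \t\r\n", $limit);

    if ($pos + 1 >= $len) {
        return $text; // no whitespace found, or nothing follows it
    }

    // Keep everything up to and including the whitespace, replace the rest.
    return substr_replace($text, $ellipsis, $pos + 1);
}

echo truncateAtWhitespace("hello\tworld foo", '...', 3); // "hello", a tab, then "..."
```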
There is no need to use preg_split(); you can still trim the characters with substr().
$postText="It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";
$limit = 170;
$truncated = substr($postText,0,$limit);
$truncated .= "...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";
var_dump($truncated);
Your regex .{170}\S*\s is fine but has one small problem. It doesn't guarantee that \S* matches only the rest of a word. If the 170 characters end on the first character of an MD5 hash, for example, \S* goes on to match the remaining 31 characters of the hash, and a longer unbroken token would add even more.
You are also treating those 170 characters as the delimiter of preg_split(), which is why they are missing from the output.
With these two things in mind, you may come up with a better idea:
$array = preg_split('~^[\s\S]{1,170}+(?(?!\S{10,})\S*)\K~', $string);
The 10 ensures that \S* is only applied when fewer than 10 non-whitespace characters follow. If a longer run exists, the string is split right after the 170 characters.
By accessing $array[0], you can append your read more text to it.
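Both branches of the conditional can be seen on short inputs. The sketch below assumes a scaled-down limit of 5 characters; splitHead() and its $limit parameter are hypothetical names introduced for illustration, while the pattern is the one shown above with only the 170 made configurable.

```php
<?php
// Hypothetical helper: same pattern as above, with the limit injected.
function splitHead(string $string, int $limit = 170): array
{
    $pattern = sprintf('~^[\s\S]{1,%d}+(?(?!\S{10,})\S*)\K~', $limit);
    $parts = preg_split($pattern, $string);

    // preg_split() returns false on error; fall back to the whole string.
    return $parts === false ? [$string] : $parts;
}

// Fewer than 10 non-whitespace characters follow: \S* finishes the word.
$a = splitHead('hellowor ld', 5);
var_export($a[0]); // 'hellowor'

// 12 non-whitespace characters follow: the split happens right at the limit.
$b = splitHead('helloabcdefghijkl more', 5);
var_export($b[0]); // 'hello'

$readMore = " ...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";
echo $a[0] . $readMore;
```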