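In PHP I want to explode a string on a tag that has UTF-8 text between its opening and closing parts. For example, in this text:
$content = "<heading>فهرست اول</heading>hi my name is mahdi whats app <heading>فهرست دوم</heading>how are you";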
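You can use strpos() and substr() to do this if the UTF-8 text is causing problems with other approaches. Both functions work on byte offsets, but because the <heading> delimiter is plain ASCII, the cuts never land inside a multibyte character.
The loop runs until it can't find any more headings, then adds the last piece with one more substr() after the loop.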
https://3v4l.org/UPfbb
$content = "<heading>فهرست اول</heading>hi my name is mahdi whats app <heading>فهرست دوم</heading>how are you<heading>فهرست اول</heading>hi my name is mahdi whats app2 <heading>فهرست دوم</heading>how are you2";
$arr = [];
$oldpos = 0;
$pos = strpos($content, "<heading>", 1); // offset 1 skips the heading at the very start
while ($pos !== false) {
    $arr[] = substr($content, $oldpos, $pos - $oldpos);
    $oldpos = $pos;
    $pos = strpos($content, "<heading>", $oldpos + 1); // search from the previous position + 1 so the same tag isn't found again
}
$arr[] = substr($content, $oldpos); // add the last piece, since no heading tag follows it
var_dump($arr);
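As a sketch, the same loop can be wrapped in a reusable function. The name split_before_tag() and its $tag parameter are hypothetical, not part of the answer; the body is the loop above with the result array initialized and a guard for an empty input string.

```php
<?php
// Hypothetical helper (not from the original answer) wrapping the strpos()/substr() loop.
// strpos() and substr() count bytes, not characters. That is safe with UTF-8 text here
// because the tag being searched for is plain ASCII, so every cut lands between characters.
function split_before_tag(string $content, string $tag = '<heading>'): array
{
    $parts = [];
    $oldpos = 0;
    // Search from offset 1 so a tag at position 0 doesn't produce an empty first piece.
    // Guarded because PHP 8 throws a ValueError for offset 1 on an empty string.
    $pos = $content === '' ? false : strpos($content, $tag, 1);
    while ($pos !== false) {
        $parts[] = substr($content, $oldpos, $pos - $oldpos);
        $oldpos = $pos;
        $pos = strpos($content, $tag, $oldpos + 1);
    }
    $parts[] = substr($content, $oldpos); // last piece: no tag follows it
    return $parts;
}

$content = "<heading>فهرست اول</heading>hi my name is mahdi whats app <heading>فهرست دوم</heading>how are you";
print_r(split_before_tag($content));
```

Any text before the first tag is kept as its own first element, because the loop only ever cuts in front of a tag.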
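You can use preg_split() to split the text with a regular expression, then array_filter() to remove empty strings (the zero-width match at offset 0 produces an empty first element) and array_values() to renumber the remaining keys from 0:
$arr = array_values(array_filter(preg_split('/(?=<heading>.*?<\/heading>)/', $content), 'strlen'));
The tag isn't removed because it sits inside a lookahead, a group construct that checks what follows without consuming it.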
For example:
<heading>فهرست اول</heading>hi my name is mahdi whats app <heading>فهرست دوم</heading>how are you
This should return:
array(
[0] => "<heading>فهرست اول</heading>hi my name is mahdi whats app ",
[1] => "<heading>فهرست دوم</heading>how are you"
)
You can check this regex online: https://regex101.com/r/ITi7Lh/1
Or, to see how PHP itself handles it, paste this URL into your browser manually (it doesn't work as a clickable link on Stack Overflow): https://en.functions-online.com/preg_split.html?command={"pattern":"\/(?=<heading>.*?<\\\/heading>)\/","subject":"<heading>\u0641\u0647\u0631\u0633\u062a \u0627\u0648\u0644<\/heading>hi my name is mahdi whats app <heading>\u0641\u0647\u0631\u0633\u062a \u062f\u0648\u0645<\/heading>how are you","limit":-1}
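If you'd rather not post-process the result with array_filter(), preg_split() accepts a PREG_SPLIT_NO_EMPTY flag that drops empty pieces itself and returns a list indexed from 0. A minimal sketch using the same lookahead pattern:

```php
<?php
$content = "<heading>فهرست اول</heading>hi my name is mahdi whats app <heading>فهرست دوم</heading>how are you";

// -1 means "no limit". PREG_SPLIT_NO_EMPTY discards the empty piece created by the
// zero-width match at offset 0, so the result is already numbered 0, 1, ...
$arr = preg_split('/(?=<heading>.*?<\/heading>)/', $content, -1, PREG_SPLIT_NO_EMPTY);

print_r($arr);
```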
You can use preg_match(), or in your case preg_match_all(). Note that this returns only the heading elements, not the text that follows each one:
$content = "<heading>فهرست اول</heading>hi my name is mahdi whats app <heading>فهرست دوم</heading>how are you";
preg_match_all("'<heading>.*?<\/heading>'si", $content, $matches);
print_r($matches[0]);
gives:
Array
(
[0] => <heading>فهرست اول</heading>
[1] => <heading>فهرست دوم</heading>
)
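If you also want the text after each heading, a lookahead can end each match just before the next <heading> or at the end of the string. This is a sketch extending the answer's approach, not part of it; the ~ delimiter is chosen only to avoid escaping the slash in </heading>:

```php
<?php
$content = "<heading>فهرست اول</heading>hi my name is mahdi whats app <heading>فهرست دوم</heading>how are you";

// Match a heading element, then lazily take the following text up to (but not including)
// the next '<heading>' or the end of the subject (\z). The 's' flag lets '.' match newlines.
preg_match_all('~<heading>.*?</heading>.*?(?=<heading>|\z)~s', $content, $matches);

print_r($matches[0]);
```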
You can try the following function. It splits the string using <heading> as the delimiter, so each element of the resulting array is one of the pieces you want, except that explode() strips the delimiter; the function therefore adds the <heading> tag back. The comments explain each step.
function get_what_mahdi_wants($in_string){
    $mahdis_strings_array = array();
    // Split the string at every occurrence of '<heading>'.
    $mahdis_strings = explode('<heading>', $in_string);
    foreach($mahdis_strings as $i => $mahdis_string){
        // If '<heading>' is at the start of the string, explode() creates an empty first element. Skip it.
        if($mahdis_string === ''){ continue; }
        // explode() stripped '<heading>' from every element after the first, so add it back.
        // The first element is whatever text came before the first tag, so it gets no prefix.
        $mahdis_strings_array[] = ($i === 0) ? $mahdis_string : '<heading>'.$mahdis_string;
    }
    return $mahdis_strings_array;
}
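The same explode-and-prepend idea fits in a few lines with array_filter() and array_map(). This sketch assumes PHP 7.4 or later (for the arrow function) and that the string starts with a <heading> tag, as in the question's example, because it prepends the tag to every piece:

```php
<?php
$content = "<heading>فهرست اول</heading>hi my name is mahdi whats app <heading>فهرست دوم</heading>how are you";

$parts = array_map(
    fn(string $s): string => '<heading>' . $s, // put back the tag that explode() removed
    // drop the empty leading piece and renumber the keys from 0
    array_values(array_filter(explode('<heading>', $content), 'strlen'))
);

print_r($parts);
```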