PHP: explode a string on tags with UTF-8 text between them


In PHP, I want to split a string at tags that have UTF-8 text between them. For example, in this text:

$content = "<heading>فهرست اول</heading>hi my name is


                      
4 Answers

日久生厌
2021-01-28 09:38

You can use strpos and substr to do the same if UTF-8 is causing issues.

This loops until it can't find any more <heading> tags, then adds the last substring after the loop.

https://3v4l.org/UPfbb

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you<heading>فهرست اول</heading>hi my name is mahdi  whats app2 <heading>فهرست دوم</heading>how are you2";

$arr = [];
$oldpos = 0;
$pos = strpos($content, "<heading>", 1); // offset 1 to skip the first heading

while ($pos !== false) {
    $arr[] = substr($content, $oldpos, $pos - $oldpos);
    $oldpos = $pos;
    $pos = strpos($content, "<heading>", $oldpos + 1); // offset previous position + 1 so the same tag is not matched again
}
$arr[] = substr($content, $oldpos); // add the last piece, since no heading tag follows it
var_dump($arr);
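
The same loop can be wrapped in a reusable function. This is only a sketch: the name split_before_tag and its signature are made up for illustration, and it assumes the tag is a non-empty ASCII string. Byte offsets from strpos are safe here because UTF-8 never reuses ASCII bytes inside a multibyte character, so an ASCII tag cannot match in the middle of a Persian letter.

```php
<?php
// Hypothetical helper (not part of the answer): split $text into chunks,
// each chunk beginning at an occurrence of $tag.
function split_before_tag(string $text, string $tag): array
{
    if ($text === '') {
        return []; // nothing to split; also avoids an out-of-range offset below
    }
    $chunks = [];
    $start  = 0;
    // Start searching at byte 1 so a tag at the very beginning is not a split point.
    $pos = strpos($text, $tag, 1);
    while ($pos !== false) {
        $chunks[] = substr($text, $start, $pos - $start);
        $start = $pos;
        $pos = strpos($text, $tag, $start + 1);
    }
    $chunks[] = substr($text, $start); // remainder after the last tag
    return $chunks;
}

$content = "<heading>فهرست اول</heading>hi my name is mahdi <heading>فهرست دوم</heading>how are you";
$parts = split_before_tag($content, '<heading>');
print_r($parts);
```

Joining the chunks back together with implode('', $parts) should reproduce the original string, which is a quick way to check that nothing was lost.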

                                                                        
                                                        
            
            
              
                
萌比男神i
2021-01-28 09:40

You can use preg_split to split the text by a regular expression, then array_filter to remove empty strings (array_values reindexes the result from 0):

$arr = array_values(array_filter(preg_split('/(?=<heading>.*?<\/heading>)/', $content), 'strlen'));


It won't remove the <heading> tag, since the tag is inside a lookahead, a zero-width group that doesn't consume what it matches.

For example:

<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you


This should return:

array(
  [0] => "<heading>فهرست اول</heading>hi my name is mahdi  whats app ",
  [1] => "<heading>فهرست دوم</heading>how are you"
)


You can check this regex online: https://regex101.com/r/ITi7Lh/1

Or, if you prefer, see how PHP parses it: (the link doesn't seem to work on SO, you have to manually paste it): https://en.functions-online.com/preg_split.html?command={"pattern":"\/(?=<heading>.*?<\\\/heading>)\/","subject":"<heading>\u0641\u0647\u0631\u0633\u062a \u0627\u0648\u0644<\/heading>hi my name is mahdi whats app <heading>\u0641\u0647\u0631\u0633\u062a \u062f\u0648\u0645<\/heading>how are you","limit":-1}
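
As a possible alternative to filtering afterwards, preg_split accepts the PREG_SPLIT_NO_EMPTY flag, which drops the empty piece produced by the zero-width match at offset 0 and keeps the result indexed from 0. The sketch below also adds the u modifier, which assumes the input is valid UTF-8 (preg_split returns false otherwise), and it simplifies the lookahead to the opening tag only; both are my choices, not part of the answer above.

```php
<?php
$content = "<heading>فهرست اول</heading>hi my name is mahdi <heading>فهرست دوم</heading>how are you";

// Split at every position immediately followed by "<heading>".
// The lookahead is zero-width, so the tag stays in the following piece.
// -1 means "no limit"; PREG_SPLIT_NO_EMPTY discards empty pieces.
$parts = preg_split('~(?=<heading>)~u', $content, -1, PREG_SPLIT_NO_EMPTY);

var_dump($parts);
```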
                                                                        
                                                        
            
            
              
                
囚心锁ツ
2021-01-28 09:45

You can use preg_match or, in your case, preg_match_all to extract every <heading> element:

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you";

preg_match_all('~<heading>.*?</heading>~si', $content, $matches);
print_r($matches[0]);


gives:

Array
(
    [0] => <heading>فهرست اول</heading>
    [1] => <heading>فهرست دوم</heading>
)
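
Note that the result above contains only the heading elements, not the text that follows them. If the text is needed too, one option is to capture both parts in the same pattern. This is a rough sketch under my own assumptions: the pattern, the PREG_SET_ORDER flag and the 'heading'/'body' keys are illustrative choices, and the u modifier assumes valid UTF-8 input.

```php
<?php
$content = "<heading>فهرست اول</heading>hi my name is mahdi <heading>فهرست دوم</heading>how are you";

// Group 1: text inside the heading element.
// Group 2: everything up to the next <heading> or the end of the string (\z).
// "s" lets "." match newlines; "u" treats pattern and subject as UTF-8.
$pattern = '~<heading>(.*?)</heading>(.*?)(?=<heading>|\z)~su';
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$sections = [];
foreach ($matches as $m) {
    $sections[] = ['heading' => $m[1], 'body' => $m[2]];
}

print_r($sections);
```

With PREG_SET_ORDER each element of $matches is one complete match, which makes it easy to keep a heading together with its own body text.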

                                                                        
                                                        
            
            
              
                
[愿得一人]
2021-01-28 09:48

You can try the following function; it should meet your needs. The idea is to split the string using <heading> as the delimiter. Each item in the resulting array is a piece you want, but explode strips the <heading> tag because it is the delimiter, so you need to add it back. Comments in the code explain each step.

function get_what_mahdi_wants($in_string){

  $mahdis_strings_array = array();

  // Split string at occurrences of '<heading>'
  $mahdis_strings = explode('<heading>', $in_string);
  foreach($mahdis_strings as $mahdis_string){

    // If '<heading>' is at the start of the string, explode() creates an empty first element. Skip it.
    if($mahdis_string === ''){ continue; }

    // Add back string element with '<heading>' tag prepended since exploding on it stripped it.
    $mahdis_strings_array[] = '<heading>'.$mahdis_string;
  }
  return $mahdis_strings_array;
}
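
One caveat with the function above: if the input has text before the first <heading>, that leading text also gets a tag prepended. A possible variant, assuming such leading text should be kept untouched, is sketched here. The name split_on_heading and the optional $tag parameter are hypothetical. Unlike the original, it does not skip empty pieces between two adjacent tags.

```php
<?php
// Hypothetical variant of the explode() approach; name and behaviour are assumptions.
function split_on_heading(string $text, string $tag = '<heading>'): array
{
    $pieces = explode($tag, $text);

    // The first piece is whatever precedes the first tag, so it never had a tag.
    $leading = array_shift($pieces);
    $result  = ($leading === '') ? [] : [$leading];

    foreach ($pieces as $piece) {
        // Every remaining piece followed a tag, so put the tag back.
        $result[] = $tag . $piece;
    }
    return $result;
}

print_r(split_on_heading("intro <heading>A</heading>one<heading>B</heading>two"));
// Array: "intro ", "<heading>A</heading>one", "<heading>B</heading>two"
```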

                                                                        
                                                        
            
            
              
                