Count all words, including numbers, in a PHP string


梦谈多话 2021-02-15 17:01

To count words in a PHP string, we can usually use str_word_count, but I don't think it is always a good solution.

good example:

$var = "Hello world!";
echo str_


      
      
        
8 answers

天涯浪人 (original poster) 2021-02-15 17:52

The most widespread way to count words in a string is to split it on any kind of whitespace:

count(preg_split('~\s+~u', trim($text)))


Here, '~\s+~u' splits the text at every run of one or more Unicode whitespace characters.

The disadvantage is that a standalone run of punctuation such as !! is counted as a word.
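
As a rough illustration of the whitespace-splitting approach and its drawback, here is a minimal sketch. The helper name count_whitespace_chunks is hypothetical, and the PREG_SPLIT_NO_EMPTY flag is swapped in for trim() (it is not part of the original one-liner) so that an empty or all-whitespace string gives 0 rather than 1:

```php
<?php
// Hypothetical helper (not from the original answer):
// counts chunks of text separated by whitespace.
function count_whitespace_chunks(string $text): int
{
    // PREG_SPLIT_NO_EMPTY drops empty pieces, so leading/trailing whitespace
    // and empty input do not produce phantom "words".
    $parts = preg_split('~\s+~u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on error, e.g. for invalid UTF-8 input.
    return $parts === false ? 0 : count($parts);
}

echo count_whitespace_chunks("Hello world !!"), "\n"; // 3, "!!" is counted as a word
echo count_whitespace_chunks("   "), "\n";            // 0
```

The first call shows the drawback mentioned above: punctuation that stands on its own is counted like any other chunk.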

If you want to count only letter words and number words (i.e. chunks of text made up of just letters or just numbers), consider preg_match_all with a pattern like

if (preg_match_all('~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u', $text, $matches)) {
    return count($matches[0]);
}


See the regex demo and the PHP demo:

$re = '~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u';
$text = "The example number 2 is a bad example it will not \ncount numbers  and punctuations !! X is 2.5674.";
if (preg_match_all($re, $text, $matches)) {
    echo count($matches[0]);
} // 18 in this string


The [-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)? regex is a well-known integer or float number regex, and (?>\p{L}\p{M}*+)+ matches one or more letters (\p{L}), each of which can be followed by any number of diacritic marks (\p{M}*+).

Regex details


[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)? - an optional - or +, 0+ ASCII digits, an optional ., 1+ ASCII digits, an optional sequence of e or E, an optional - or + and then 1+ ASCII digits
| - or
\d+ - any 1 or more Unicode digits
| - or
(?>\p{L}\p{M}*+)+ - 1 or more occurrences of any Unicode letter followed by any 0+ diacritic marks.
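
To make the breakdown above concrete, the pattern could be wrapped in a small function along these lines. This is only a sketch: the name count_words_and_numbers is hypothetical, the sample strings are made up for illustration, and the file is assumed to be saved as UTF-8:

```php
<?php
// Hypothetical helper name; the pattern is the one quoted in the answer.
function count_words_and_numbers(string $text): int
{
    $re = '~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u';

    // preg_match_all() returns the number of full matches,
    // or false on error (e.g. invalid UTF-8 input).
    $n = preg_match_all($re, $text);

    return $n === false ? 0 : $n;
}

// Matches: "Price", "-2.5e3", "or", "7", "café"
echo count_words_and_numbers("Price: -2.5e3 or 7, café!!"), "\n"; // 5

// Letters and digits are separate alternatives, so this is two matches.
echo count_words_and_numbers("abc123"), "\n"; // 2
```

Note that a chunk mixing letters and digits, such as abc123, is counted as two items with this pattern, while punctuation on its own is not counted at all.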


If you just want to count chunks of text consisting solely of digits and letters (with diacritics) mixed in any order, you can also use

'~[\p{N}\p{L}\p{M}]+~u'


See another regex demo. Here, \p{M} matches diacritic marks, \p{N} matches any numeric character, and \p{L} matches letters.
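
A short sketch of how this simpler pattern behaves, assuming a hypothetical helper named count_alnum_chunks and made-up sample strings:

```php
<?php
// Hypothetical helper name; the pattern is the one quoted in the answer.
function count_alnum_chunks(string $text): int
{
    // Any run of numeric characters, letters, and combining marks is one chunk.
    $n = preg_match_all('~[\p{N}\p{L}\p{M}]+~u', $text);

    // preg_match_all() returns false on error (e.g. invalid UTF-8 input).
    return $n === false ? 0 : $n;
}

echo count_alnum_chunks("abc123"), "\n"; // 1, letters and digits stay together
echo count_alnum_chunks("2.5674"), "\n"; // 2, the dot splits the number
echo count_alnum_chunks("!!"), "\n";     // 0
```

The trade-off is that mixed chunks such as abc123 stay in one piece, but a decimal number is split at the dot, so which pattern fits depends on what should count as one word.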
    
             
                                                        
            
            
              
                