Count all words, including numbers, in a PHP string

梦谈多话 2021-02-15 17:01

To count words in a PHP string we can usually use str_word_count, but I don't think it is always a good solution.

good example:

$var = "Hello world!";
echo str_

8 Answers
  •  天涯浪人
    2021-02-15 17:52

    The most widespread way to count words in a string is to split it on any kind of whitespace:

    count(preg_split('~\s+~u', trim($text)))
    

    Here, '~\s+~u' splits the text at every run of one or more Unicode whitespace characters.

    The disadvantage is that any run of non-whitespace characters, such as !!, is counted as a word.
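
As a minimal sketch of the whitespace approach above: the helper name count_whitespace_tokens is hypothetical, and PREG_SPLIT_NO_EMPTY is an addition to the one-liner (assumed here so that empty or blank input counts as 0 rather than 1, which also makes trim() unnecessary).

```php
<?php
// Hypothetical helper (not part of the answer): counts whitespace-separated chunks.
function count_whitespace_tokens(string $text): int
{
    // PREG_SPLIT_NO_EMPTY drops empty pieces, so '' and '   ' give 0 instead of 1.
    $parts = preg_split('~\s+~u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split returns false on failure (e.g. invalid UTF-8 with the u flag).
    return $parts === false ? 0 : count($parts);
}

$text = "The example number 2 is a bad example it will not \ncount numbers  and punctuations !! X is 2.5674.";
echo count_whitespace_tokens($text), "\n";        // 19: "!!" is counted as a word
echo count(preg_split('~\s+~u', trim(''))), "\n"; // 1: the plain one-liner reports 1 for an empty string
```

The count of 19 is one higher than the letter-and-number count for the same text, because "!!" is included.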

    If you want to count only letter words and number words (i.e. runs of text that are made up of just letters or just numbers), consider preg_match_all with a pattern like

    if (preg_match_all('~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u', $text, $matches)) {
        return count($matches[0]);
    }
    

    See the regex demo and the PHP demo:

    $re = '~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u';
    $text = "The example number 2 is a bad example it will not \ncount numbers  and punctuations !! X is 2.5674.";
    if (preg_match_all($re, $text, $matches)) {
        echo count($matches[0]);
    } // 18 in this string
    

    The [-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)? regex is a well-known integer or float number regex, and (?>\p{L}\p{M}*+)+ matches 1 or more letters (\p{L}), each of which can be followed by any number of diacritic marks (\p{M}*+).

    Regex details

    • [-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)? - an optional - or +, 0+ ASCII digits, an optional ., 1+ ASCII digits, an optional sequence of e or E, an optional - or + and then 1+ ASCII digits
    • | - or
    • \d+ - any 1 or more Unicode digits
    • | - or
    • (?>\p{L}\p{M}*+)+ - 1 or more occurrences of any Unicode letter, each followed by 0+ diacritic marks.
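
To make the alternatives listed above concrete, the following sketch prints the individual matches rather than just their count. The sample text is an assumption chosen to exercise the number and letter branches; the pattern is the one from the answer.

```php
<?php
// Pattern copied from the answer above.
$re = '~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u';

// Sample text is an assumption, not taken from the question.
$text = "Temp -4.5 rose 1e3 times in café";

// $matches[0] holds every full match in order of appearance.
preg_match_all($re, $text, $matches);
echo implode(' | ', $matches[0]), "\n";
// Temp | -4.5 | rose | 1e3 | times | in | café
```

Signed decimals and exponent notation are each kept as a single token by the first alternative, while the letter branch picks up the accented word as a whole.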

    If you just want to count text chunks consisting solely of digits and letters (with diacritics) mixed in any order, you may also use

    '~[\p{N}\p{L}\p{M}]+~u'
    

    See another regex demo. Here, \p{M} matches diacritic marks, \p{N} matches any kind of numeric character and \p{L} matches letters.
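
The two patterns count differently on mixed tokens and on decimal numbers. The comparison below is a rough sketch; the helper name count_matches and the sample strings are assumptions made for illustration.

```php
<?php
// Hypothetical helper: number of full matches of $pattern in $text.
function count_matches(string $pattern, string $text): int
{
    // preg_match_all returns the number of full matches, or false on error.
    $n = preg_match_all($pattern, $text);
    return $n === false ? 0 : $n;
}

// Both patterns are copied from the answer above.
$separate = '~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u';
$mixed    = '~[\p{N}\p{L}\p{M}]+~u';

echo count_matches($separate, 'PHP7 rocks'), "\n";   // 3: "PHP", "7", "rocks"
echo count_matches($mixed, 'PHP7 rocks'), "\n";      // 2: "PHP7", "rocks"
echo count_matches($separate, 'X is 2.5674.'), "\n"; // 3: "2.5674" stays one number
echo count_matches($mixed, 'X is 2.5674.'), "\n";    // 4: "2" and "5674" are split at the dot
```

Which pattern fits depends on whether identifiers such as "PHP7" should count as one word and whether decimal numbers must stay intact.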
