To count words in a PHP string we can usually use str_word_count, but I don't think it is always a good solution:
$var = "Hello world!";
echo str_word_count($var);
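To illustrate where str_word_count can fall short, here is a minimal sketch. It assumes the default "C" locale, and the sample strings and variable names are made up for the illustration. The function only treats runs of letters (plus ' and -) as words, so a number is silently skipped:

```php
<?php
// Sample inputs (hypothetical, for illustration only).
$greeting = "Hello world!";
$mixed    = "X is 2.5674";

// Punctuation is ignored, so this reports the two words.
echo str_word_count($greeting), "\n"; // 2

// "2.5674" contains no letters, so it is not counted at all.
echo str_word_count($mixed), "\n";    // 2
```

Whether that is acceptable depends on whether numbers should count as words in your text.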
The most widespread way of counting words in a string is to split it on whitespace:
count(preg_split('~\s+~u', trim($text)))
Here, '~\s+~u' splits the whole text on any 1 or more Unicode whitespace characters. The disadvantage is that !! is considered a word.
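A small sketch of the whitespace approach, showing both the "!!" problem and one more edge case: the plain one-liner reports 1 for an empty string. The helper name countByWhitespace is hypothetical, and using the PREG_SPLIT_NO_EMPTY flag is my assumption about how you might want to handle that case, not part of the one-liner above:

```php
<?php
// Hypothetical helper: count whitespace-separated chunks in $text.
function countByWhitespace(string $text): int
{
    // PREG_SPLIT_NO_EMPTY drops empty pieces, so '' yields 0 rather than 1.
    $chunks = preg_split('~\s+~u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $chunks === false ? 0 : count($chunks);
}

echo countByWhitespace("Hello world !!"), "\n"; // 3: "!!" is counted as a word
echo countByWhitespace("   "), "\n";            // 0

// The plain one-liner returns one (empty) piece for an empty string.
echo count(preg_split('~\s+~u', trim(""))), "\n"; // 1
```

With the flag set, trim() is no longer needed, since leading and trailing empty pieces are discarded anyway.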
If you want to count letter words and number words (i.e. chunks of text made up of only letters or only numbers), consider using preg_match_all, like
if (preg_match_all('~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u', $text, $matches)) {
return count($matches[0]);
}
See the regex demo and the PHP demo:
$re = '~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u';
$text = "The example number 2 is a bad example it will not \ncount numbers and punctuations !! X is 2.5674.";
if (preg_match_all($re, $text, $matches)) {
echo count($matches[0]);
} // 18 in this string
The [-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)? regex is a well-known integer or float number regex, and (?>\p{L}\p{M}*+)+ matches any 1 or more letters (\p{L}), each of which can be followed by any number of diacritic marks (\p{M}*+).
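To see why the \p{M}*+ part matters, here is a rough sketch with a word that contains a combining accent. The sample text is made up, and the "\u{0301}" escape assumes PHP 7 or later:

```php
<?php
$re = '~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u';

// "e\u{0301}" is "e" followed by a combining acute accent (U+0301).
$word = "cafe\u{0301}s";
$text = $word . " cost -1.5e3 or 42";

preg_match_all($re, $text, $matches);
print_r($matches[0]);          // the accented word, cost, -1.5e3, or, 42
echo count($matches[0]), "\n"; // 5

// A letters-only pattern breaks the accented word at the combining mark.
echo preg_match_all('~\p{L}+~u', $word), "\n"; // 2
```

The signed number with an exponent is picked up as a single token by the first alternative.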
Regex details
- [-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)? - an optional - or +, 0+ ASCII digits, an optional ., 1+ ASCII digits, then an optional sequence of e or E, an optional - or + and 1+ ASCII digits
- | - or
- \d+ - any 1 or more Unicode digits
- | - or
- (?>\p{L}\p{M}*+)+ - 1 or more occurrences of any Unicode letter followed by any 0+ diacritic marks.

If you just want to count text chunks consisting solely of digits and letters (with diacritics) mixed in any order, you may also use
'~[\p{N}\p{L}\p{M}]+~u'
See another regex demo. Here, \p{M} matches diacritic marks, \p{N} matches numeric characters (digits included) and \p{L} matches letters.
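The two patterns count differently, which this short sketch tries to show. The variable names and sample strings are only for illustration:

```php
<?php
$numbersAndWords = '~[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?|\d+|(?>\p{L}\p{M}*+)+~u';
$mixedChunks     = '~[\p{N}\p{L}\p{M}]+~u';

// Letters and digits glued together.
echo preg_match_all($numbersAndWords, "abc123 def"), "\n"; // 3: abc, 123, def
echo preg_match_all($mixedChunks, "abc123 def"), "\n";     // 2: abc123, def

// A decimal number.
echo preg_match_all($numbersAndWords, "2.5"), "\n"; // 1: 2.5
echo preg_match_all($mixedChunks, "2.5"), "\n";     // 2: 2 and 5
```

So the character-class version keeps alphanumeric identifiers together but splits decimals at the dot. Pick whichever matches what you consider a "word".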