I have a site where users can put in a description about themselves.
Most users write something appropriate but some just copy/paste the same text a number of times (to
I think the approach of finding duplicate words will be messy. You will most likely get duplicate words in genuine descriptions too, e.g. "I really, really, really like ice cream, especially vanilla ice cream".
A better approach is to split the string into words, find the unique words, add up the character counts of those unique words, and compare that total to some limit. Say you require descriptions of at least 100 characters: then require that around 60 of those characters come from unique words.
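Building on @ficuscr's approach:
$words = str_word_count("Love a and peace love a and peace love a and peace love a and peace love a and peace love a and peace", 1); // format 1 returns an array of the words
$unique = array_unique(array_map('strtolower', $words)); // de-duplicate, ignoring case
$total = 0;
foreach ($unique as $word) { $total += strlen($word); } // sum the lengths of the unique words
// $total is 13 here (love + a + and + peace), far below a limit of 60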
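The idea above could be wrapped in a reusable check along these lines. This is only a sketch: the function name `hasEnoughUniqueText` and its parameters `$minLength` and `$minUniqueRatio` are made up for illustration, and the defaults (100 characters, 60%) simply mirror the numbers suggested above, so tune them against real descriptions.

```php
<?php
// Hypothetical helper: the name and the default thresholds are assumptions,
// not part of any library.
function hasEnoughUniqueText(string $description, int $minLength = 100, float $minUniqueRatio = 0.6): bool
{
    // Reject descriptions that are too short overall.
    if (strlen($description) < $minLength) {
        return false;
    }

    // Split into lowercase words so "Love" and "love" count as the same word.
    $words = str_word_count(strtolower($description), 1);

    // Total number of characters contributed by the unique words only.
    $uniqueChars = array_sum(array_map('strlen', array_unique($words)));

    // Require a minimum share of the required length to come from unique words.
    return $uniqueChars >= $minLength * $minUniqueRatio;
}

// A copy/pasted description is long enough but has only 13 unique-word characters.
var_dump(hasEnoughUniqueText(str_repeat("Love a and peace ", 7))); // bool(false)
```

Note that `strlen` counts bytes, so for descriptions with multibyte characters you would probably want `mb_strlen` and a different word splitter instead.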