I have a PHP array, for example:
$arr = array("hello", "try", "hel", "hey hello");
Now I want to rearrange the array which w
While @yceruto's answer is correct and informative, I would like to offer additional insights and demonstrate more modern implementation syntax, namely the spaceship operator <=> available from PHP7+.
First, about the scores generated by the respective functions...
levenshtein() and similar_text() ARE case-sensitive, so an uppercase H is just as much a mismatch as the number 6 when compared to h.
levenshtein() and similar_text() ARE NOT multi-byte aware, so an accented character like ê will not only be deemed a mismatch for e, it will potentially receive a heavier penalty because each of its individual bytes is a mismatch.
If you want to make case-insensitive comparisons, you can simply convert both strings to uppercase/lowercase before executing.
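As a minimal sketch of the case-insensitive approach just described, both strings can be lowercased before scoring. The helper name levenshteinCaseInsensitive() is hypothetical, and strtolower() is assumed to be sufficient because the input here is plain ASCII (multi-byte text would need mb_strtolower() at the least).

```php
<?php
// Hypothetical helper (not a PHP built-in): case-insensitive Levenshtein
// distance, assuming ASCII input.
function levenshteinCaseInsensitive(string $a, string $b): int
{
    // Normalise case on both sides so that "H" and "h" count as a match.
    return levenshtein(strtolower($a), strtolower($b));
}

echo levenshtein('hello', 'Hallo'), "\n";                // 2: "H" and "a" both mismatch
echo levenshteinCaseInsensitive('hello', 'Hallo'), "\n"; // 1: only "a" mismatches
```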
If your application requires multi-byte support, you should search for existing repositories that provide this functionality.
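For illustration only, here is a rough sketch of what a multi-byte-aware distance could look like. The function mbLevenshtein() is hypothetical (it is not part of PHP or of any particular library), it assumes valid UTF-8 input, and it uses the textbook dynamic-programming approach with unit costs rather than PHP's internal implementation.

```php
<?php
// Hypothetical helper: Levenshtein distance counted in UTF-8 characters
// instead of bytes. Assumes both arguments are valid UTF-8.
function mbLevenshtein(string $a, string $b): int
{
    // Split into characters rather than bytes.
    $charsA = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $charsB = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY);
    $lenA = count($charsA);
    $lenB = count($charsB);

    // $prev[$j] is the distance between the processed prefix of $a
    // and the first $j characters of $b.
    $prev = range(0, $lenB);
    for ($i = 1; $i <= $lenA; $i++) {
        $curr = [$i];
        for ($j = 1; $j <= $lenB; $j++) {
            $cost = $charsA[$i - 1] === $charsB[$j - 1] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // deletion
                $curr[$j - 1] + 1,     // insertion
                $prev[$j - 1] + $cost  // substitution (or match)
            );
        }
        $prev = $curr;
    }
    return $prev[$lenB];
}

echo levenshtein('hello', 'hêllo'), "\n";   // 2 (byte-wise: ê is two bytes)
echo mbLevenshtein('hello', 'hêllo'), "\n"; // 1 (character-wise)
```

A maintained library is still likely to be the better choice in production; this only shows why counting characters instead of bytes changes the score.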
Additional techniques for those willing to research more deeply include metaphone() and soundex(), but I will not delve into these topics in this answer.
Scores:
Test vs "hello" | levenshtein | similar_text | similar_text's percent |
----------------+-------------+--------------+------------------------|
H3||0           | 5           | 0            | 0                      |
Hallo           | 2           | 3            | 60                     |
aloha           | 5           | 2            | 40                     |
h               | 4           | 1            | 33.333333333333        |
hallo           | 1           | 4            | 80                     |
hallå           | 3           | 3            | 54.545454545455        |
hel             | 2           | 3            | 75                     |
helicopter      | 6           | 4            | 53.333333333333        |
hellacious      | 5           | 5            | 66.666666666667        |
hello           | 0           | 5            | 100                    |
hello y'all     | 6           | 5            | 62.5                   |
hello yall      | 5           | 5            | 66.666666666667        |
helów           | 3           | 3            | 54.545454545455        |
hey hello       | 4           | 5            | 71.428571428571        |
hola            | 3           | 2            | 44.444444444444        |
hêllo           | 2           | 4            | 72.727272727273        |
mellow yellow   | 9           | 4            | 44.444444444444        |
try             | 5           | 0            | 0                      |
Sort by levenshtein() PHP7+ (Demo)
usort($testStrings, function($a, $b) use ($needle) {
    return levenshtein($needle, $a) <=> levenshtein($needle, $b);
});
Sort by levenshtein() PHP7.4+ (Demo)
usort($testStrings, fn($a, $b) => levenshtein($needle, $a) <=> levenshtein($needle, $b));
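The snippets above leave $needle and $testStrings undefined, so here is a small self-contained run. The sample list is an assumption: a handful of strings picked from the score table so that every distance is distinct and the resulting order does not depend on how ties are resolved.

```php
<?php
$needle = 'hello';
// Hypothetical sample input; distances to "hello" are 5, 2, 4, 0 and 1.
$testStrings = ['try', 'hel', 'hey hello', 'hello', 'hallo'];

// Ascending Levenshtein distance: the closest string comes first.
usort($testStrings, function ($a, $b) use ($needle) {
    return levenshtein($needle, $a) <=> levenshtein($needle, $b);
});

print_r($testStrings);
// Order: hello, hallo, hel, hey hello, try
```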
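Notice that in the similar_text() sorts, $a and $b have changed sides of the <=> evaluation to produce DESC ordering.
**Notice that hello is not assured to be positioned as the first element, because several other strings share its similar_text() score of 5.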
Sort by similar_text() PHP7+ (Demo)
usort($testStrings, function($a, $b) use ($needle) {
    return similar_text($needle, $b) <=> similar_text($needle, $a);
});
Sort by similar_text() PHP7.4+ (Demo)
usort($testStrings, fn($a, $b) => similar_text($needle, $b) <=> similar_text($needle, $a));
Notice the difference in scoring of hallå and helicopter via similar_text()'s return value versus similar_text()'s percent value.
Sort by similar_text()'s percent PHP7+ (Demo)
usort($testStrings, function($a, $b) use ($needle) {
    similar_text($needle, $a, $percentA);
    similar_text($needle, $b, $percentB);
    return $percentB <=> $percentA;
});
Sort by similar_text()'s percent PHP7.4+ (Demo)
usort($testStrings, fn($a, $b) =>
    [is_int(similar_text($needle, $b, $percentB)), $percentB]
    <=>
    [is_int(similar_text($needle, $a, $percentA)), $percentA]
);
Notice that I am neutralizing the unwanted return value of similar_text() by converting it to true with is_int(), so that only the generated percent values decide the comparison. This allows the percent value to be generated without returning too soon, since arrow function syntax permits only a single expression.
Sort by levenshtein() then break ties with similar_text() PHP7+ (Demo)
usort($testStrings, function($a, $b) use ($needle) {
    return [levenshtein($needle, $a), similar_text($needle, $b)]
           <=>
           [levenshtein($needle, $b), similar_text($needle, $a)];
});
Sort by levenshtein() then break ties with similar_text() PHP7.4+ (Demo)
usort($testStrings, fn($a, $b) =>
    [levenshtein($needle, $a), similar_text($needle, $b)]
    <=>
    [levenshtein($needle, $b), similar_text($needle, $a)]
);
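Because usort() invokes the callback many times, each string is rescored on every comparison. One possible alternative, sketched here as an assumption rather than as part of the answer's approach, is to score every string once and let array_multisort() apply the same rule (distance ASC, then matching characters DESC). The sample list is hypothetical and chosen so that some strings tie on distance.

```php
<?php
$needle = 'hello';
// Hypothetical sample input: "h"/"hey hello" and "try"/"aloha" tie on distance.
$testStrings = ['h', 'hey hello', 'hola', 'hello', 'try', 'aloha'];

// Score each string exactly once.
$distances = array_map(fn($s) => levenshtein($needle, $s), $testStrings);
$matches   = array_map(fn($s) => similar_text($needle, $s), $testStrings);

// Sort by distance ASC and break ties by matching characters DESC;
// $testStrings is reordered in step with the two score arrays.
array_multisort($distances, SORT_ASC, $matches, SORT_DESC, $testStrings);

print_r($testStrings);
// Order: hello, hola, hey hello, h, aloha, try
```

Note that array_multisort() also uses the last array as a final tie-breaker, so strings with identical scores end up in ascending string order.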
Personally, I never use anything but levenshtein() in my projects because it consistently delivers the results that I'm looking for.
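If you want to sort your array, you can do this:
$arr = array("hello", "try", "hel", "hey hello");
$search = "hey"; // your search term
$temp_arr = array();
$sorted_arr = array();
// Compute the Levenshtein distance from the search term to each word.
for ($i = 0; $i < count($arr); $i++) {
    $temp_arr[$i] = levenshtein($search, $arr[$i]);
}
// Sort the distances in ascending order while preserving the keys.
asort($temp_arr);
// Rebuild the word list in order of increasing distance.
foreach ($temp_arr as $k => $v) {
    $sorted_arr[] = $arr[$k];
}
$sorted_arr should then be ordered by ascending distance, starting with the closest word to your search term.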
You can use the levenshtein() function:
<?php
// input misspelled word
$input = 'helllo';
// array of words to check against
$words = array('hello', 'try', 'hel', 'hey hello');
// no shortest distance found, yet
$shortest = -1;
// loop through words to find the closest
foreach ($words as $word) {
    // calculate the distance between the input word,
    // and the current word
    $lev = levenshtein($input, $word);
    // check for an exact match
    if ($lev == 0) {
        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;
        // break out of the loop; we've found an exact match
        break;
    }
    // if this distance is less than or equal to the shortest distance
    // found so far, OR if no shortest word has been found yet
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest = $word;
        $shortest = $lev;
    }
}
echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {
    echo "Did you mean: $closest?\n";
}
?>
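The same loop can be wrapped in a reusable function. The sketch below is only one possible shape: the name findClosestWord() is hypothetical, it returns null for an empty word list, and unlike the loop above it keeps the first word when several share the smallest distance.

```php
<?php
// Hypothetical helper: return the word with the smallest Levenshtein
// distance to $input, or null when $words is empty.
function findClosestWord(string $input, array $words): ?string
{
    $closest = null;
    $shortest = PHP_INT_MAX;
    foreach ($words as $word) {
        $lev = levenshtein($input, $word);
        // Strict comparison: the first word at a given distance wins.
        if ($lev < $shortest) {
            $closest = $word;
            $shortest = $lev;
        }
        // An exact match cannot be beaten, so stop early.
        if ($lev === 0) {
            break;
        }
    }
    return $closest;
}

echo findClosestWord('helllo', ['hello', 'try', 'hel', 'hey hello']), "\n"; // hello
```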
A quick solution is to use similar_text() (http://php.net/manual/en/function.similar-text.php):
This calculates the similarity between two strings as described in Programming Classics: Implementing the World's Best Algorithms by Oliver (ISBN 0-131-00413-1). Note that this implementation does not use a stack as in Oliver's pseudo code, but recursive calls which may or may not speed up the whole process. Note also that the complexity of this algorithm is O(N**3) where N is the length of the longest string.
$userInput = 'Bradley123';
$list = array('Bob', 'Brad', 'Britney');
usort($list, function ($a, $b) use ($userInput) {
    similar_text($userInput, $a, $percentA);
    similar_text($userInput, $b, $percentB);
    return $percentA === $percentB ? 0 : ($percentA > $percentB ? -1 : 1);
});
var_dump($list); //output: array("Brad", "Britney", "Bob");
Or using http://php.net/manual/en/function.levenshtein.php:
The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2. The complexity of the algorithm is O(m*n), where n and m are the length of str1 and str2 (rather good when compared to similar_text(), which is O(max(n,m)**3), but still expensive).
$userInput = 'Bradley123';
$list = array('Bob', 'Brad', 'Britney');
usort($list, function ($a, $b) use ($userInput) {
    $levA = levenshtein($userInput, $a);
    $levB = levenshtein($userInput, $b);
    return $levA === $levB ? 0 : ($levA > $levB ? 1 : -1);
});
// "Brad" and "Britney" tie at a distance of 6 ("Bob" is 9), so their relative
// order depends on sort stability; PHP 8.0+ (stable sort) gives array("Brad", "Britney", "Bob").
var_dump($list);
Another way is to use the similar_text() function, which can also report the similarity as a percentage through its optional third argument. See http://www.php.net/manual/en/function.similar-text.php for more.
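As a brief illustration, similar_text() returns the number of matching characters and, when a third argument is supplied, writes the percentage into it by reference. The variable names below are arbitrary.

```php
<?php
// The return value is the count of matching characters;
// $percent is filled in by reference.
$matched = similar_text('hello', 'hallo', $percent);

echo $matched, "\n"; // 4
echo $percent, "\n"; // 80
```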