How to sort an array by similarity in relation to an inputted word.

闹比i 2020-12-05 11:21

I have a PHP array, for example:

$arr = array("hello", "try", "hel", "hey hello");

Now I want to rearrange the array which w

5 Answers
  • 2020-12-05 11:39

    While @yceruto's answer is correct and informative, I would like to extend additional insights and demonstrate more modern implementation syntax.

    • The three-way comparison operator (aka "spaceship operator") <=> from PHP7+
    • Arrow function syntax to allow extra variables into the custom function scope from PHP7.4+.
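
    As a minimal illustration of the two features listed above (the variable names here are made up for the example and are not from the original answer):

```php
<?php
// <=> returns -1, 0, or 1, which is the kind of value usort() expects from a comparator.
var_dump(1 <=> 2, 2 <=> 2, 3 <=> 2); // int(-1), int(0), int(1)

// An arrow function captures $offset from the enclosing scope by value,
// so no use ($offset) clause is needed.
$offset = 10;
$addOffset = fn($n) => $n + $offset;
var_dump($addOffset(5)); // int(15)
```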

    First about the generated scores from respective functions...

    1. levenshtein() and similar_text() ARE case-sensitive so an uppercase H is just as much a mismatch as the number 6 when compared to h.
    2. levenshtein() and similar_text() ARE NOT multi-byte aware so an accented character like ê will not only be deemed a mismatch for e, it will potentially receive a heavier penalty based on each individual byte being a mismatch.

    If you want to make case-insensitive comparisons, you can simply convert both strings to uppercase/lowercase before executing.
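
    A small sketch of that idea, assuming ASCII-only input (the sample strings and the $score helper are hypothetical, chosen only for illustration):

```php
<?php
$needle = 'hello';
$testStrings = ['try', 'HELLO', 'Hallo'];

// Lowercase both sides before scoring so that "H" and "h" count as a match.
// strtolower() is only reliable for ASCII letters; other alphabets would
// need something like mb_strtolower() from the mbstring extension.
$score = fn($s) => levenshtein(strtolower($needle), strtolower($s));

usort($testStrings, fn($a, $b) => $score($a) <=> $score($b));

print_r($testStrings); // HELLO (0), Hallo (1), try (5)
```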

    If your application requires multi-byte support, you should search for existing repositories that provide this functionality.
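
    If you would rather see what such a library has to do, here is a rough sketch of a character-based Levenshtein distance. utf8Levenshtein() is a hypothetical helper, not a PHP built-in, and it assumes both inputs are valid UTF-8:

```php
<?php
// Hypothetical helper: Levenshtein distance counted in UTF-8 characters
// instead of bytes. Assumes valid UTF-8 input.
function utf8Levenshtein(string $a, string $b): int
{
    // The /u modifier makes PCRE split on characters rather than bytes.
    $a = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY);

    // $prev[$j] is the distance between the characters of $a processed
    // so far and the first $j characters of $b.
    $prev = range(0, count($b));
    foreach ($a as $i => $charA) {
        $curr = [$i + 1];
        foreach ($b as $j => $charB) {
            $curr[] = min(
                $prev[$j + 1] + 1,                       // deletion
                $curr[$j] + 1,                           // insertion
                $prev[$j] + ($charA === $charB ? 0 : 1)  // match or substitution
            );
        }
        $prev = $curr;
    }
    return $prev[count($b)];
}

var_dump(levenshtein('hello', 'hêllo'));     // int(2): ê is two bytes in UTF-8
var_dump(utf8Levenshtein('hello', 'hêllo')); // int(1): one character substituted
```

    This keeps only two rows of the distance matrix in memory, which should be fine for short words but has not been tuned for long strings.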

    Additional techniques for those willing to research more deeply include metaphone() and soundex(), but I will not delve into these topics in this answer.

    Scores:

    Test vs "hello" |  levenshtein   |  similar_text  |   similar_text's percent   |
    ----------------+----------------+----------------+----------------------------|
    H3||0           |       5        |      0         |       0                    |
    Hallo           |       2        |      3         |      60                    |
    aloha           |       5        |      2         |      40                    |
    h               |       4        |      1         |      33.333333333333       |
    hallo           |       1        |      4         |      80                    |
    hallå           |       3        |      3         |      54.545454545455       |
    hel             |       2        |      3         |      75                    |
    helicopter      |       6        |      4         |      53.333333333333       |
    hellacious      |       5        |      5         |      66.666666666667       |
    hello           |       0        |      5         |     100                    |
    hello y'all     |       6        |      5         |      62.5                  |
    hello yall      |       5        |      5         |      66.666666666667       |
    helów           |       3        |      3         |      54.545454545455       |
    hey hello       |       4        |      5         |      71.428571428571       |
    hola            |       3        |      2         |      44.444444444444       |
    hêllo           |       2        |      4         |      72.727272727273       |
    mellow yellow   |       9        |      4         |      44.444444444444       |
    try             |       5        |      0         |       0                    |
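
    For reference, rows of this table can be reproduced with a loop along these lines (a sketch only; the three test strings are picked from the table and the output format is arbitrary):

```php
<?php
$needle = 'hello';

foreach (['hallo', 'hel', 'try'] as $test) {
    $distance = levenshtein($needle, $test);
    // similar_text() returns the matching character count and writes
    // the percentage into $percent by reference.
    $matches = similar_text($needle, $test, $percent);
    printf("%-6s | %d | %d | %s\n", $test, $distance, $matches, $percent);
}
```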
    

    Sort by levenshtein() (PHP 7+):

    usort($testStrings, function($a, $b) use ($needle) {
        return levenshtein($needle, $a) <=> levenshtein($needle, $b);
    });
    

    Sort by levenshtein() (PHP 7.4+):

    usort($testStrings, fn($a, $b) => levenshtein($needle, $a) <=> levenshtein($needle, $b));
    

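    Notice that in the similar_text() snippets below, $a and $b have changed sides of the <=> evaluation for DESC ordering. Also notice that hello is not assured to be positioned as the first element, because several other strings tie with it on a similar_text() score of 5.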

    Sort by similar_text() (PHP 7+):

    usort($testStrings, function($a, $b) use ($needle) {
        return similar_text($needle, $b) <=> similar_text($needle, $a);
    });
    

    Sort by similar_text() (PHP 7.4+):

    usort($testStrings, fn($a, $b) => similar_text($needle, $b) <=> similar_text($needle, $a));
    

    Notice how hallå and helicopter swap places depending on whether you sort by similar_text()'s return value (3 vs 4) or by similar_text()'s percent value (54.5 vs 53.3).

    Sort by similar_text()'s percent (PHP 7+):

    usort($testStrings, function($a, $b) use ($needle) {
        similar_text($needle, $a, $percentA);
        similar_text($needle, $b, $percentB);
        return $percentB <=> $percentA;
    });
    

    Sort by similar_text()'s percent (PHP 7.4+):

    usort($testStrings, fn($a, $b) => 
        [is_int(similar_text($needle, $b, $percentB)), $percentB]
        <=>
        [is_int(similar_text($needle, $a, $percentA)), $percentA]
    );
    

    Notice that I am neutralizing the unwanted return value of similar_text() by converting it to true with is_int(), then using the generated percent value. This allows the percent value to be generated without returning too soon, since an arrow function body is limited to a single expression.
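
    The trick works because <=> compares equal-length arrays element by element, from left to right. A tiny sketch with made-up values:

```php
<?php
// The first elements tie (true vs true), so the second elements decide.
var_dump([true, 80.0] <=> [true, 60.0]); // int(1)

// Here the first elements already differ, so the second ones are ignored.
var_dump([1, 5] <=> [2, 0]); // int(-1)
```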


    Sort by levenshtein() then break ties with similar_text() (PHP 7+):

    usort($testStrings, function($a, $b) use ($needle) {
        return [levenshtein($needle, $a), similar_text($needle, $b)]
               <=>
               [levenshtein($needle, $b), similar_text($needle, $a)];
    });
    

    Sort by levenshtein() then break ties with similar_text() (PHP 7.4+):

    usort($testStrings, fn($a, $b) =>
        [levenshtein($needle, $a), similar_text($needle, $b)]
        <=>
        [levenshtein($needle, $b), similar_text($needle, $a)]
    );
    

    Personally, I never use anything but levenshtein() in my projects because it consistently delivers the results that I'm looking for.

  • 2020-12-05 11:40

    If you want to sort your array, you can do this:

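    $arr = array("hello", "try", "hel", "hey hello");
    $search = "hey"; // your search term

    // Map each index to its Levenshtein distance from the search term
    $temp_arr = array();
    foreach ($arr as $i => $word) {
        $temp_arr[$i] = levenshtein($search, $word);
    }

    // Sort by distance (ascending) while preserving the original indexes
    asort($temp_arr);

    // Rebuild the list of words in sorted order
    $sorted_arr = array();
    foreach ($temp_arr as $k => $v) {
        $sorted_arr[] = $arr[$k];
    }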
    

    $sorted_arr should then be in ascending order of Levenshtein distance, starting with the word closest to your search term.
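
    The same precompute-then-sort idea can be written more compactly with array_multisort(). This is only a sketch of an alternative, reusing the sample data from this answer:

```php
<?php
$arr = ['hello', 'try', 'hel', 'hey hello'];
$search = 'hey';

// Compute each distance once, then reorder $arr in step with the distances.
$distances = array_map(fn($word) => levenshtein($search, $word), $arr);
array_multisort($distances, $arr);

print_r($arr); // hel (1), try (2), hello (3), hey hello (6)
```

    Unlike a usort() comparator, which recomputes distances on every comparison, this scores each word exactly once.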

  • 2020-12-05 11:45

    You can use the levenshtein() function:

    <?php
    // input misspelled word
    $input = 'helllo';
    
    // array of words to check against
    $words  = array('hello', 'try', 'hel', 'hey hello');
    
    // no shortest distance found, yet
    $shortest = -1;
    
    // loop through words to find the closest
    foreach ($words as $word) {
    
        // calculate the distance between the input word,
        // and the current word
        $lev = levenshtein($input, $word);
    
        // check for an exact match
        if ($lev == 0) {
    
            // closest word is this one (exact match)
            $closest = $word;
            $shortest = 0;
    
            // break out of the loop; we've found an exact match
            break;
        }
    
        // if this distance is no greater than the shortest distance found
        // so far, OR if no shortest distance has been recorded yet
        if ($lev <= $shortest || $shortest < 0) {
            // set the closest match, and shortest distance
            $closest  = $word;
            $shortest = $lev;
        }
    }
    
    echo "Input word: $input\n";
    if ($shortest == 0) {
        echo "Exact match found: $closest\n";
    } else {
        echo "Did you mean: $closest?\n";
    }
    
    ?>
    
  • 2020-12-05 11:51

    Here is a quick solution using similar_text() (http://php.net/manual/en/function.similar-text.php). The manual describes it as follows:

    This calculates the similarity between two strings as described in Programming Classics: Implementing the World's Best Algorithms by Oliver (ISBN 0-131-00413-1). Note that this implementation does not use a stack as in Oliver's pseudo code, but recursive calls which may or may not speed up the whole process. Note also that the complexity of this algorithm is O(N**3) where N is the length of the longest string.

    $userInput = 'Bradley123';
    
    $list = array('Bob', 'Brad', 'Britney');
    
    usort($list, function ($a, $b) use ($userInput) {
        similar_text($userInput, $a, $percentA);
        similar_text($userInput, $b, $percentB);
    
        return $percentA === $percentB ? 0 : ($percentA > $percentB ? -1 : 1);
    });
    
    var_dump($list); //output: array("Brad", "Britney", "Bob");
    

    Or using levenshtein() (http://php.net/manual/en/function.levenshtein.php). From the manual:

    The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2. The complexity of the algorithm is O(m*n), where n and m are the length of str1 and str2 (rather good when compared to similar_text(), which is O(max(n,m)**3), but still expensive).

    $userInput = 'Bradley123';
    
    $list = array('Bob', 'Brad', 'Britney');
    
    usort($list, function ($a, $b) use ($userInput) {
        $levA = levenshtein($userInput, $a);
        $levB = levenshtein($userInput, $b);
    
        return $levA === $levB ? 0 : ($levA > $levB ? 1 : -1);
    });
    
    var_dump($list); // "Bob" (distance 9) comes last; "Brad" and "Britney" tie at distance 6, so their relative order can differ between PHP versions
    
  • 2020-12-05 12:01

    Another way is to use the similar_text() function, which can also report the similarity as a percentage through its optional third argument. See http://www.php.net/manual/en/function.similar-text.php for more.
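
    A minimal sketch of that call (the two sample words are chosen just for illustration):

```php
<?php
// The return value is the number of matching characters; the percentage
// is written into the optional third argument, which is passed by reference.
$matching = similar_text('World', 'Word', $percent);

var_dump($matching);          // int(4)
var_dump(round($percent, 2)); // float(88.89)
```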
