Damerau–Levenshtein distance algorithm in MySQL as a function

旧巷少年郎 2021-01-06 12:01

Does anyone know of a MySQL implementation of the Damerau–Levenshtein distance algorithm as a stored procedure/function that takes a single specified string as a parameter a

3 answers
  • 2021-01-06 12:38

    This seems to be an old topic, but in case anyone is looking for a MySQL implementation of the Damerau-Levenshtein distance, here is my own (based on a simple Levenshtein implementation found elsewhere on this site). It works for strings of up to 255 characters. Set the third parameter to FALSE to get the basic Levenshtein distance instead:

    -- DELIMITER is a mysql client command; it lets the body contain semicolons.
    DELIMITER $$

    CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255), dam BOOL)
    RETURNS INT
    DETERMINISTIC
    BEGIN
        DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
        DECLARE s1_char, s2_char CHAR;
        -- max strlen=255: each matrix cell is stored as a single byte
        -- cv0 = row being built, cv1 = previous row, cv2 = row before that
        DECLARE cv0, cv1, cv2 VARBINARY(256);
        SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
        IF s1 = s2 THEN
            RETURN 0;
        ELSEIF s1_len = 0 THEN
            RETURN s2_len;
        ELSEIF s2_len = 0 THEN
            RETURN s1_len;
        ELSE
            -- first row: distance from the empty prefix of s1 to each prefix of s2
            WHILE j <= s2_len DO
                SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
            END WHILE;
            WHILE i <= s1_len DO
                SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
                WHILE j <= s2_len DO
                    -- left neighbour + 1
                    SET c = c + 1;
                    SET s2_char = SUBSTRING(s2, j, 1);
                    IF s1_char = s2_char THEN
                        SET cost = 0;
                    ELSE
                        SET cost = 1;
                    END IF;
                    -- diagonal neighbour + substitution cost
                    SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
                    IF c > c_temp THEN SET c = c_temp; END IF;
                    -- upper neighbour + 1
                    SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
                    IF c > c_temp THEN SET c = c_temp; END IF;
                    IF dam THEN
                        -- transposition of two adjacent characters
                        IF i>1 AND j>1 AND s1_char = SUBSTRING(s2, j-1, 1) AND s2_char = SUBSTRING(s1, i-1, 1) THEN
                            SET c_temp = CONV(HEX(SUBSTRING(cv2, j-1, 1)), 16, 10) + 1;
                            IF c > c_temp THEN SET c = c_temp; END IF;
                        END IF;
                    END IF;
                    SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
                END WHILE;
                IF dam THEN SET cv2 = cv1; END IF;
                SET cv1 = cv0, i = i + 1;
            END WHILE;
        END IF;
        RETURN c;
    END$$

    DELIMITER ;
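
    A quick way to check the effect of the third parameter is to compare two strings that differ only by a swap of adjacent characters. This is a minimal sketch, assuming the function has been created under the name `levenshtein` as above; the test strings are just illustrative.

    ```sql
    -- 'ca' -> 'ac' is a single adjacent transposition
    SELECT levenshtein('ca', 'ac', TRUE)  AS damerau_dist,  -- 1: one transposition
           levenshtein('ca', 'ac', FALSE) AS plain_dist;    -- 2: two substitutions
    ```

    Character comparisons inside the function follow the collation in effect, so results for letters that differ only in case or accent may vary between servers.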
    
  • 2021-01-06 12:42

    The post "MySQL Levenshtein and Damerau-Levenshtein UDF’s" lists several implementations of this algorithm.

  • 2021-01-06 12:42

    There is ongoing development on GitHub to modify Sean Collins's code so that it supports UTF-8 and is case-insensitive.

    Example:

    mysql> select damlevlim('camión', 'çamion', 6);
    
    +--------------------------------------+
    | damlevlim('camión', 'çamion', 6) |
    +--------------------------------------+
    |                                    0 |
    +--------------------------------------+
    1 row in set (0.00 sec)
    

    This is especially useful for fuzzy matching.

    mysql> select word, damlevlim(word, 'camion', 7) as dist from wordslist where damlevlim(word, 'camion', 7) <= 1 limit 2;
    
    +--------+------+
    | word   | dist |
    +--------+------+
    | camión |    0 |
    | camios |    1 |
    +--------+------+
    2 rows in set (0.00 sec)
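
    Where the UDF cannot be installed, a similar fuzzy lookup can be sketched with a stored function instead. The query below assumes the `levenshtein(s1, s2, dam)` function from the first answer exists and reuses the `wordslist` table from the example above. The length pre-filter is an added assumption to cut down on function calls: an edit distance is never smaller than the difference in length, so words outside 5 to 7 characters cannot be within distance 1 of 'camion'.

    ```sql
    SELECT word, dist
    FROM (
        -- compute the distance only for words of plausible length
        SELECT word, levenshtein(word, 'camion', TRUE) AS dist
        FROM wordslist
        WHERE CHAR_LENGTH(word) BETWEEN 5 AND 7
    ) AS candidates
    WHERE dist <= 1
    ORDER BY dist, word
    LIMIT 10;
    ```

    A stored function evaluated per row is likely to be much slower than a compiled UDF, so this is only practical for small tables or after a narrower pre-filter.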
    