Levenshtein distance in T-SQL

Happy的楠姐 2020-11-22 06:30

I am looking for a T-SQL algorithm that calculates the Levenshtein distance between two strings.

6 answers
  •  终归单人心
    2020-11-22 07:06

    You can use the Levenshtein distance algorithm to compare strings.

    A T-SQL example is available at http://www.kodyaz.com/articles/fuzzy-string-matching-using-levenshtein-distance-sql-server.aspx

    CREATE FUNCTION edit_distance(@s1 nvarchar(3999), @s2 nvarchar(3999))
    RETURNS int
    AS
    BEGIN
     DECLARE @s1_len int, @s2_len int
     DECLARE @i int, @j int, @s1_char nchar, @c int, @c_temp int
     DECLARE @cv0 varbinary(8000), @cv1 varbinary(8000)
    
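     SELECT
      @s1_len = LEN(@s1),
      @s2_len = LEN(@s2),
      @cv1 = 0x0000,
      @j = 1, @i = 1, @c = 0
    
     -- With an empty @s1 the main loop never runs, so the distance is the length of @s2
     IF @s1_len = 0 RETURN @s2_len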
    
     WHILE @j <= @s2_len
      SELECT @cv1 = @cv1 + CAST(@j AS binary(2)), @j = @j + 1
    
     WHILE @i <= @s1_len
     BEGIN
      SELECT
       @s1_char = SUBSTRING(@s1, @i, 1),
       @c = @i,
       @cv0 = CAST(@i AS binary(2)),
       @j = 1
    
      WHILE @j <= @s2_len
      BEGIN
       SET @c = @c + 1
       SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j-1, 2) AS int) +
        CASE WHEN @s1_char = SUBSTRING(@s2, @j, 1) THEN 0 ELSE 1 END
       IF @c > @c_temp SET @c = @c_temp
       SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j+1, 2) AS int)+1
       IF @c > @c_temp SET @c = @c_temp
       SELECT @cv0 = @cv0 + CAST(@c AS binary(2)), @j = @j + 1
      END
    
     SELECT @cv1 = @cv0, @i = @i + 1
     END
    
     RETURN @c
    END
    

    (Function developed by Joseph Gama)

    Usage:

    select
     dbo.edit_distance('Fuzzy String Match','fuzzy string match'),
     dbo.edit_distance('fuzzy','fuzy'),
     dbo.edit_distance('Fuzzy String Match','fuzy string match'),
     dbo.edit_distance('levenshtein distance sql','levenshtein sql server'),
     dbo.edit_distance('distance','server')
    

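    The function returns the minimum number of single-character edits (insertions, deletions, or substitutions) needed to change one string into the other. For example, 'fuzzy' and 'fuzy' are at distance 1, because deleting one 'z' is enough.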
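
    As a rough sketch of how the function might be used for fuzzy matching, the query below keeps only the rows within a small edit distance of a search term. The table variable @names, its name column, the @search variable, and the threshold of 2 are all made up for illustration. The sketch assumes the function was created in the dbo schema and that you are on SQL Server 2008 or later.

    ```sql
    -- Hypothetical sample data: the table variable and column name are illustrative only
    DECLARE @names TABLE (name nvarchar(100));
    INSERT INTO @names (name)
    VALUES (N'fuzzy'), (N'fuzy'), (N'server'), (N'distance');

    -- Hypothetical search term
    DECLARE @search nvarchar(100) = N'fuzzy';

    -- CROSS APPLY evaluates the scalar function once per row and exposes the
    -- result as a column, so it can be filtered and sorted without repeating the call
    SELECT n.name, d.distance
    FROM @names AS n
    CROSS APPLY (SELECT dbo.edit_distance(n.name, @search) AS distance) AS d
    WHERE d.distance <= 2      -- assumed threshold, tune it for your data
    ORDER BY d.distance, n.name;
    ```

    With this sample data the query should return 'fuzzy' (distance 0) and 'fuzy' (distance 1). The other two values share no characters with the search term, so they fall well outside the threshold. Because the scalar function runs once per row, this approach is probably only practical for small tables or pre-filtered candidate sets.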
