How can I use Jaro-Winkler to find the closest value in a table?

臣服心动 2021-01-22 09:51

I have an implementation of the Jaro-Winkler algorithm in my database. I did not write this function. It compares two values and returns the probability that they match.

3 Answers
  • 2021-01-22 09:53

    A little dirty but faster (untested!).

    This assumes that the first three characters are the same and that the lengths differ by at most two characters.

    DECLARE
      -- Candidate master cities: same first three characters as the source
      -- city and a length within two characters of it.
      CURSOR citynames(cp_start IN VARCHAR2, cp_length IN NUMBER) IS
        SELECT city FROM table_loc_master WHERE statecode = 'PQ'
        AND   city LIKE cp_start||'%'
        AND   LENGTH(city) BETWEEN cp_length - 2 AND cp_length + 2;
      -- Source cities that have no exact match in the master table.
      CURSOR leasecity IS
        SELECT city FROM table_loc WHERE State = 'PQ'
        MINUS
        SELECT TO_CHAR(city) city FROM table_loc_master WHERE statecode = 'PQ';
      xProb NUMBER(10,8);
    BEGIN
      FOR x_rec IN leasecity
      LOOP
          FOR y_rec IN citynames(SUBSTR(x_rec.city,1,3), LENGTH(x_rec.city))
          LOOP
                -- jwrun is presumably the asker's existing Jaro-Winkler function.
                xProb := jwrun(x_rec.city,y_rec.city,LENGTH(y_rec.city));
                IF xProb > 0.97 THEN
                   DBMS_OUTPUT.PUT_LINE('Source : ' || x_rec.city || ' Target: ' || y_rec.city );
                END IF;
          END LOOP;
      END LOOP;
    END;
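
    The same prefix and length filter can also be written as one set-based query that keeps only the best-scoring candidate per source city. This is an untested sketch, not the asker's code: it swaps in Oracle's built-in UTL_MATCH.JARO_WINKLER (which returns a score between 0 and 1) for jwrun, and it assumes the city columns are ordinary character types.

    ```sql
    -- Sketch: best master-table match per unmatched source city.
    -- Assumption: UTL_MATCH.JARO_WINKLER stands in for the custom jwrun function.
    SELECT src.city AS source_city,
           -- Master city with the highest similarity score for this source city
           MAX(m.city) KEEP (DENSE_RANK FIRST
                             ORDER BY UTL_MATCH.JARO_WINKLER(src.city, m.city) DESC) AS best_target,
           MAX(UTL_MATCH.JARO_WINKLER(src.city, m.city)) AS best_score
    FROM  (SELECT city FROM table_loc WHERE state = 'PQ'
           MINUS
           SELECT TO_CHAR(city) FROM table_loc_master WHERE statecode = 'PQ') src
    JOIN   table_loc_master m
      ON   m.statecode = 'PQ'
     AND   m.city LIKE SUBSTR(src.city, 1, 3) || '%'
     AND   LENGTH(m.city) BETWEEN LENGTH(src.city) - 2 AND LENGTH(src.city) + 2
    GROUP  BY src.city;
    ```

    Source cities with no candidate sharing their first three characters do not appear in the result, which is the same trade-off the cursor version makes.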
    
  • 2021-01-22 09:57

    Do you have a list of words that contain words like "Philadelphia"?

    And who wrote that function?

    Oracle has the utl_match package for fuzzy text comparison: http://download.oracle.com/docs/cd/E14072_01/appdev.112/e10577/u_match.htm

    Can't you do

    select w1.word from words w1 where jaro(w1.word,'Philadelphlaa', 9) >= 0.95

    ?

    This will select 'Philadelphia' if that word is present in the words table.
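
    To get the single closest value instead of every row above a threshold, the candidates can be ranked by score. A minimal sketch, assuming the same words table and using UTL_MATCH.JARO_WINKLER from the package linked above in place of the custom jaro function:

    ```sql
    -- Sketch: return the one word most similar to the input string.
    -- Assumption: table "words" with column "word", as in the query above.
    SELECT word, score
    FROM  (SELECT w1.word,
                  UTL_MATCH.JARO_WINKLER(w1.word, 'Philadelphlaa') AS score
           FROM   words w1
           ORDER  BY score DESC)
    WHERE  ROWNUM = 1;
    ```

    Adding a minimum score such as `AND score >= 0.95` to the outer WHERE clause avoids returning a poor match when nothing in the table is actually close.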

  • 2021-01-22 10:05
    DECLARE
      -- All master cities for the state; every one is compared with each source city.
      CURSOR citynames IS
        SELECT city FROM table_loc_master WHERE statecode = 'PQ';
      -- Source cities that have no exact match in the master table.
      CURSOR leasecity IS
        SELECT city FROM table_loc WHERE State = 'PQ'
        MINUS
        SELECT TO_CHAR(city) city FROM table_loc_master WHERE statecode = 'PQ';
      xProb NUMBER(10,8);
    BEGIN
      FOR x_rec IN leasecity
      LOOP
          FOR y_rec IN citynames
          LOOP
                xProb := jwrun(x_rec.city,y_rec.city,LENGTH(y_rec.city));
                -- Report only near-certain matches.
                IF xProb > 0.97 THEN
                   DBMS_OUTPUT.PUT_LINE('Source : ' || x_rec.city || ' Target: ' || y_rec.city );
                END IF;
          END LOOP;
      END LOOP;
    END;
    