I have an implementation of the Jaro-Winkler algorithm in my database. I did not write this function. It compares two values and returns the probability of a match.
So jaro(string1, string2, matchnoofchars) returns a result.
Instead of comparing two strings, I want to pass in one string and a matchnoofchars value, and get back a result set of the values whose match probability is higher than 95%.
For example, the current function returns 97.62% for jaro("Philadelphia","Philadelphlaa",9).
I want to tweak this function so that I can find "Philadelphia" for an input of "Philadelphlaa". What changes do I need to make for this to happen?
I am using Oracle 9i.
Do you have a table of words that contains entries like "Philadelphia"?
And who wrote that function?
Oracle has the UTL_MATCH package for fuzzy text comparison: http://download.oracle.com/docs/cd/E14072_01/appdev.112/e10577/u_match.htm
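For reference, a minimal sketch of calling that package directly. UTL_MATCH.JARO_WINKLER returns a score between 0 and 1, and UTL_MATCH.JARO_WINKLER_SIMILARITY returns an integer from 0 to 100. The linked documentation is for 11g Release 2, so this assumes a release that ships the package; it may not be available on 9i.

```sql
-- Assumes UTL_MATCH is installed (the linked docs are for Oracle 11g Release 2).
SELECT UTL_MATCH.JARO_WINKLER('Philadelphia', 'Philadelphlaa')            AS jw_score,
       UTL_MATCH.JARO_WINKLER_SIMILARITY('Philadelphia', 'Philadelphlaa') AS jw_percent
FROM dual;
```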
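Can't you do this?
SELECT w1.word FROM words w1 WHERE jaro(w1.word, 'Philadelphlaa', 9) >= 0.95;
This will select 'Philadelphia' if that word is present in the table words. (Use 95 instead of 0.95 if your function returns a percentage rather than a value between 0 and 1.)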
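If only the single closest value is wanted, rather than everything above the threshold, one possible variation is to sort by the score and keep the first row. This is an untested sketch under assumptions: words(word) is the hypothetical table from the query above, and jaro is assumed to be callable from SQL and to return a value between 0 and 1.

```sql
-- Hypothetical table words(word); jaro is the poster's existing function.
SELECT word, score
FROM (
    SELECT w1.word,
           jaro(w1.word, 'Philadelphlaa', 9) AS score
    FROM words w1
    ORDER BY score DESC          -- best match first
)
WHERE score >= 0.95              -- drop weak matches
  AND ROWNUM <= 1;               -- keep only the top row
```

The ORDER BY inside the inline view combined with ROWNUM outside it is the usual top-N pattern on Oracle releases that predate FETCH FIRST.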
A little dirty, but faster (untested!).
Let's assume the first three characters are the same and the lengths are approximately the same (within two characters of each other).
DECLARE
  -- Candidate master cities: same first three characters, length within 2 of the input
  CURSOR citynames(cp_start IN VARCHAR2, cp_length IN NUMBER) IS
    SELECT city FROM table_loc_master
    WHERE statecode = 'PQ'
      AND city LIKE cp_start || '%'
      AND LENGTH(city) BETWEEN cp_length - 2 AND cp_length + 2;
  -- Cities in table_loc that have no exact match in the master table
  CURSOR leasecity IS
    SELECT city FROM table_loc WHERE State = 'PQ'
    MINUS
    SELECT TO_CHAR(city) city FROM table_loc_master WHERE statecode = 'PQ';
  xProb NUMBER(10,8);
BEGIN
  FOR x_rec IN leasecity
  LOOP
    -- Only score the pre-filtered candidates for this unmatched city
    FOR y_rec IN citynames(SUBSTR(x_rec.city, 1, 3), LENGTH(x_rec.city))
    LOOP
      xProb := jwrun(x_rec.city, y_rec.city, LENGTH(y_rec.city));
      IF xProb > 0.97 THEN
        DBMS_OUTPUT.PUT_LINE('Source : ' || x_rec.city || ' Target: ' || y_rec.city);
      END IF;
    END LOOP;
  END LOOP;
END;
DECLARE
  -- Brute-force version: every master city is scored against every unmatched city
  CURSOR citynames IS
    SELECT city FROM table_loc_master WHERE statecode = 'PQ';
  -- Cities in table_loc that have no exact match in the master table
  CURSOR leasecity IS
    SELECT city FROM table_loc WHERE State = 'PQ'
    MINUS
    SELECT TO_CHAR(city) city FROM table_loc_master WHERE statecode = 'PQ';
  xProb NUMBER(10,8);
BEGIN
  FOR x_rec IN leasecity
  LOOP
    FOR y_rec IN citynames
    LOOP
      xProb := jwrun(x_rec.city, y_rec.city, LENGTH(y_rec.city));
      IF xProb > 0.97 THEN
        DBMS_OUTPUT.PUT_LINE('Source : ' || x_rec.city || ' Target: ' || y_rec.city);
      END IF;
    END LOOP;
  END LOOP;
END;
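The same comparison can in principle be written as a single SQL statement, which returns the pairs as a result set instead of printing them through DBMS_OUTPUT. This is an untested sketch that assumes jwrun is a stored function callable from SQL and returns a value between 0 and 1; the table and column names are the ones used in the block above.

```sql
SELECT src.city AS source_city,
       tgt.city AS target_city
FROM (
        -- cities in table_loc that have no exact match in the master table
        SELECT city FROM table_loc WHERE state = 'PQ'
        MINUS
        SELECT TO_CHAR(city) FROM table_loc_master WHERE statecode = 'PQ'
     ) src,
     table_loc_master tgt
WHERE tgt.statecode = 'PQ'
  AND jwrun(src.city, tgt.city, LENGTH(tgt.city)) > 0.97;
```

This still scores every unmatched city against every master city, so it has the same cost as the nested loops unless a cheaper pre-filter is added to the WHERE clause.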
Source: https://stackoverflow.com/questions/3585246/how-can-i-use-jaro-winkler-to-find-the-closest-value-in-a-table