Say I have the following two strings in my database:
(1) 'Levi Watkins Learning Center - Alabama State University'
(2) 'ETH Library'
My sof
You can try using the normalized Levenshtein distance:
Li Yujian, Liu Bo, "A Normalized Levenshtein Distance Metric," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1091-1095, June 2007, doi:10.1109/TPAMI.2007.1078 http://www.computer.org/csdl/trans/tp/2007/06/i1091-abs.html
They propose normalizing the Levenshtein distance by the lengths of the strings being compared. That way, a difference of one character between two sequences of length 2 weighs more than the same difference between two sequences of length 10.
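To make the effect concrete, here is a minimal Python sketch. It assumes unit costs for insertion, deletion and substitution, and it uses the normalization 2*d / (|a| + |b| + d), which is my understanding of what the paper's formula reduces to in the unit-cost case; please check it against the paper before relying on it. The function names `levenshtein` and `normalized_levenshtein` are my own, not taken from any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance with unit costs (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Distance scaled into [0, 1]; assumed unit-cost form 2*d / (|a| + |b| + d)."""
    if not a and not b:
        return 0.0  # two empty strings are identical; avoid dividing by zero
    d = levenshtein(a, b)
    return 2.0 * d / (len(a) + len(b) + d)


if __name__ == "__main__":
    # One differing character in each pair, but the short pair scores higher.
    print(normalized_levenshtein("ab", "ac"))
    print(normalized_levenshtein("abcdefghij", "abcdefghiX"))
```

With this scaling, a score of 0 means the strings are identical and 1 means they share nothing, so a single threshold can be applied to both short names like 'ETH Library' and long ones. A simpler alternative that is often used is to divide the raw distance by the length of the longer string; it behaves similarly, but it is a different normalization from the one in the paper.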