Say I have the following two strings in my database:
(1) 'Levi Watkins Learning Center - Alabama State University'
(2) 'ETH Library'
My sof
You haven't really defined why you think option one is a "closer" match, at least not in any algorithmic sense. It seems like you're basing your expectations on the notion that option one has more matching keywords than option two, so why not just match based on the number of keywords in each string?
For example, using Ruby 2.0:
require 'pp'

string1 = 'Levi Watkins Learning Center - Alabama State University'
string2 = 'ETH Library'
strings = [string1, string2]
keywords = 'Alabama University'.split
keycount = {}
# Count matching keywords in each string.
strings.each do |str|
  keyword_hits = Hash.new(0)
  # Escape each keyword so regex metacharacters are matched literally.
  keywords.each { |word| keyword_hits[word] += str.scan(/#{Regexp.escape(word)}/).count }
  keyword_count = keyword_hits.values.reduce(0, :+)
  keycount[str] = keyword_count
end
# Sort by keyword count, and print results.
keycount.sort_by { |_str, count| -count }.each { |str, count| pp "#{count}: #{str}" }
This will print:
"2: Levi Watkins Learning Center - Alabama State University"
"0: ETH Library"
which matches your expectations of the corpus. You might want to make additional passes on the results using other algorithms to refine the results or to break ties, but this should at least get you pointed in the right direction.
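As one possible refinement pass, here is a minimal sketch that wraps the counting in a method and adds a tie-breaker. The method name rank_by_keywords is hypothetical, and the choices it makes (case-insensitive whole-word matching, ties going to the shorter string) are assumptions for illustration, not something your question specifies:

```ruby
# Hypothetical helper: rank strings by how many keyword hits they contain.
# Assumptions: matching is case-insensitive and on whole words only.
def rank_by_keywords(strings, keywords)
  patterns = keywords.map { |word| /\b#{Regexp.escape(word)}\b/i }
  scored = strings.map do |str|
    hits = patterns.map { |re| str.scan(re).count }.reduce(0, :+)
    [hits, str]
  end
  # Highest count first; ties go to the shorter string (an arbitrary choice).
  scored.sort_by { |hits, str| [-hits, str.length] }
end

strings = [
  'Levi Watkins Learning Center - Alabama State University',
  'ETH Library'
]
ranked = rank_by_keywords(strings, %w[alabama university])
ranked.each { |hits, str| puts "#{hits}: #{str}" }
```

Sorting on the pair [-hits, str.length] keeps the primary ordering by keyword count and only consults the length when two counts are equal, so you can swap in a different secondary measure without touching the counting step.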