I have a dictionary of 50K to 100K strings (each up to 50+ characters), and I am trying to find whether a given string is in the dictionary within some "edit" distance tolerance.
I wrote a pair of gems, fuzzily and blurrily, which do trigram-based fuzzy matching. Given your (low) volume of data, Fuzzily will be easier to integrate and about as fast; with either one you'd get answers within 5-10 ms on modern hardware.
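To give a feel for what "trigram-based" means: each string is broken into overlapping three-character chunks, and two strings are scored by the share of chunks they have in common (Jaccard similarity here). The following is a minimal plain-Ruby sketch; the helper names `trigrams` and `trigram_similarity`, the lowercasing and the padding are illustrative assumptions, not Fuzzily's or Blurrily's actual normalization or scoring.

```ruby
require 'set'

# Hypothetical helper: split a string into overlapping 3-character chunks.
# Lowercasing and the two-space/one-space padding are illustrative choices only.
def trigrams(str)
  padded = "  #{str.downcase} "
  (0..padded.length - 3).map { |i| padded[i, 3] }.to_set
end

# Jaccard similarity: shared trigrams divided by all distinct trigrams.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  (ta & tb).size.to_f / (ta | tb).size
end

puts trigram_similarity("restaurant", "restuarant").round(3)  # => 0.467
puts trigram_similarity("restaurant", "elephant").round(3)    # => 0.111
```

Each trigram is just a short key, so a database can index it and quickly find every word containing it; the edit distance between two arbitrary strings has no such key and can only be computed pair by pair.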
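Given that both are trigram-based (which is indexable) rather than edit-distance-based (which isn't), you'd probably have to do this in two passes:

1. use the trigram index to fetch a short list of candidate matches;
2. compute the exact edit distance for each candidate and keep only those under your threshold.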
In Ruby (as you asked), using Fuzzily plus the Text gem, fetching the records within the edit-distance threshold would look like this:
```ruby
require 'text'

# Trigram candidates from Fuzzily, then an exact Levenshtein filter
MyRecords.find_by_fuzzy_name(input_string).select { |result|
  Text::Levenshtein.distance(input_string, result.name) < my_distance_threshold
}
```
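If you want to see what the two passes amount to without a database or either gem, here is a minimal in-memory sketch in plain Ruby. `TrigramIndex`, `candidates` and `lookup` are hypothetical names; the padding, lowercasing and shared-trigram-count ranking are illustrative assumptions rather than how Fuzzily stores or ranks trigrams, and the hand-written `levenshtein` stands in for `Text::Levenshtein.distance`.

```ruby
require 'set'

# Hypothetical, dependency-free illustration of the two passes.
class TrigramIndex
  def initialize(words)
    @words = words
    @index = Hash.new { |h, k| h[k] = [] } # trigram => ids of words containing it
    words.each_with_index do |word, id|
      trigrams(word).each { |t| @index[t] << id }
    end
  end

  # Pass 1: rank words by how many trigrams they share with the query.
  def candidates(query, limit: 10)
    counts = Hash.new(0)
    trigrams(query).each do |t|
      @index.fetch(t, []).each { |id| counts[id] += 1 }
    end
    counts.sort_by { |id, n| [-n, id] }.first(limit).map { |id, _| @words[id] }
  end

  # Pass 2: keep only candidates within max_distance edits (case-insensitive).
  def lookup(query, max_distance, limit: 10)
    candidates(query, limit: limit).select do |w|
      levenshtein(query.downcase, w.downcase) <= max_distance
    end
  end

  private

  def trigrams(str)
    padded = "  #{str.downcase} "
    (0..padded.length - 3).map { |i| padded[i, 3] }.to_set
  end

  # Classic dynamic-programming edit distance, keeping one row at a time.
  def levenshtein(a, b)
    prev = (0..b.length).to_a
    a.each_char.with_index(1) do |ca, i|
      curr = [i]
      b.each_char.with_index(1) do |cb, j|
        cost = ca == cb ? 0 : 1
        curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
      end
      prev = curr
    end
    prev.last
  end
end

index = TrigramIndex.new(%w[restaurant restructure rest elephant])
p index.lookup("restuarant", 2) # => ["restaurant"]
```

The candidate `limit` is what keeps the second pass cheap: the exact distance is only computed for the handful of words the index returns. The flip side is that a true match ranked below the limit is never checked.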
This performs a handful of well-optimized database queries and a few
Caveats: