MySQL SubString Fuzzy Search

前提是你 提交于 2019-12-12 02:09:50

问题


I have a very interesting problem:

I have a MySQL table 'Venue' with fields: 'name', 'addressLine1', 'addressLine2', 'addressLine3', 'city', 'country', 'description'; all fields are VARCHAR. 'description' is a larger text field.

What I would like to do is a fuzzy search on table Venue. So far I am using:

SELECT * FROM Venue WHERE MATCH(name, addressLine1,..., description) AGAINST("London" IN NATURAL LANGUAGE MODE).

I can also sort this query based on the MATCH score.

This is great but has some obvious problems: 1) if the user types "lond" then nothing is returned 2) if the user types "lodnod" then nothing is returned.

One alternative solution I thought of would be to use a Levenshtein expansion (2 movements) - so for the search term "lodnod" the query would be like:

SELECT * FROM Venue WHERE Venue. name LIKE '%__lodnod%'
OR Venue.addressLine1 LIKE '%__lodnod%'
OR Venue.addressLine2 LIKE '%__lodnod%'
OR Venue.addressLine3 LIKE '%__lodnod%'
OR Venue.city LIKE '%__lodnod%'
OR Venue.county LIKE '%__lodnod%'
OR Venue.country LIKE '%__lodnod%'
OR Venue. name LIKE '%lodnod%'
OR Venue.addressLine1 LIKE '%lodnod%'
OR Venue.addressLine2 LIKE '%lodnod%'
OR Venue.addressLine3 LIKE '%lodnod%'
OR Venue.city LIKE '%lodnod%'
OR Venue.county LIKE '%lodnod%'
OR Venue.country LIKE '%lodnod%'
OR Venue. name LIKE '%_lodnod%'
OR Venue.addressLine1 LIKE '%_lodnod%'
OR Venue.addressLine2 LIKE '%_lodnod%'
OR Venue.addressLine3 LIKE '%_lodnod%'
OR Venue.city LIKE '%_lodnod%'
OR Venue.county LIKE '%_lodnod%'
OR Venue.country LIKE '%_lodnod%'
OR Venue. name LIKE '%_odnod%'
OR Venue.addressLine1 LIKE '%_odnod%'
OR Venue.addressLine2 LIKE '%_odnod%'
OR Venue.addressLine3 LIKE '%_odnod%'
OR Venue.city LIKE '%_odnod%'
OR Venue.county LIKE '%_odnod%'
OR Venue.country LIKE '%_odnod%'
OR Venue. name LIKE '%__odnod%'
OR Venue.addressLine1 LIKE '%__odnod%'
OR Venue.addressLine2 LIKE '%__odnod%'
OR Venue.addressLine3 LIKE '%__odnod%'
OR Venue.city LIKE '%__odnod%'
OR Venue.county LIKE '%__odnod%'
OR Venue.country LIKE '%__odnod%'
OR Venue. name LIKE '%_l_odnod%'
OR Venue.addressLine1 LIKE '%_l_odnod%'
OR Venue.addressLine2 LIKE '%_l_odnod%'
OR Venue.addressLine3 LIKE '%_l_odnod%'
OR Venue.city LIKE '%_l_odnod%'
OR Venue.county LIKE '%_l_odnod%'
OR Venue.country LIKE '%_l_odnod%'
OR Venue. name LIKE '%_ldnod%'
OR Venue.addressLine1 LIKE '%_ldnod%'
OR Venue.addressLine2 LIKE '%_ldnod%'
OR Venue.addressLine3 LIKE '%_ldnod%'
OR Venue.city LIKE '%_ldnod%'
OR Venue.county LIKE '%_ldnod%'
OR Venue.country LIKE '%_ldnod%'
OR Venue. name LIKE '%_l_dnod%'
OR Venue.addressLine1 LIKE '%_l_dnod%'
OR Venue.addressLine2 LIKE '%_l_dnod%'
OR Venue.addressLine3 LIKE '%_l_dnod%'
OR Venue.city LIKE '%_l_dnod%'
OR Venue.county LIKE '%_l_dnod%'
OR Venue.country LIKE '%_l_dnod%'
OR Venue. name LIKE '%_lo_dnod%'
OR Venue.addressLine1 LIKE '%_lo_dnod%'
OR Venue.addressLine2 LIKE '%_lo_dnod%'
OR Venue.addressLine3 LIKE '%_lo_dnod%'
OR Venue.city LIKE '%_lo_dnod%'
OR Venue.county LIKE '%_lo_dnod%'
OR Venue.country LIKE '%_lo_dnod%'
OR Venue. name LIKE '%_lonod%'
OR Venue.addressLine1 LIKE '%_lonod%'
OR Venue.addressLine2 LIKE '%_lonod%'
OR Venue.addressLine3 LIKE '%_lonod%'
OR Venue.city LIKE '%_lonod%'
OR Venue.county LIKE '%_lonod%'
OR Venue.country LIKE '%_lonod%'
OR Venue. name LIKE '%_lo_nod%'
OR Venue.addressLine1 LIKE '%_lo_nod%'
OR Venue.addressLine2 LIKE '%_lo_nod%'
OR Venue.addressLine3 LIKE '%_lo_nod%'
OR Venue.city LIKE '%_lo_nod%'
OR Venue.county LIKE '%_lo_nod%'
OR Venue.country LIKE '%_lo_nod%'
OR Venue. name LIKE '%_lod_nod%'
OR Venue.addressLine1 LIKE '%_lod_nod%'
OR Venue.addressLine2 LIKE '%_lod_nod%'
OR Venue.addressLine3 LIKE '%_lod_nod%'
OR Venue.city LIKE '%_lod_nod%'
OR Venue.county LIKE '%_lod_nod%'
OR Venue.country LIKE '%_lod_nod%'
OR Venue. name LIKE '%_lodod%'
OR Venue.addressLine1 LIKE '%_lodod%'
OR Venue.addressLine2 LIKE '%_lodod%'
OR Venue.addressLine3 LIKE '%_lodod%'
OR Venue.city LIKE '%_lodod%'
OR Venue.county LIKE '%_lodod%'
OR Venue.country LIKE '%_lodod%'
OR Venue. name LIKE '%_lod_od%'
OR Venue.addressLine1 LIKE '%_lod_od%'
OR Venue.addressLine2 LIKE '%_lod_od%'
OR Venue.addressLine3 LIKE '%_lod_od%'
OR Venue.city LIKE '%_lod_od%'
OR Venue.county LIKE '%_lod_od%'
OR Venue.country LIKE '%_lod_od%'
OR Venue. name LIKE '%_lodn_od%'
OR Venue.addressLine1 LIKE '%_lodn_od%'
OR Venue.addressLine2 LIKE '%_lodn_od%'
OR Venue.addressLine3 LIKE '%_lodn_od%'
OR Venue.city LIKE '%_lodn_od%'
OR Venue.county LIKE '%_lodn_od%'
OR Venue.country LIKE '%_lodn_od%'
OR Venue. name LIKE '%_lodnd%'
OR Venue.addressLine1 LIKE '%_lodnd%'
OR Venue.addressLine2 LIKE '%_lodnd%'
OR Venue.addressLine3 LIKE '%_lodnd%'
OR Venue.city LIKE '%_lodnd%'
OR Venue.county LIKE '%_lodnd%'
OR Venue.country LIKE '%_lodnd%'
OR Venue. name LIKE '%_lodn_d%'
OR Venue.addressLine1 LIKE '%_lodn_d%'
OR Venue.addressLine2 LIKE '%_lodn_d%'
OR Venue.addressLine3 LIKE '%_lodn_d%'
OR Venue.city LIKE '%_lodn_d%'
OR Venue.county LIKE '%_lodn_d%'
OR Venue.country LIKE '%_lodn_d%'
OR Venue. name LIKE '%_lodno_d%'
OR Venue.addressLine1 LIKE '%_lodno_d%'
OR Venue.addressLine2 LIKE '%_lodno_d%'
OR Venue.addressLine3 LIKE '%_lodno_d%'
OR Venue.city LIKE '%_lodno_d%'
OR Venue.county LIKE '%_lodno_d%'
OR Venue.country LIKE '%_lodno_d%'
OR Venue. name LIKE '%_lodno%'
OR Venue.addressLine1 LIKE '%_lodno%'
OR Venue.addressLine2 LIKE '%_lodno%'
OR Venue.addressLine3 LIKE '%_lodno%'
OR Venue.city LIKE '%_lodno%'
OR Venue.county LIKE '%_lodno%'
OR Venue.country LIKE '%_lodno%'
OR Venue. name LIKE '%_lodno_%'
OR Venue.addressLine1 LIKE '%_lodno_%'
OR Venue.addressLine2 LIKE '%_lodno_%'
OR Venue.addressLine3 LIKE '%_lodno_%'
OR Venue.city LIKE '%_lodno_%'
OR Venue.county LIKE '%_lodno_%'
OR Venue.country LIKE '%_lodno_%'
OR Venue. name LIKE '%_lodnod_%'
OR Venue.addressLine1 LIKE '%_lodnod_%'
OR Venue.addressLine2 LIKE '%_lodnod_%'
OR Venue.addressLine3 LIKE '%_lodnod_%'
OR Venue.city LIKE '%_lodnod_%'
OR Venue.county LIKE '%_lodnod_%'
OR Venue.country LIKE '%_lodnod_%'
OR Venue. name LIKE '%dnod%'
OR Venue.addressLine1 LIKE '%dnod%'
OR Venue.addressLine2 LIKE '%dnod%'
OR Venue.addressLine3 LIKE '%dnod%'
OR Venue.city LIKE '%dnod%'
OR Venue.county LIKE '%dnod%'
OR Venue.country LIKE '%dnod%'
OR Venue. name LIKE '%_dnod%'
OR Venue.addressLine1 LIKE '%_dnod%'
OR Venue.addressLine2 LIKE '%_dnod%'
OR Venue.addressLine3 LIKE '%_dnod%'
OR Venue.city LIKE '%_dnod%'
OR Venue.county LIKE '%_dnod%'
OR Venue.country LIKE '%_dnod%'
OR Venue. name LIKE '%o_dnod%'
OR Venue.addressLine1 LIKE '%o_dnod%'
OR Venue.addressLine2 LIKE '%o_dnod%'
OR Venue.addressLine3 LIKE '%o_dnod%'
OR Venue.city LIKE '%o_dnod%'
OR Venue.county LIKE '%o_dnod%'
OR Venue.country LIKE '%o_dnod%'
OR Venue. name LIKE '%onod%'
OR Venue.addressLine1 LIKE '%onod%'
OR Venue.addressLine2 LIKE '%onod%'
OR Venue.addressLine3 LIKE '%onod%'
OR Venue.city LIKE '%onod%'
OR Venue.county LIKE '%onod%'
OR Venue.country LIKE '%onod%'
OR Venue. name LIKE '%o_nod%'
OR Venue.addressLine1 LIKE '%o_nod%'
OR Venue.addressLine2 LIKE '%o_nod%'
OR Venue.addressLine3 LIKE '%o_nod%'
OR Venue.city LIKE '%o_nod%'
OR Venue.county LIKE '%o_nod%'
OR Venue.country LIKE '%o_nod%'
OR Venue. name LIKE '%od_nod%'
OR Venue.addressLine1 LIKE '%od_nod%'
OR Venue.addressLine2 LIKE '%od_nod%'
OR Venue.addressLine3 LIKE '%od_nod%'
OR Venue.city LIKE '%od_nod%'
OR Venue.county LIKE '%od_nod%'
OR Venue.country LIKE '%od_nod%'
OR Venue. name LIKE '%odod%'
OR Venue.addressLine1 LIKE '%odod%'
OR Venue.addressLine2 LIKE '%odod%'
OR Venue.addressLine3 LIKE '%odod%'
OR Venue.city LIKE '%odod%'
OR Venue.county LIKE '%odod%'
OR Venue.country LIKE '%odod%'
OR Venue. name LIKE '%od_od%'
OR Venue.addressLine1 LIKE '%od_od%'
OR Venue.addressLine2 LIKE '%od_od%'
OR Venue.addressLine3 LIKE '%od_od%'
OR Venue.city LIKE '%od_od%'
OR Venue.county LIKE '%od_od%'
OR Venue.country LIKE '%od_od%'
OR Venue. name LIKE '%odn_od%'
OR Venue.addressLine1 LIKE '%odn_od%'
OR Venue.addressLine2 LIKE '%odn_od%'
OR Venue.addressLine3 LIKE '%odn_od%'
OR Venue.city LIKE '%odn_od%'
OR Venue.county LIKE '%odn_od%'
OR Venue.country LIKE '%odn_od%'
OR Venue. name LIKE '%odnd%'
OR Venue.addressLine1 LIKE '%odnd%'
OR Venue.addressLine2 LIKE '%odnd%'
OR Venue.addressLine3 LIKE '%odnd%'
OR Venue.city LIKE '%odnd%'
OR Venue.county LIKE '%odnd%'
OR Venue.country LIKE '%odnd%'
OR Venue. name LIKE '%odn_d%'
OR Venue.addressLine1 LIKE '%odn_d%'
OR Venue.addressLine2 LIKE '%odn_d%'
...(cut short because of maximum 30000 character limit)
OR Venue.city LIKE '%lodno_%'
OR Venue.county LIKE '%lodno_%'
OR Venue.country LIKE '%lodno_%'
OR Venue. name LIKE '%lodno__%'
OR Venue.addressLine1 LIKE '%lodno__%'
OR Venue.addressLine2 LIKE '%lodno__%'
OR Venue.addressLine3 LIKE '%lodno__%'
OR Venue.city LIKE '%lodno__%'
OR Venue.county LIKE '%lodno__%'
OR Venue.country LIKE '%lodno__%'
OR Venue. name LIKE '%lodno_d_%'
OR Venue.addressLine1 LIKE '%lodno_d_%'
OR Venue.addressLine2 LIKE '%lodno_d_%'
OR Venue.addressLine3 LIKE '%lodno_d_%'
OR Venue.city LIKE '%lodno_d_%'
OR Venue.county LIKE '%lodno_d_%'
OR Venue.country LIKE '%lodno_d_%'
OR Venue. name LIKE '%lodno%'
OR Venue.addressLine1 LIKE '%lodno%'
OR Venue.addressLine2 LIKE '%lodno%'
OR Venue.addressLine3 LIKE '%lodno%'
OR Venue.city LIKE '%lodno%'
OR Venue.county LIKE '%lodno%'
OR Venue.country LIKE '%lodno%'
OR Venue. name LIKE '%lodnod__%'
OR Venue.addressLine1 LIKE '%lodnod__%'
OR Venue.addressLine2 LIKE '%lodnod__%'
OR Venue.addressLine3 LIKE '%lodnod__%'
OR Venue.city LIKE '%lodnod__%'
OR Venue.county LIKE '%lodnod__%'
OR Venue.country LIKE '%lodnod__%'
OR Venue. name LIKE '%lodnod_%'
OR Venue.addressLine1 LIKE '%lodnod_%'
OR Venue.addressLine2 LIKE '%lodnod_%'
OR Venue.addressLine3 LIKE '%lodnod_%'
OR Venue.city LIKE '%lodnod_%'
OR Venue.county LIKE '%lodnod_%'
OR Venue.country LIKE '%lodnod_%'

Obviously, this query is 1) going to be a huge load on the server and response times will be affected. 2) I have no idea on scoring this and therefore sorting it.

Is there a way to do this better?


回答1:


Don't use mysql for this stuff. You need an engine optimized for indexing search material. You can use elasticsearch or solr for this purpose and index it using mysql. Mysql is simply not designed for this purpose




回答2:


MySQL's documentation page for full text search functions has some useful information. Despite what the comments above say, you don't need to restrict yourself to the MyISAM table type in MySQL. As others have said, MySQL may not be the best choice, but there are some things you can do to make it more workable.

Column Aggregation

If you find that you're always doing searches like this against multiple columns, it might help to have a "searchable" column that contains just the tokens from all of the other columns. You'd need some mechanism to keep it up to date, but ON INSERT and ON UPDATE triggers would be sufficient. This may simplify your indexes and make your queries easier to read (and hopefully, speedup responses), but doesn't solve your initial issue.

One advantage of this "searchable" column idea is you can quietly add tokens of common misspellings to make things easier for your users. Your search logs are helpful here.

Consider Soundex instead

One solution that's had some recent success is using soundex instead of Levenshtein expansion. See, e.g., Rob Gravelle (2015) "MySQL Fuzzy Text Searching Using the SOUNDEX Function, Database Journal

User Defined Function (UDF)

Another option is adding a User Defined Function (UDF) to your MySQL implementation. See e.g., this Stack Overflow question or this blog post.

However you approach it, you'll find that truly fuzzy searching with Levenshtein is a slow beast. Best of luck to you!



来源:https://stackoverflow.com/questions/39145910/mysql-substring-fuzzy-search

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!