Question
I'm wondering whether the major SQL engines (MS SQL, Oracle, MySQL) can recognize that two words are related because they share the same root.
We know it's easy to match "networking" when searching for "network" because the latter is a substring of the former.
But do SQL engines have functions that can match "network" when searching for "networking"?
Thanks a lot.
Answer 1:
This functionality is called stemming: a stemmer is an algorithm that deduces the stem from any form of a word.
This can be quite complex: for instance, the Russian words шёл and иду are different forms of the same verb, though they do not share a single letter (the same is true of English went and go).
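To make the idea concrete, here is a toy suffix-stripping stemmer in Python. It is only a sketch: the suffix list and the names toy_stem and same_root are made up for illustration, and real stemmers such as Porter/Snowball use far more elaborate rules.

```python
# Toy suffix-stripping stemmer: an illustration only, far cruder than
# the stemmers that real full-text engines plug in.
SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def same_root(a: str, b: str) -> bool:
    """Two words match when their toy stems are identical."""
    return toy_stem(a) == toy_stem(b)


print(same_root("networking", "network"))  # True
print(same_root("networks", "network"))    # True
print(same_root("went", "go"))             # False: irregular forms need a dictionary
```

The last call shows why suffix rules alone are not enough: irregular forms like went/go can only be linked through a lookup table, not by trimming endings.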
Word breaking can also be quite a complex task for some languages that use no spaces between words.
SQL Server allows using pluggable stemmers and word breakers for its full-text search engine:
http://msdn.microsoft.com/en-us/library/ms142509.aspx
Answer 2:
I think the topic is 'Semantic Similarity'. There are several efforts trying to find optimal solutions to this problem.
Answer 3:
You can try using soundex, though it might not be exactly what you want. See http://www.codeproject.com/KB/database/Phonetic_Search_MSSQL.aspx.
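For a feel of what soundex does, here is a minimal Python sketch of the classic American Soundex rules. The function name and structure are my own, and the SOUNDEX built into a given database may differ in edge cases, so treat it as an approximation rather than what any engine actually runs.

```python
# Minimal sketch of American Soundex: first letter plus three digits.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code, or '' if there are no letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (result + "000")[:4]  # pad or truncate to four characters


print(soundex("network"), soundex("networking"))  # N362 N362
print(soundex("went"), soundex("go"))             # W530 G000
```

"network" and "networking" collide only because the code is cut off after four characters; soundex compares sounds, not roots, so it will also match unrelated words and miss pairs like went/go.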
Answer 4:
As Quassnoi pointed out, this can be done with stemming. PostgreSQL implements it for full-text search if you turn it on.
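-- blah_en is a placeholder configuration name; asciiword and word are example token types, adjust as needed
ALTER TEXT SEARCH CONFIGURATION blah_en ADD MAPPING FOR asciiword, word WITH english_stem;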
This uses the Snowball dictionary, which is based on the Porter stemmer. The Porter stemmer is probably one of the most widely used stemmers, so it will give decent results. It's important to remember, though, that stemming is not always as accurate as you might like.
Source: https://stackoverflow.com/questions/4051572/sql-word-root-matching