Question
I have a SQL Server 2016 database with full-text indexes defined on 4 columns, each configured for a different language: Dutch, English, German and French. I used the wizard to set up the full-text index.
I am using CONTAINSTABLE with FORMSOF, and for each language I would expect a query with either the word stem or any verb form to return both rows from the example table. This works in English and German, partially in French, and not at all in Dutch.
I am using a very basic example with forms of the verb 'to run' in every language, so I suspect something is not configured correctly.
Example table
+----+-------------+--------------+-----------------+----------------+
| ID | KeyWordsNL  | KeyWordsEN   | KeyWordsDE      | KeyWordsFR     |
+----+-------------+--------------+-----------------+----------------+
| 1  | ik loop     | i run        | ich laufe       | je cours       |
| 2  | ik ga lopen | i am running | ich gehe laufen | je vais courir |
+----+-------------+--------------+-----------------+----------------+
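For anyone who wants to reproduce this, the setup can be sketched in T-SQL roughly as follows. This is only an approximation of what the wizard generates: the dbo schema, the column types, the constraint name PK_SearchResult and the catalog name SearchCatalog are assumptions made for illustration. The LCIDs are 1043 (Dutch), 1033 (English), 1031 (German) and 1036 (French).

```sql
-- Assumed table definition; only the table and column names come from the question.
CREATE TABLE dbo.SearchResult (
    ID         int           NOT NULL CONSTRAINT PK_SearchResult PRIMARY KEY,
    KeyWordsNL nvarchar(200) NULL,
    KeyWordsEN nvarchar(200) NULL,
    KeyWordsDE nvarchar(200) NULL,
    KeyWordsFR nvarchar(200) NULL
);

INSERT INTO dbo.SearchResult (ID, KeyWordsNL, KeyWordsEN, KeyWordsDE, KeyWordsFR)
VALUES (1, N'ik loop',     N'i run',        N'ich laufe',       N'je cours'),
       (2, N'ik ga lopen', N'i am running', N'ich gehe laufen', N'je vais courir');

-- Hypothetical catalog name.
CREATE FULLTEXT CATALOG SearchCatalog;

-- One full-text index per table; each column gets its own language.
CREATE FULLTEXT INDEX ON dbo.SearchResult (
    KeyWordsNL LANGUAGE 1043,  -- Dutch
    KeyWordsEN LANGUAGE 1033,  -- English
    KeyWordsDE LANGUAGE 1031,  -- German
    KeyWordsFR LANGUAGE 1036   -- French
)
KEY INDEX PK_SearchResult ON SearchCatalog
WITH CHANGE_TRACKING AUTO;
```

The index is populated asynchronously, so queries run immediately after creating it may return no rows.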
English queries
CONTAINSTABLE (SearchResult, KeyWordsEN, 'FORMSOF(INFLECTIONAL, "run")')
CONTAINSTABLE (SearchResult, KeyWordsEN, 'FORMSOF(INFLECTIONAL, "running")')
Each query returns records 1 and 2.
German queries
CONTAINSTABLE (SearchResult, KeyWordsDE, 'FORMSOF(INFLECTIONAL, "laufe")')
CONTAINSTABLE (SearchResult, KeyWordsDE, 'FORMSOF(INFLECTIONAL, "laufen")')
Each query returns records 1 and 2.
French queries
CONTAINSTABLE (SearchResult, KeyWordsFR, 'FORMSOF(INFLECTIONAL, "cours")')
CONTAINSTABLE (SearchResult, KeyWordsFR, 'FORMSOF(INFLECTIONAL, "courir")')
The first query (cours) returns only record 1; the second query (courir) returns records 1 and 2.
Dutch queries
CONTAINSTABLE (SearchResult, KeyWordsNL, 'FORMSOF(INFLECTIONAL, "loop")')
CONTAINSTABLE (SearchResult, KeyWordsNL, 'FORMSOF(INFLECTIONAL, "lopen")')
The first query (loop) returns only record 1, and the second query (lopen) returns only record 2.
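The CONTAINSTABLE calls above are fragments. A complete query might look like the sketch below, which joins the full-text result back to the table on its key. The dbo schema and the aliases are assumptions, and the optional LANGUAGE argument is added here only to make the query-time language explicit; the original queries did not specify it.

```sql
-- Return the matching rows together with their full-text rank.
SELECT sr.ID,
       sr.KeyWordsNL,
       ft.[RANK]
FROM dbo.SearchResult AS sr
INNER JOIN CONTAINSTABLE(
               dbo.SearchResult,
               KeyWordsNL,
               'FORMSOF(INFLECTIONAL, "lopen")',
               LANGUAGE 1043              -- Dutch; parse the search term as Dutch
           ) AS ft
        ON ft.[KEY] = sr.ID               -- [KEY] holds the full-text key column value
ORDER BY ft.[RANK] DESC;
```

Swapping the column name, search term and LCID gives the equivalent query for the other three languages.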
Edit: Further testing ...
You can test how full-text search parses the input query by using sys.dm_fts_parser. This makes it clear that no stemming is happening for Dutch. I tested this on different machines.
Getting the language LCID:
select * from sys.fulltext_languages where name in ('Dutch','English','German','French')
select * from sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "koe")', 1043, 0, 0)
select * from sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "cow")', 1033, 0, 0)
The Dutch query returns only "koe", while the English query returns "cow's", "cowed", "cowing", "cows", "cows", "cow".
The same happens for every word I try: Dutch returns no extra word forms, while English typically returns 5-10.
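To compare all four languages in one result set, the parser calls can be combined as in the sketch below. The search terms are taken from the example table, and the language_name labels are just literals added for readability. In the output of sys.dm_fts_parser, an expansion_type of 2 marks a term produced by inflectional expansion and 0 marks a plain single word. As far as I know, the function requires membership in the sysadmin role.

```sql
-- Arguments: query string, LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
SELECT N'Dutch' AS language_name, display_term, expansion_type
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "lopen")', 1043, 0, 0)
UNION ALL
SELECT N'English', display_term, expansion_type
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "running")', 1033, 0, 0)
UNION ALL
SELECT N'German', display_term, expansion_type
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "laufen")', 1031, 0, 0)
UNION ALL
SELECT N'French', display_term, expansion_type
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "courir")', 1036, 0, 0);
```

A language whose stemmer is doing nothing should show a single row containing only the input term.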
Answer 1:
I found that there is simply no specific stemming library for Dutch (nor for some other languages). It is not clearly stated, but this article explains how to revert the word breaker and stemmer to previous versions, and it appears that the word breaker and stemmer actually use the same DLL.
The following query shows that for Dutch (LCID 1043) the default neutral-language word breaker/stemmer is used, which explains the bad results.
EXEC sp_help_fulltext_system_components 'wordbreaker';
To get the LCID per language:
SELECT * FROM sys.fulltext_languages;
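Since the question suspected a misconfiguration, it may also be worth confirming which language each indexed column was actually assigned. A minimal sketch, assuming the table lives in the dbo schema:

```sql
-- List every full-text indexed column of the table with its configured language.
SELECT c.name         AS column_name,
       ic.language_id AS lcid,
       l.name         AS language_name
FROM sys.fulltext_index_columns AS ic
INNER JOIN sys.columns AS c
        ON c.object_id = ic.object_id
       AND c.column_id = ic.column_id
LEFT JOIN sys.fulltext_languages AS l
        ON l.lcid = ic.language_id
WHERE ic.object_id = OBJECT_ID(N'dbo.SearchResult');
```

If KeyWordsNL shows LCID 1043 here, the column is configured as intended and the missing Dutch word forms come from the word breaker/stemmer itself.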
Source: https://stackoverflow.com/questions/48299202/full-text-search-stemming-not-returning-consistent-results-in-different-language