Question
We are seeing unexpectedly high recall for Chinese queries. I have managed to reproduce it in a minimal case, using a simple data model with only two properties.
REPRODUCE
Define a property DescriptionZhCn for Chinese product descriptions, using the zh-Hans.microsoft analyzer.
Populate two records with the following values in DescriptionZhCn:
Contoso 减振接杆
Contoso 缩径接柄
Search using options searchMode=all, queryType=full, searchFields=DescriptionZhCn, api-version=2019-05-06 with the following values in the search parameter:
减振接杆
缩径接柄
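For reference, the request described above can be sketched as a URL builder. This is only an illustration: the service name "my-service" and index name "products" are hypothetical placeholders, and a real request would also need an api-key header, which is omitted here. No network call is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_search_url(service: str, index: str, search_text: str) -> str:
    """Build the GET URL for a search request with the options from the question."""
    params = {
        "api-version": "2019-05-06",
        "search": search_text,
        "searchMode": "all",
        "queryType": "full",
        "searchFields": "DescriptionZhCn",
    }
    # urlencode percent-encodes the Chinese characters as UTF-8.
    query_string = urlencode(params)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query_string}"

# "my-service" and "products" are hypothetical names, not taken from the question.
url = build_search_url("my-service", "products", "减振接杆")

# Decode the query string again to confirm the parameters survive the round trip.
decoded = parse_qs(urlsplit(url).query)
print(decoded["search"][0], decoded["searchMode"][0])
```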
EXPECTED
When searching for 减振接杆 I would expect only the record with description "Contoso 减振接杆". When searching for 缩径接柄 I would expect only the record "Contoso 缩径接柄".
ACTUAL
Searching for either 减振接杆 or 缩径接柄 unexpectedly returns both records. The only character the two strings have in common is the third one, 接.
I have verified the output of the zh-Hans.microsoft analyzer, and it splits each of the Chinese strings into 4 tokens. For example:
减振接杆 => 减 振 接 杆
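One way to run this kind of check is the service's Analyze Text REST endpoint; the question does not say which method was actually used, so the following is only a sketch of how such a request could be assembled. The service and index names are hypothetical, the api-key header is omitted, and nothing is sent over the network.

```python
import json

def build_analyze_request(service: str, index: str, text: str):
    """Return the URL and JSON body for an Analyze Text request."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/analyze?api-version=2019-05-06"
    )
    # ensure_ascii=False keeps the Chinese text readable in the payload.
    body = json.dumps(
        {"text": text, "analyzer": "zh-Hans.microsoft"},
        ensure_ascii=False,
    )
    return url, body

# "my-service" and "products" are hypothetical names, not taken from the question.
analyze_url, analyze_body = build_analyze_request("my-service", "products", "减振接杆")
print(analyze_url)
print(analyze_body)
```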
Against the record that should not match, my query matches only one of its four tokens, and I'm using searchMode=all. Why does my query match? Is this a bug? Any input, Yanoosh, Liam?
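The expectation behind this question can be written down as a toy model. It is not how the search service is implemented; it only encodes the assumption that searchMode=all requires every query token to appear in the document. The token list for the second record and the lowercase "contoso" token are assumptions, by analogy with the analyzer output shown for the first string.

```python
def matches_all(query_tokens, doc_tokens):
    """True only if every query token occurs among the document tokens."""
    return set(query_tokens) <= set(doc_tokens)

# Token lists are assumed; only the split of 减振接杆 is given in the question.
docs = {
    "Contoso 减振接杆": ["contoso", "减", "振", "接", "杆"],
    "Contoso 缩径接柄": ["contoso", "缩", "径", "接", "柄"],
}

query_tokens = ["减", "振", "接", "杆"]

# Under all-terms semantics only the first record should be a hit.
hits = [name for name, tokens in docs.items() if matches_all(query_tokens, tokens)]
print(hits)
```

Under this model the second record shares only 接 with the query and is rejected, which is why the observed result of two hits is surprising.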
Source: https://stackoverflow.com/questions/64485275/chinese-queries-result-in-unexpectly-high-recall