Azure Search and Dashes

蓝咒 submitted on 2019-12-13 15:09:18

Question


I am using Azure Search and trying to perform a search against documents:

It seems as though doing this: /indexes/blah/docs?api-version=2015-02-28&search=abc\-1003

returns the same results as this: /indexes/blah/docs?api-version=2015-02-28&search=abc-1003

Shouldn't the first one return different results than the second because of the escaping backslash? From what I understand, the backslash should allow an exact search on the whole string "abc-1003" instead of treating the dash as a "not" operator.

(more info here: https://msdn.microsoft.com/en-us/library/azure/dn798920.aspx)

The only way I can get it to work is by doing this (note the double quotes): /indexes/blah/docs?api-version=2015-02-28&search="abc-1003"

I would rather not do that, because it would mean making the user enter the quotes, which they will not know to do.

Am I expecting something I shouldn't or is it possibly a bug with Azure Search?


Answer 1:


First, a dash that is not preceded by whitespace acts like a literal dash, not a negation operator.

As per the MSDN docs for simple query syntax:

"-" only needs to be escaped if it's the first character after whitespace, not if it's in the middle of a term. For example, "wi-fi" is a single term.

Second, unless you are using a custom analyzer for your index, the analyzer treats the dash almost like whitespace and breaks abc-1003 into two tokens, abc and 1003.

Then, when you put it in quotes ("abc-1003"), it is treated as a search for the phrase abc 1003, thus returning what you expect.
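The splitting described above can be approximated locally. This is only a rough sketch: `standard_like_tokens` is a hypothetical helper that lowercases and splits on non-alphanumeric characters, which is a simplification and not the actual algorithm of the standard analyzer.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for the default analyzer (an assumption, not the real
    implementation): lowercase, then split on anything non-alphanumeric."""
    return re.findall(r"[a-z0-9]+", text.lower())

# The dashed term and the space-separated phrase produce the same tokens,
# which is why the quoted query behaves like a phrase search for "abc 1003".
print(standard_like_tokens("abc-1003"))  # ['abc', '1003']
print(standard_like_tokens("abc 1003"))  # ['abc', '1003']
```

Under this approximation, the backslash in `abc\-1003` makes no difference either, because the dash was never being read as an operator in the first place.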

If you want an exact match on abc-1003, consider using a filter instead. It is faster and can match things like GUIDs or text with dashes.
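A filter request could be built along these lines. This is a minimal sketch: the service name `my-service`, the field name `code`, and the helper `build_filter_url` are all hypothetical, and the field is assumed to be marked filterable in the index.

```python
from urllib.parse import urlencode, quote

def build_filter_url(service, index, field, value, api_version="2015-02-28"):
    """Build a docs query URL that matches `field` exactly via an OData $filter."""
    # OData string literals escape a single quote by doubling it.
    literal = value.replace("'", "''")
    params = {
        "api-version": api_version,
        "search": "*",  # match everything, let the filter do the narrowing
        "$filter": f"{field} eq '{literal}'",
    }
    query = urlencode(params, quote_via=quote)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"

# Hypothetical service, index, and field names.
print(build_filter_url("my-service", "blah", "code", "abc-1003"))
```

Because the filter compares the stored value rather than analyzed tokens, the user can type abc-1003 without quotes.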




Answer 2:


Adding to Sean's answer, a custom analysis configuration with the keyword tokenizer and a lowercase token filter will address the issue. It appears that you are using the default standard analyzer, which breaks words at special characters during lexical analysis at indexing time. At query time, this lexical analysis applies to regular queries but not to wildcard search queries. As a result, with your example, you have <abc> and <1003> in the search index. A wildcard search query is not tokenized the same way: it looks for terms that start with abc-1003 and finds nothing, because neither term in the index starts with abc-1003. Hope this makes sense. Please let me know if you have any additional questions.
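An index definition using such a configuration might look roughly like the sketch below. The analyzer name `keyword_lowercase` and the `id` key field are made up for illustration, and the property names (`@odata.type`, `tokenizer`, `tokenFilters`) and the `keyword_v2` tokenizer name are written from memory of the custom analyzer schema, so check them against the documentation for the api-version you use. `simulate_keyword_lowercase` is a hypothetical local stand-in that only illustrates the intended effect.

```python
import json

# Assumed shape of an index definition with a custom analyzer.
index_definition = {
    "name": "example-index-name",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "name", "type": "Edm.String", "searchable": True,
         "analyzer": "keyword_lowercase"},
    ],
    "analyzers": [
        {
            "name": "keyword_lowercase",  # hypothetical analyzer name
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "tokenizer": "keyword_v2",    # emits the whole input as one token
            "tokenFilters": ["lowercase"],
        }
    ],
}

def simulate_keyword_lowercase(text):
    """Local illustration only: one token, lowercased, dashes preserved."""
    return [text.lower()]

print(json.dumps(index_definition, indent=2))
print(simulate_keyword_lowercase("ABC-1003"))  # ['abc-1003']
```

With a single token abc-1003 in the index, a prefix query such as abc-10* has a term it can actually match.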

Nate




Answer 3:


The documentation says that a hyphen "-" is treated as a special character and must be escaped.
In reality, a hyphen in the middle of a term splits it into tokens, and the words on both sides are searched, as Sean Saleh pointed out.

After a small investigation, I found that you do not need a custom analyzer; the built-in whitespace analyzer will do.
Here is how you can use it:

{
    "name": "example-index-name",
    "fields": [
        {
            "name": "name",
            "type": "Edm.String",
            "analyzer": "whitespace",
            ...
        }
    ],
...
}

You use this endpoint to update your index:

https://{service-name}.search.windows.net/indexes/{index-name}?api-version=2017-11-11&allowIndexDowntime=true

Do not forget to include the api-key in the request header.

You can also test this and other analyzers through the analyzer test endpoint:

{
  "text": "Text to analyze",
  "analyzer": "whitespace"
}
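For reference, the request for that test could be assembled as in the sketch below. It assumes the analyze endpoint lives at `/indexes/{index-name}/analyze` and accepts the same api-version as the index update above. The helper `build_analyze_request` and the placeholder service name and key are hypothetical, and the request is only built here, not sent.

```python
import json
from urllib.request import Request

def build_analyze_request(service, index, api_key, text, analyzer,
                          api_version="2017-11-11"):
    """Prepare (but do not send) a POST to the analyzer test endpoint."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/analyze?api-version={api_version}")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return Request(url, data=body, headers=headers, method="POST")

# Placeholder service name and key; substitute your own.
req = build_analyze_request("my-service", "example-index-name",
                            "YOUR-API-KEY", "abc-1003", "whitespace")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. Comparing the tokens returned for "whitespace" against those for the default analyzer is a quick way to confirm whether abc-1003 stays intact.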


Source: https://stackoverflow.com/questions/37601956/azure-search-and-dashes
