Question
I am using Azure Search and trying to perform a search against documents:
It seems as though doing this: /indexes/blah/docs?api-version=2015-02-28&search=abc\-1003
returns the same results as this: /indexes/blah/docs?api-version=2015-02-28&search=abc-1003
Shouldn't the first one return different results than the second because of the escaping backslash? From what I understand, the backslash should allow an exact search on the whole string "abc-1003" instead of treating the dash as a "not" operator.
(more info here: https://msdn.microsoft.com/en-us/library/azure/dn798920.aspx)
The only way I can get it to work is by doing this (note the double quotes): /indexes/blah/docs?api-version=2015-02-28&search="abc-1003"
I would rather not do that because that would mean making the user enter in the quotes, which they will not know how to do.
Am I expecting something I shouldn't or is it possibly a bug with Azure Search?
Answer 1:
First, a dash that is not preceded by whitespace is treated as a literal dash, not as the negation operator.
As per the MSDN docs for simple query syntax:
"-" only needs to be escaped if it is the first character after whitespace, not if it is in the middle of a term. For example, "wi-fi" is a single term.
Second, unless you are using a custom analyzer for your index, the dash will be treated by the analyzer almost like whitespace and will break abc-1003 into two tokens, abc and 1003.
Then, when you put it in quotes ("abc-1003"), it will be treated as a search for the phrase abc 1003, thus returning what you expect.
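Since the phrase form works, one way to avoid asking users to type the quotes is to add them in application code before the request is sent. The following is only a sketch: `build_phrase_search_url` is a hypothetical helper, not part of any Azure SDK, and dropping embedded double quotes from the input is a simplifying assumption.

```python
from urllib.parse import quote


def build_phrase_search_url(base_url: str, user_input: str,
                            api_version: str = "2015-02-28") -> str:
    """Wrap raw user input in double quotes so it is sent as a phrase query."""
    # Assumption: embedded double quotes are simply removed so the
    # resulting phrase stays well-formed.
    cleaned = user_input.replace('"', "")
    phrase = f'"{cleaned}"'
    # Percent-encode the phrase so the quotes survive in the query string.
    return f"{base_url}?api-version={api_version}&search={quote(phrase, safe='')}"


url = build_phrase_search_url("/indexes/blah/docs", "abc-1003")
print(url)
# -> /indexes/blah/docs?api-version=2015-02-28&search=%22abc-1003%22
```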
If you want an exact match on abc-1003, consider using a filter instead. It is faster and can match things like GUIDs or text with dashes.
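A filter request could be built along these lines. This is a sketch under assumptions: the field name `code` is made up for illustration, the field would have to be marked filterable in the index, and `build_exact_match_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def build_exact_match_url(base_url: str, field: str, value: str,
                          api_version: str = "2015-02-28") -> str:
    """Build a query URL that uses an OData $filter for an exact match."""
    # OData string literals are single-quoted; a literal single quote
    # inside the value is written as two single quotes.
    literal = value.replace("'", "''")
    params = {
        "api-version": api_version,
        "$filter": f"{field} eq '{literal}'",
    }
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{base_url}?{urlencode(params, quote_via=quote)}"


# "code" is a hypothetical filterable field.
print(build_exact_match_url("/indexes/blah/docs", "code", "abc-1003"))
```

Because a filter compares against the stored field value rather than analyzed tokens, the dash takes no part in tokenization here.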
Answer 2:
Adding to Sean's answer, a custom analysis configuration with a keyword tokenizer and a lowercase token filter will address the issue. It appears that you are using the default standard analyzer, which breaks words at special characters during lexical analysis at indexing time. At query time, this lexical analysis applies to regular queries but not to wildcard search queries. As a result, with your example, you have <abc> and <1003> in the search index, and a wildcard search query, which is not tokenized the same way and looks for terms that start with abc-1003, finds nothing because neither term in the index starts with abc-1003. Hope this makes sense. Please let me know if you have any additional questions.
Nate
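The mismatch described in this answer can be illustrated with a toy model. This sketch does not call Azure Search: `standard_like_tokens` is only a rough stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase), and `prefix_matches` assumes a wildcard prefix is compared against indexed terms without being analyzed.

```python
import re


def standard_like_tokens(text: str) -> list[str]:
    """Rough approximation: split on non-alphanumerics and lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def keyword_lowercase_tokens(text: str) -> list[str]:
    """Keyword tokenizer plus lowercase filter: the whole input is one term."""
    return [text.lower()]


def prefix_matches(prefix: str, indexed_terms: list[str]) -> list[str]:
    """The prefix is not tokenized; it is matched against each indexed term."""
    return [t for t in indexed_terms if t.startswith(prefix)]


standard_terms = standard_like_tokens("ABC-1003")
keyword_terms = keyword_lowercase_tokens("ABC-1003")

print(standard_terms)                              # ['abc', '1003']
print(prefix_matches("abc-1003", standard_terms))  # []
print(prefix_matches("abc-1003", keyword_terms))   # ['abc-1003']
```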
Answer 3:
The documentation says that a hyphen "-" is treated as a special character and must be escaped.
In reality a hyphen is treated as a split of the token and words on both sides are searched, as Sean Saleh pointed out.
After a small investigation, I found that you do not need a custom analyzer; the built-in whitespace analyzer will do.
Here is how you can use it:
{
  "name": "example-index-name",
  "fields": [
    {
      "name": "name",
      "type": "Edm.String",
      "analyzer": "whitespace",
      ...
    }
  ],
  ...
}
You use this endpoint to update your index:
https://{service-name}.search.windows.net/indexes/{index-name}?api-version=2017-11-11&allowIndexDowntime=true
Do not forget to include the api-key in the request header.
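As a sketch of what such a request might look like from code, the helper below builds (but does not send) a PUT request with the standard library. `build_index_update_request` is a hypothetical name, the service name and key are placeholders, and the index definition is abbreviated in the same way as the example above; a real definition also needs a key field and the rest of the schema.

```python
import json
import urllib.request


def build_index_update_request(service_name: str, index_name: str,
                               api_key: str, index_definition: dict
                               ) -> urllib.request.Request:
    """Build a PUT request for the index update endpoint shown above."""
    url = (
        f"https://{service_name}.search.windows.net/indexes/{index_name}"
        "?api-version=2017-11-11&allowIndexDowntime=true"
    )
    body = json.dumps(index_definition).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


# Abbreviated, illustrative definition; placeholders throughout.
definition = {
    "name": "example-index-name",
    "fields": [
        {"name": "name", "type": "Edm.String", "analyzer": "whitespace"},
    ],
}
request = build_index_update_request(
    "my-service", "example-index-name", "<admin-api-key>", definition
)
# To actually send it: urllib.request.urlopen(request)
```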
You can also test this and other analyzers through the analyzer test endpoint:
{
  "text": "Text to analyze",
  "analyzer": "whitespace"
}
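A request carrying that body could be assembled as follows. This is a sketch under an assumption: the answer does not give the endpoint URL, and the `/indexes/{index-name}/analyze` path used here is my understanding of the REST API, so verify it against the documentation. `build_analyze_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request


def build_analyze_request(service_name: str, index_name: str, api_key: str,
                          text: str, analyzer: str = "whitespace",
                          api_version: str = "2017-11-11"
                          ) -> urllib.request.Request:
    """Build a POST request that asks the service to tokenize `text`."""
    # Assumed path for the analyzer test endpoint; check the REST docs.
    url = (
        f"https://{service_name}.search.windows.net/indexes/{index_name}"
        f"/analyze?api-version={api_version}"
    )
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


request = build_analyze_request(
    "my-service", "example-index-name", "<api-key>", "abc-1003"
)
# To actually send it: urllib.request.urlopen(request)
```

Comparing the tokens returned for "abc-1003" under the whitespace analyzer and under the default analyzer should show whether the dash splits the term.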
Source: https://stackoverflow.com/questions/37601956/azure-search-and-dashes