How to index a field with alphanumeric characters AND a dash for wildcard search

╄→гoц情女王★ submitted on 2019-12-11 17:57:20

Question


Given a model that looks like this:

{
    [Key]
    public string Id { get; set; }

    [IsSearchable]
    [Analyzer(AnalyzerName.AsString.Keyword)]
    public string AccountId { get; set; }
}

And sample data for the AccountId that would look like this:

1-ABC123
1-333444555
1-A4KK498

The field can have any combination of letters/digits and a dash in the middle.

I need to be able to search on this field using queries like 1-ABC*. However, none of the basic analyzers seem to support the dash except Keyword, and with Keyword the wildcard queries return nothing; only exact, full-value matches work. I've seen some other articles about custom analyzers, but I can't find enough information about how to build one to solve this issue.

I need to know whether I have to build a custom analyzer for this field, and whether I need separate search and index analyzers.

I'm using StandardLucene for other alphanumeric fields without dashes. I also have another field with dashes that is all digits, and Keyword works just fine there. It seems the issue is with a mix of letters AND digits.


Answer 1:


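A custom analyzer is indeed the way to go here. Wildcard (prefix) query terms are not analyzed; the only transformation applied to them is lowercasing. The Keyword analyzer indexes 1-ABC123 as a single token with its case preserved, so the lowercased query 1-abc* never matches it, while all-digit values are unaffected. To fix this, define a custom analyzer that uses a “keyword” tokenizer with a “lowercase” token filter, so the indexed token is lowercased as well. Because the analyzer is set through the single analyzer property, it applies at both indexing and query time, so no separate search analyzer is needed.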
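To make the case mismatch concrete, here is a minimal, self-contained sketch that imitates the behavior in plain C#. It does not call the Azure Search SDK. The class and method names (WildcardDemo, KeywordAnalyze, KeywordLowercaseAnalyze, PrefixMatches) are hypothetical, and the sketch assumes that a prefix query term is lowercased but otherwise left unanalyzed.

```csharp
using System;

public static class WildcardDemo
{
    // Imitates the built-in Keyword analyzer: the whole value is one token, case preserved.
    public static string KeywordAnalyze(string value) => value;

    // Imitates a "keyword" tokenizer followed by a "lowercase" token filter.
    public static string KeywordLowercaseAnalyze(string value) => value.ToLowerInvariant();

    // Assumption: a prefix query such as "1-ABC*" skips analysis and is only lowercased,
    // then compared case-sensitively against the indexed token.
    public static bool PrefixMatches(string indexedToken, string query)
    {
        string prefix = query.TrimEnd('*').ToLowerInvariant();
        return indexedToken.StartsWith(prefix, StringComparison.Ordinal);
    }

    public static void Main()
    {
        string[] accountIds = { "1-ABC123", "1-333444555", "1-A4KK498" };
        string[] queries = { "1-ABC*", "1-333*" };

        foreach (string query in queries)
        {
            foreach (string id in accountIds)
            {
                bool withKeyword = PrefixMatches(KeywordAnalyze(id), query);
                bool withCustom = PrefixMatches(KeywordLowercaseAnalyze(id), query);
                Console.WriteLine($"{query} vs {id}: keyword={withKeyword}, keyword_lowercase={withCustom}");
            }
        }
    }
}
```

Under these assumptions, 1-ABC* matches 1-ABC123 only when the indexed token has been lowercased, while 1-333* matches 1-333444555 either way, which is consistent with the all-digit field working under the plain Keyword analyzer.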

Add the custom analyzer to your Index class, and change the analyzer name in your model to match the custom analyzer name:

new Index()
{
    ...
    Analyzers = new[]
    {
        new CustomAnalyzer()
        {
            Name = "keyword_lowercase",
            Tokenizer = TokenizerName.Keyword,
            TokenFilters = new[] { TokenFilterName.Lowercase }
        }
    }
}

Model:

{
    [Key]
    public string Id { get; set; }

    [IsSearchable]
    [Analyzer("keyword_lowercase")]
    public string AccountId { get; set; }
}

In the REST API this would look something like:

{
    "fields": [
        {
            "name": "Id",
            "type": "Edm.String",
            "key": true
        },
        {
            "name": "AccountId",
            "type": "Edm.String",
            "searchable": true,
            "retrievable": true,
            "analyzer": "keyword_lowercase"
        }
    ],
    "analyzers": [
        {
            "name": "keyword_lowercase",
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "tokenizer": "keyword_v2",
            "tokenFilters": ["lowercase"]
        }
    ]
}


Source: https://stackoverflow.com/questions/55346822/how-to-index-a-field-with-alphanumeric-characters-and-a-dash-for-wildcard-search
