How to match on prefix in Elasticsearch

太阳男子 2020-12-30 02:49

Let's say that in my Elasticsearch index I have a field called "dots" which will contain a string of punctuation-separated words (e.g. "first.second.third").

I

5 answers
  • 2020-12-30 03:04

    Elasticsearch has the Path Hierarchy Tokenizer, which was created exactly for this use case. Here is an example of how to set it up for your index:

    # Create a new index with custom path_hierarchy analyzer 
    # See http://www.elasticsearch.org/guide/reference/index-modules/analysis/pathhierarchy-tokenizer.html
    curl -XPUT "localhost:9200/prefix-test" -d '{
        "settings": {
            "analysis": {
                "analyzer": {
                    "prefix-test-analyzer": {
                        "type": "custom",
                        "tokenizer": "prefix-test-tokenizer"
                    }
                },
                "tokenizer": {
                    "prefix-test-tokenizer": {
                        "type": "path_hierarchy",
                        "delimiter": "."
                    }
                }
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    "dots": {
                        "type": "string",
                        "analyzer": "prefix-test-analyzer",
                        //"index_analyzer": "prefix-test-analyzer", //deprecated
                        "search_analyzer": "keyword"
                    }
                }
            }
        }
    }'
    echo
    # Put some test data
    curl -XPUT "localhost:9200/prefix-test/doc/1" -d '{"dots": "first.second.third"}'
    curl -XPUT "localhost:9200/prefix-test/doc/2" -d '{"dots": "first.second.foo-bar"}'
    curl -XPUT "localhost:9200/prefix-test/doc/3" -d '{"dots": "first.baz.something"}'
    curl -XPOST "localhost:9200/prefix-test/_refresh"
    echo
    # Test searches. 
    curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
        "query": {
            "term": {
                "dots": "first"
            }
        }
    }'
    echo
    curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
        "query": {
            "term": {
                "dots": "first.second"
            }
        }
    }'
    echo
    curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
        "query": {
            "term": {
                "dots": "first.second.foo-bar"
            }
        }
    }'
    echo
    curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true&q=dots:first.second"
    echo
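    To see why the term queries above behave as they do, here is a minimal local sketch in plain POSIX shell (not Elasticsearch itself). It emulates the tokens a path_hierarchy tokenizer with "delimiter": "." emits for one value, and then checks a whole, unanalyzed search string against them, which is what the keyword search analyzer (or a term query) gives you. The helper names path_tokens and matches_term are hypothetical, and edge cases such as empty segments are not modeled.

```shell
# Local sketch only: emulates the path_hierarchy tokenizer ("delimiter": ".")
# for a single value. path_tokens is a hypothetical helper, not an ES API.
path_tokens() {
    rest=$1
    prefix=
    while [ -n "$rest" ]; do
        case $rest in
            *.*) part=${rest%%.*}; rest=${rest#*.} ;;  # segment before the first "."
            *)   part=$rest; rest= ;;                  # last segment
        esac
        if [ -z "$prefix" ]; then
            prefix=$part
        else
            prefix=$prefix.$part
        fi
        printf '%s\n' "$prefix"                        # emit the growing prefix
    done
}

# The keyword search analyzer keeps the search string whole, so a document
# matches when that string equals one of its indexed tokens.
# matches_term is also a hypothetical helper.
matches_term() {
    path_tokens "$1" | grep -Fx -- "$2" >/dev/null
}

for doc in first.second.third first.second.foo-bar first.baz.something; do
    if matches_term "$doc" first.second; then
        printf 'match: %s\n' "$doc"
    fi
done
```

    For "first.second.third" the emulated tokens are "first", "first.second" and "first.second.third", so searching for "first.second" matches the first two test documents but not "first.baz.something".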
    
  • 2020-12-30 03:09

    Have a look at prefix queries.

    $ curl -XGET 'http://localhost:9200/index/type/_search' -d '{
        "query" : {
            "prefix" : { "dots" : "first.second" }
        }
    }'
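    A prefix query is a term-level query: the prefix itself is not analyzed, and a document matches if any term indexed for the field starts with it. Below is a minimal local sketch in POSIX shell (not Elasticsearch), assuming each "dots" value was indexed as a single term, e.g. a not_analyzed field; has_prefix and the sample values are hypothetical. It also shows a caveat of plain string prefixes: "first.second" matches "first.secondary.x" too, which the path_hierarchy approach would not.

```shell
# Local sketch only: emulates a prefix query against single-term values.
# has_prefix is a hypothetical helper, not an Elasticsearch API.
has_prefix() {
    case $1 in
        "$2"*) return 0 ;;   # quoted "$2" is matched literally, "*" matches the rest
        *)     return 1 ;;
    esac
}

# Hypothetical indexed values; the last one shows the plain-prefix caveat.
for value in first.second.third first.baz.something first.secondary.x; do
    if has_prefix "$value" first.second; then
        printf 'match: %s\n' "$value"
    fi
done
```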
    
  • 2020-12-30 03:09

    You can use wildcard characters in your query, something like this:

    $ curl -XGET 'http://localhost:9200/myapp/_search' -d '{
        "query": {
            "wildcard": { "dots": "first.second*" }
        }
    }'
    

    More examples of the syntax are at: http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html

  • 2020-12-30 03:11

    There is also a much easier way, as pointed out in the Elasticsearch documentation.

    Just use:

    {
        "text_phrase_prefix" : {
            "fieldname" : "yourprefix"
        }
    }
    

    or since 0.19.9:

    {
        "match_phrase_prefix" : {
            "fieldname" : "yourprefix"
        }
    }
    

    instead of:

    {
        "prefix" : {
            "fieldname" : "yourprefix"
        }
    }
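    Roughly, match_phrase_prefix analyzes the query text like match_phrase and treats the last term as a prefix, expanding it to at most max_expansions indexed terms. The POSIX shell/awk sketch below emulates that matching rule for a single field value. It assumes a simplified analyzer (lowercase, split on whitespace) instead of the field's real analyzer and ignores max_expansions; phrase_prefix_match is a hypothetical helper name.

```shell
# Local sketch only: emulates match_phrase_prefix for one field value.
# Assumed analyzer: lowercase + whitespace split (not the field's real one).
# phrase_prefix_match is hypothetical, not an Elasticsearch API.
phrase_prefix_match() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { n = split(tolower($0), doc) }   # field text -> terms
        NR == 2 { m = split(tolower($0), q) }     # query text -> terms
        END {
            if (m == 0) exit 1
            # Slide over the field terms looking for the query phrase.
            for (i = 1; i + m - 1 <= n; i++) {
                ok = 1
                for (j = 1; j < m; j++)           # all but the last term: exact match
                    if (doc[i + j - 1] != q[j]) { ok = 0; break }
                # the last query term only has to be a prefix of the field term
                if (ok && index(doc[i + m - 1], q[m]) == 1) exit 0
            }
            exit 1
        }'
}

phrase_prefix_match "The quick brown fox" "quick bro" && echo "match"
phrase_prefix_match "The quick brown fox" "quick fox" || echo "no match"
```

    Unlike a prefix query, the terms must appear in order and next to each other, and only the last one may be incomplete.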
    
  • 2020-12-30 03:17

    I was looking for a similar solution, but the other way around: matching documents whose stored value is a prefix of the search string. @imtov's answer got me almost there, with one change, switching the analyzers around:

    "mappings": {
        "doc": {
            "properties": {
                "dots": {
                    "type": "string",
                    "analyzer": "keyword",
                    "search_analyzer": "prefix-test-analyzer"
                }
            }
        }
    }
    

    instead of

    "mappings": {
        "doc": {
            "properties": {
                "dots": {
                    "type": "string",
                    "index_analyzer": "prefix-test-analyzer",
                    "search_analyzer": "keyword"
                }
            }
        }
    }
    

    This way adding:

    '{"dots": "first.second"}'
    '{"dots": "first.third"}'
    

    will index only these full values as single tokens, without also indexing shorter prefix tokens such as "first".

    Yet searching for either

    first.second.anyotherstring
    first.second
    

    will correctly return only the first entry:

    '{"dots": "first.second"}'
    

    Not exactly what you asked for, but related, so I thought it could help someone.
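    The reversed setup can be reasoned about like this: the keyword index analyzer stores each value as one whole term, while the path_hierarchy search analyzer turns the search string Q into its dot-bounded prefixes plus Q itself. Assuming a query that applies the search analyzer and matches on any of the resulting terms (for example a match query; a term query skips analysis entirely), a stored value V matches exactly when V equals Q or Q starts with V followed by ".". The POSIX shell sketch below encodes that rule locally; value_matches is a hypothetical helper, not Elasticsearch itself.

```shell
# Local sketch only: keyword-indexed value $1 vs. a search string $2 that is
# tokenized by path_hierarchy ("delimiter": ".") at search time.
# value_matches is a hypothetical helper, not an Elasticsearch API.
value_matches() {
    case $2 in
        "$1" | "$1".*) return 0 ;;   # V == Q, or V is a dot-bounded prefix of Q
        *)             return 1 ;;
    esac
}

for query in first.second.anyotherstring first.second; do
    for value in first.second first.third; do
        if value_matches "$value" "$query"; then
            printf '%s -> %s\n' "$query" "$value"
        fi
    done
done
```

    Under the same assumption, a stored value "first" would also match both queries, because "first" is one of the search-time tokens.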
