Indexing a comma-separated value field in Elasticsearch

Asked by 暗喜 on 2020-12-30 10:40

I'm using Nutch to crawl a site and index it into Elasticsearch. My site has meta tags, some of which contain a comma-separated list of IDs (that I intend to use for searching).

1 Answer
  • 2020-12-30 11:28

    Create a custom analyzer that splits the indexed text into tokens on commas.

    Then you can search the field. If you don't care about relevance scoring, you can use a filter to search through your documents. The example below shows how to search with a term filter.

    Below is how to do this with the Sense plugin.

    DELETE testindex
    
    PUT testindex
    {
        "index" : {
            "analysis" : {
                "tokenizer" : {
                    "comma" : {
                        "type" : "pattern",
                        "pattern" : ","
                    }
                },
                "analyzer" : {
                    "comma" : {
                        "type" : "custom",
                        "tokenizer" : "comma"
                    }
                }
            }
        }
    }
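
    As a sanity check, you can run the _analyze API against the new index to confirm that the analyzer splits on commas (request shape as in the Elasticsearch 1.x era used throughout this answer; the analyzer name matches the one defined above):

    GET /testindex/_analyze?analyzer=comma&text=1,2,3

    This should return three separate tokens: 1, 2 and 3. If you instead get a single token "1,2,3", the analyzer was not registered correctly.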
    
    PUT /testindex/_mapping/yourtype
    {
            "properties" : {
                "contentType" : {
                    "type" : "string",
                    "analyzer" : "comma"
                }
            }
    }
    
    PUT /testindex/yourtype/1
    {
        "contentType" : "1,2,3"
    }
    
    PUT /testindex/yourtype/2
    {
        "contentType" : "3,4"
    }
    
    PUT /testindex/yourtype/3
    {
        "contentType" : "1,6"
    }
    
    GET /testindex/_search
    {
        "query": {"match_all": {}}
    }
    
    GET /testindex/_search
    {
        "filter": {
            "term": {
               "contentType": "6"
            }
        }
    }
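
    If you do want relevance scoring rather than a filter, a term query against the same field should also work, assuming the index and mapping defined above:

    GET /testindex/_search
    {
        "query": {
            "term": {
                "contentType": "6"
            }
        }
    }

    With the comma analyzer in place, this should match only document 3, whose contentType contains the token 6.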
    

    Hope it helps.
