Elasticsearch: get offsets of highlighted snippets

天涯浪人 2020-12-25 12:51

Is it possible to get character positions of each highlighted fragment? I need to match the highlighted text back to the source document and having character positions would

2 answers
  • 2020-12-25 13:19

    The client-side approach is actually standard practice.

    We have discussed adding the offsets, but are afraid it would lead to more confusion. The offsets provided would be specific to Java's UTF-16 String encoding. While they could technically be used to calculate the fragments from $LANG, it is far more straightforward to parse the response text for the delimiters you specified.
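A minimal sketch of that client-side parsing, in Python. It assumes the highlighter was left on `<em>` / `</em>` delimiters (substitute whatever `pre_tags` and `post_tags` you configured), that the fragment is a verbatim slice of the source once the delimiters are removed, and that the first match in the source is the one you want. The function names are made up for illustration. The offsets it returns are Python code-point indices, not UTF-16 code units.

```python
import re


def highlight_spans(fragment, pre_tag="<em>", post_tag="</em>"):
    """Strip the delimiters from one highlighted fragment.

    Returns (plain_text, spans), where each span is a (start, end) pair of
    offsets into plain_text covering one highlighted term.
    """
    pattern = re.compile(
        re.escape(pre_tag) + "(.*?)" + re.escape(post_tag), re.DOTALL
    )
    parts = []
    spans = []
    pos = 0       # read position in the tagged fragment
    out_len = 0   # length of the plain text emitted so far
    for m in pattern.finditer(fragment):
        before = fragment[pos:m.start()]
        parts.append(before)
        out_len += len(before)
        term = m.group(1)
        spans.append((out_len, out_len + len(term)))
        parts.append(term)
        out_len += len(term)
        pos = m.end()
    parts.append(fragment[pos:])
    return "".join(parts), spans


def locate_in_source(source, fragment, pre_tag="<em>", post_tag="</em>"):
    """Map highlighted terms back to offsets in the source document.

    Assumption: the de-tagged fragment occurs verbatim in the source;
    only the first occurrence is considered.
    """
    plain, spans = highlight_spans(fragment, pre_tag, post_tag)
    base = source.find(plain)
    if base < 0:
        return []  # fragment was altered (e.g. HTML-encoded) or not found
    return [(base + start, base + end) for start, end in spans]
```

For example, `locate_in_source("some text we index for search", "some <em>text</em> we <em>index</em>")` yields the spans of `text` and `index` in the source. If the same fragment text can occur more than once in a document, this simple lookup is ambiguous, which is the problem the marker approach in the other answer works around.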

  • 2020-12-25 13:29

    We ended up extending the original text with position markers, like this:

    some[1] text[2] we[3] index[4]

    Then we define a custom analyzer with:

    "char_filter": {
      "remove_tags": {
        "type": "pattern_replace",
        "pattern": "\\[[0-9]+\\]",
        "replacement": ""
      }
    }

    Now in the highlighted snippets we get the location tags and we know where in the text they appear. Ugly, but works!

    I gave a fuller answer here
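The marker approach above could be sketched on the client side roughly as follows, in Python. The helper names are hypothetical and not part of the answer. The sketch assumes words are whitespace-delimited, that the source text contains no literal `[n]` sequences of its own, and that the highlighter uses `<em>` as its opening delimiter. Since the marker may land inside or just after the highlighted span, the lookup simply takes the first marker at or after each opening delimiter.

```python
import re

WORD = re.compile(r"\S+")
MARKER = re.compile(r"\[([0-9]+)\]")


def add_markers(text):
    """Append a [n] marker after every whitespace-delimited word.

    Returns (tagged_text, positions), where positions maps each marker
    number to the (start, end) offsets of its word in the original text.
    """
    out = []
    positions = {}
    last = 0
    for n, m in enumerate(WORD.finditer(text), start=1):
        out.append(text[last:m.end()])
        out.append(f"[{n}]")
        positions[n] = (m.start(), m.end())
        last = m.end()
    out.append(text[last:])
    return "".join(out), positions


def highlighted_offsets(snippet, positions, pre_tag="<em>"):
    """Return original-text offsets for each highlighted word in a snippet.

    For every opening delimiter, the first marker found at or after it is
    taken to identify the highlighted word (an assumption of this sketch).
    """
    hits = []
    start = snippet.find(pre_tag)
    while start != -1:
        m = MARKER.search(snippet, start)
        if m and int(m.group(1)) in positions:
            hits.append(positions[int(m.group(1))])
        start = snippet.find(pre_tag, start + len(pre_tag))
    return hits
```

Here `add_markers("some text we index")` produces the tagged string shown above, which is what gets indexed, and `positions` is kept (or recomputed) on the client. A multi-word phrase wrapped in a single pair of delimiters would only report its first word with this lookup.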
