ElasticSearch get offsets of highlighted snippets


Is it possible to get character positions of each highlighted fragment? I need to match the highlighted text back to the source document and having character positions would

2 Answers

轮回少年

2020-12-25 13:19

The client-side approach is actually standard practice.

We have discussed adding the offsets, but are afraid it would lead to more confusion. The offsets would be specific to Java's UTF-16 String encoding. While they could technically be used to calculate the fragments from $LANG, it's far more straightforward to parse the response text for the delimiters you specified.
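
For what it's worth, here is a minimal Python sketch of that delimiter-parsing idea. It assumes the default `<em>`/`</em>` highlight tags, that the fragment text (with the tags removed) appears verbatim in the source document, and that the first match in the source is the right one. `fragment_offsets` is just a name made up for illustration, not anything Elasticsearch provides. The offsets it returns are Python code-point indices, not UTF-16 units.

```python
def fragment_offsets(source, fragment, pre="<em>", post="</em>", start_at=0):
    """Return (start, end) offsets in `source` for each highlighted span in `fragment`.

    Assumption: the fragment, once the delimiters are stripped, is a verbatim
    substring of `source`. Returns [] if it cannot be located.
    """
    plain = fragment.replace(pre, "").replace(post, "")
    base = source.find(plain, start_at)  # where the fragment begins in the source
    if base == -1:
        return []

    offsets = []
    pos = 0       # scan position inside the tagged fragment
    removed = 0   # delimiter characters seen so far (absent from the source)
    while True:
        open_at = fragment.find(pre, pos)
        if open_at == -1:
            break
        close_at = fragment.find(post, open_at + len(pre))
        if close_at == -1:
            break  # unbalanced delimiters: stop rather than guess
        start = base + open_at - removed
        end = base + close_at - removed - len(pre)
        offsets.append((start, end))
        removed += len(pre) + len(post)
        pos = close_at + len(post)
    return offsets


source = "We index some text here"
for start, end in fragment_offsets(source, "some <em>text</em> here"):
    print(start, end, source[start:end])
```

If the same fragment text can occur more than once in the document, pass `start_at` to continue searching after the previous match. If you HTML-encode the highlights, decode them before matching.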

我寻月下人不归

2020-12-25 13:29
We have ended up extending the original text like this:

some[1] text[2] we[3] index[4]

Then we define a custom analyzer with:
```
"char_filter": {
  "remove_tags": {
    "type": "pattern_replace",
    "pattern": "\\[[0-9]+\\]",
    "replacement": ""
  }
}
```
Now in the highlighted snippets we get the location tags and we know where in the text they appear. Ugly, but works!
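
A rough Python sketch of the client side of this, in case it helps. It assumes words are split on whitespace, the default `<em>`/`</em> ` highlight tags, and that a location tag may end up either just inside or just after the closing delimiter (the code accepts both). `add_position_tags` and `highlighted_positions` are hypothetical helper names, not part of any Elasticsearch client.

```python
import re

# Same pattern the char_filter strips at analysis time.
TAG = re.compile(r"\[([0-9]+)\]")


def add_position_tags(text):
    """Append [n] to each whitespace-separated word, counting from 1."""
    return " ".join(f"{word}[{i}]" for i, word in enumerate(text.split(), start=1))


def highlighted_positions(snippet, pre="<em>", post="</em>"):
    """Return the word numbers attached to the highlighted spans of a snippet."""
    span = re.compile(
        re.escape(pre) + r"(.*?)" + re.escape(post) + r"((?:\[[0-9]+\])?)",
        re.DOTALL,
    )
    positions = []
    for match in span.finditer(snippet):
        # Check inside the highlighted span and the tag directly after it.
        for number in TAG.findall(match.group(1) + match.group(2)):
            positions.append(int(number))
    return positions


print(add_position_tags("some text we index"))
print(highlighted_positions("some[1] <em>text</em>[2] we[3] index[4]"))
```

The word numbers can then be mapped back to character positions in the untagged original, for example by recording each word's offset while building the tagged text.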

I gave a fuller answer here