Retrieve analyzed tokens from ElasticSearch documents

谁都会走 posted on 2019-12-09 07:30:33

Question


I am trying to access the analyzed/tokenized text in my ElasticSearch documents.

I know you can use the Analyze API to analyze arbitrary text according to your analysis modules. So I could copy and paste data from my documents into the Analyze API to see how it was tokenized.

This seems unnecessarily time-consuming, though. Is there any way to instruct ElasticSearch to return the tokenized text in search results? I've looked through the docs and haven't found anything.
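
For reference, the manual Analyze API workflow described above can be sketched roughly as follows. This is only an illustration: it assumes the JSON request body form (`analyzer` plus `text`) accepted by more recent releases, whereas older releases take the analyzer as a query parameter and the raw text as the body. The helper names and the sample response are made up for the example, and no request is actually sent.

```python
import json


def build_analyze_request(text, analyzer="standard"):
    """Return the path and JSON body for an Analyze API call (nothing is sent)."""
    body = {"analyzer": analyzer, "text": text}
    return "/_analyze", json.dumps(body)


def tokens_from_analyze_response(response):
    """Extract the token strings, in order, from a parsed Analyze API response."""
    return [entry["token"] for entry in response.get("tokens", [])]


# Hand-written sample in the shape the Analyze API returns (not real server output).
sample_response = {
    "tokens": [
        {"token": "quick", "start_offset": 0, "end_offset": 5,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "brown", "start_offset": 6, "end_offset": 11,
         "type": "<ALPHANUM>", "position": 1},
    ]
}

path, body = build_analyze_request("Quick Brown")
print(path, body)
print(tokens_from_analyze_response(sample_response))
```

Doing this for every document means one extra request per field value, which is the overhead I would like to avoid.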


Answer 1:


Have a look at this other answer: elasticsearch - Return the tokens of a field. Unfortunately, it requires reanalyzing the content of your field on the fly using the script provided there.
It should be possible to write a plugin to expose this feature. The idea would be to add two endpoints to:

  • read the Lucene TermsEnum, as the Solr TermsComponent does, which is also useful for auto-suggestions. Note that this would not be per document: it would return every term in the index with its term frequency and document frequency (potentially expensive when there are many unique terms)
  • read the term vectors if enabled, as the Solr TermVectorComponent does. This would be per document, but it requires storing the term vectors (you can configure this in your mapping); it also lets you retrieve positions and offsets if those are enabled.
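
As a rough illustration of the mapping configuration mentioned in the second point, the sketch below builds a mapping fragment that turns on term vector storage for one field. The field name `body` and the helper are hypothetical, and the field type is an assumption: recent releases use `text` for analyzed fields, while older ones use `string`.

```python
import json

# Common values of the `term_vector` mapping option.
VALID_TERM_VECTOR_OPTIONS = {
    "no", "yes", "with_positions", "with_offsets", "with_positions_offsets",
}


def term_vector_mapping(field_name, option="with_positions_offsets",
                        field_type="text"):
    """Build a mapping fragment that stores term vectors for one field."""
    if option not in VALID_TERM_VECTOR_OPTIONS:
        raise ValueError("unsupported term_vector option: " + option)
    return {
        "properties": {
            field_name: {"type": field_type, "term_vector": option}
        }
    }


# "body" is a made-up field name used only for this example.
print(json.dumps(term_vector_mapping("body"), indent=2))
```

Storing positions and offsets makes the index larger, so it is probably worth enabling only on the fields whose tokens you need to inspect.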



Answer 2:


This question is a little old, but I think an additional answer is necessary.

With ElasticSearch 1.0.0, the Term Vector API was added, which gives you direct access to the tokens ElasticSearch stores under the hood on a per-document basis. The API docs are not very clear on this (it is only mentioned in the example), but to use the API you first have to indicate in your mapping definition that you want to store term vectors, using the term_vector property on each field.
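
A minimal sketch of how a Term Vector API response could be turned back into an ordered token list is shown below. It assumes the field was mapped with positions stored, and it assumes the path layout of recent releases (`/{index}/_termvectors/{id}`); older releases also include the type in the path and name the endpoint `_termvector`. The helper names, index name, and sample response are invented for the example.

```python
def termvectors_path(index, doc_id):
    """Build the request path (assumed recent layout; older versions differ)."""
    return "/{}/_termvectors/{}".format(index, doc_id)


def tokens_in_order(response, field):
    """Rebuild the token stream of one field from a parsed term vector response."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    positioned = []
    for term, info in terms.items():
        # Each occurrence carries its position when positions are stored.
        for occurrence in info.get("tokens", []):
            positioned.append((occurrence["position"], term))
    positioned.sort()
    return [term for _, term in positioned]


# Hand-written sample in the documented response shape (not real server output).
sample_response = {
    "term_vectors": {
        "body": {
            "terms": {
                "brown": {"term_freq": 1, "tokens": [{"position": 1}]},
                "quick": {"term_freq": 2,
                          "tokens": [{"position": 0}, {"position": 2}]},
            }
        }
    }
}

print(termvectors_path("your_index", "1"))
print(tokens_in_order(sample_response, "body"))
```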




Answer 3:


You may want to use scripting; however, your server must have scripting enabled.

curl 'http://localhost:9200/your_index/your_type/_search?pretty=true' -d '{
    "query" : {
        "match_all" : { }
    },
    "script_fields": {
        "terms" : {
            "script": "doc[field].values",
            "params": {
                "field": "field_x.field_y"
            }
        }
    }
}'

The default setting for whether scripting is allowed depends on the ElasticSearch version, so please check the official documentation.
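
Once the search returns, the script field values sit under each hit's `fields` key. The sketch below shows one way to collect them per document. The sample response is hand-written and its exact shape is an assumption (how list values are wrapped may vary between versions), and the helper name is made up.

```python
def script_field_values(response, field_name="terms"):
    """Map each hit's _id to the values returned for one script field."""
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        values = hit.get("fields", {}).get(field_name, [])
        result[hit["_id"]] = list(values)
    return result


# Hand-written sample response (assumed shape, not real server output).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "fields": {"terms": ["brown", "quick"]}},
            {"_id": "2", "fields": {"terms": ["fox"]}},
            {"_id": "3"},  # a hit with no script field values
        ]
    }
}

print(script_field_values(sample_response))
```

Values read through `doc[...]` are not guaranteed to come back in their original text order, so this approach suits checking which terms were indexed rather than reconstructing the token stream.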



Source: https://stackoverflow.com/questions/13404722/retrieve-analyzed-tokens-from-elasticsearch-documents
