Efficient way to retrieve all _ids in ElasticSearch

轮回少年 2021-01-31 01:31

What is the fastest way to get all _ids of a certain index from Elasticsearch? Is it possible with a simple query? One of my indices has around 20,000 documents.

  •  长情又很酷
    2021-01-31 01:50

    It's better to use scroll and scan to get the result list, so Elasticsearch doesn't have to rank and sort the results.
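
The scroll-and-scan pattern can also be written by hand against the low-level client. This is a minimal sketch, not the answer's code: it assumes an elasticsearch-py client whose `search`, `scroll` and `clear_scroll` methods accept `body=`, `scroll=` and `scroll_id=` keyword arguments (the 7.x style; adjust for your client version), and `iter_ids` is a hypothetical name. It sorts on `_doc`, which newer Elasticsearch versions use in place of `search_type=scan` to skip ranking and sorting, turns off `_source` so hits carry only metadata, renews the keep-alive on every `scroll` call, and clears the scroll context when the generator ends.

```python
def iter_ids(client, index, keep_alive="5m", batch_size=1000):
    """Yield the _id of every document in `index`, one scroll batch at a time.

    `client` is assumed to expose elasticsearch-py style search(), scroll()
    and clear_scroll() methods (hypothetical helper, 7.x-style arguments).
    """
    resp = client.search(
        index=index,
        scroll=keep_alive,          # how long the server keeps the cursor open
        body={
            "query": {"match_all": {}},
            "sort": ["_doc"],       # index order: no ranking or sorting work
            "_source": False,       # return hit metadata only, not the documents
            "size": batch_size,     # hits per batch
        },
    )
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit["_id"]
            # Fetch the next batch; passing `scroll` again renews the keep-alive.
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the server-side scroll context, even if the caller stops early.
        client.clear_scroll(scroll_id=scroll_id)
```

With a real cluster this would be called as `ids = list(iter_ids(es, "my_index"))`, where `es` is an `Elasticsearch` client; the library's `elasticsearch.helpers.scan` function performs essentially the same loop for you.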

    With the elasticsearch-dsl Python library, this can be done as follows:

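    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search

    ES_INDEX = "my_index"  # index and type names as they appear in the log below
    DOC_TYPE = "my_doc"

    es = Elasticsearch()
    s = Search(using=es, index=ES_INDEX, doc_type=DOC_TYPE)

    # Older elasticsearch-dsl API: an empty field list returns only hit metadata (ids);
    # pass a list of field names instead to fetch those fields as well.
    s = s.fields([])
    ids = [h.meta.id for h in s.scan()]  # scan() pages through the scroll API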
    

    Console log:

    GET http://localhost:9200/my_index/my_doc/_search?search_type=scan&scroll=5m [status:200 request:0.003s]
    GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s]
    GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s]
    GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.003s]
    GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s]
    ...
    

    Note: scroll pulls the results of a query in batches and keeps the cursor open for a given keep-alive time (e.g. 1 minute or 2 minutes), which you can change on each request; scan disables sorting. The scan() helper returns a Python generator, so you can iterate over the hits lazily, one batch at a time, without loading them all into memory.
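
The snippet above targets an old elasticsearch-dsl API: `search_type=scan` (visible in the log) was removed in Elasticsearch 5.0, and newer elasticsearch-dsl releases use `Search.source()` rather than `fields()`. A minimal sketch of the same approach for a recent version, assuming `source(False)` (return no `_source`, only hit metadata) and `scan()`; the function name `all_ids` is hypothetical:

```python
def all_ids(search):
    """Return every document _id matched by an elasticsearch_dsl Search.

    source(False) asks Elasticsearch not to return _source, so each hit
    carries only its metadata; scan() walks the scroll API batch by batch.
    """
    return [hit.meta.id for hit in search.source(False).scan()]


# Usage against a real cluster (needs the elasticsearch-dsl package and a
# running node; the URL and index name are placeholders):
#
#   from elasticsearch import Elasticsearch
#   from elasticsearch_dsl import Search
#
#   es = Elasticsearch("http://localhost:9200")
#   ids = all_ids(Search(using=es, index="my_index"))
```

Taking the `Search` object as a parameter keeps the helper independent of how the client is configured, so the same function works for any index or pre-filtered query.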
