Efficient way to retrieve all _ids in ElasticSearch

轮回少年 2021-01-31 01:31

What is the fastest way to get all _ids of a certain index from ElasticSearch? Is it possible with a simple query? One of my indexes has around 20,000 documents.

11 answers
  • 2021-01-31 02:02

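    For Python users: the Python Elasticsearch client provides a convenient abstraction over the scroll API in helpers.scan:

    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch()

    query = {
        "query": {
            "match_all": {}
        },
        "_source": False,  # only the _id is needed, so skip the document body
    }

    # scan() pages through the scroll API and yields one hit at a time
    scan = helpers.scan(client, index="your-index-name", query=query, scroll='1m', size=100)

    ids = [doc["_id"] for doc in scan]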
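For readers curious about what helpers.scan does underneath, the loop below is a minimal sketch of the scroll pattern, not the library's actual implementation. It assumes a client object exposing search, scroll, and clear_scroll with the keyword arguments shown (as in elasticsearch-py versions that accept body=). The names iter_ids and page_size are made up for this example.

```python
def iter_ids(client, index, page_size=1000, keep_alive="1m"):
    """Yield every _id in `index` by paging through the scroll API.

    Hypothetical helper: `client` is assumed to behave like an
    elasticsearch-py client that accepts `body=` on search().
    """
    body = {"query": {"match_all": {}}, "_source": False}
    # The first request opens the scroll context and returns the first page.
    page = client.search(index=index, body=body, scroll=keep_alive, size=page_size)
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit["_id"]
            # Ask for the next page; the scroll id may change between calls.
            page = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context even if the caller stops early.
        if scroll_id is not None:
            client.clear_scroll(scroll_id=scroll_id)
```

With a real client, list(iter_ids(client, "your-index-name")) would collect the IDs. For an index of around 20,000 documents that is about 20 requests at the assumed page size of 1000.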
    
  • 2021-01-31 02:04
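    URL -> http://localhost:9200/<index>/<type>/_search
    HTTP method -> GET
    Query -> {"query": {"match_all": {}}, "size": 30000, "fields": ["_id"]}

    Set "size" to at least the number of documents in the index so that a single request returns every _id.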
    
  • 2021-01-31 02:09

    For Elasticsearch 5.x, set "_source" to false in the request so that each hit carries only its metadata, including the _id:

    GET /_search
    {
        "_source": false,
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }


    "fields" is no longer supported in 5.x. (Error: "The field [fields] is no longer supported, please use [stored_fields] to retrieve stored fields or _source filtering if the field is not stored")

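To pull the IDs out of such a response in Python, a small sketch follows. It assumes the usual hits.hits response layout. The names id_only_body and extract_ids are illustrative, and the sample response is hand-written rather than real cluster output. The default size of 10000 matches the default index.max_result_window, so a larger index would need the scroll API instead of a single request.

```python
import json


def id_only_body(query=None, size=10000):
    """Build a search body that asks for hits without their _source."""
    return {
        "_source": False,
        "size": size,
        "query": query if query is not None else {"match_all": {}},
    }


def extract_ids(response):
    """Return the _id of every hit, tolerating a response with no hits."""
    return [hit["_id"] for hit in response.get("hits", {}).get("hits", [])]


# Hand-written stand-in for a search response (not real cluster output).
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "twitter", "_type": "tweet", "_id": "1", "_score": 1.0},
            {"_index": "twitter", "_type": "tweet", "_id": "2", "_score": 1.0},
        ],
    }
}

print(json.dumps(id_only_body({"term": {"user": "kimchy"}})))
print(extract_ids(sample_response))
```

The body returned by id_only_body can be sent as the JSON payload of the GET /_search request above, and extract_ids applied to the parsed reply.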
  • 2021-01-31 02:10

    This works for me:

    def select_ids(self, **kwargs):
        """
        Return the _id of every document matching the query.

        :param kwargs: params from modules; must include 'index'
        :return: list of matching _ids, or None if no index is given
        """
        index = kwargs.get('index')
        if not index:
            return None

        query = self._build_query(**kwargs)

        # filter_path trims the response down to the _id of each hit
        results = self._db_client.search(body=query, index=index, stored_fields=[], filter_path="hits.hits._id")
        # with filter_path, a search with no hits may come back without a 'hits' key
        hits = results.get('hits', {}).get('hits', [])
        return [hit['_id'] for hit in hits]
    
  • 2021-01-31 02:11

    Inspired by @Aleck-Landgraf's answer, what worked for me was calling the scan function from the standard Elasticsearch Python API directly:

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan

    es = Elasticsearch()
    # "fields" is rejected by 5.x and later; use "_source": False there instead
    for dobj in scan(es,
                     query={"query": {"match_all": {}}, "fields": []},
                     index="your-index-name", doc_type="your-doc-type"):
        print(dobj["_id"], end=" ")
    