What is the fastest way to get all _ids of a certain index from ElasticSearch? Is it possible by using a simple query? One of my index has around 20,000 documents.
For Python users: the Python Elasticsearch client provides a convenient abstraction for the scroll API:
from elasticsearch import Elasticsearch, helpers
client = Elasticsearch()
query = {
"query": {
"match_all": {}
}
}
scan = helpers.scan(client, index=index, query=query, scroll='1m', size=100)
for doc in scan:
# do something
Url -> http://localhost:9200/<index>/<type>/_query
http method -> GET
Query -> {"query": {"match_all": {}}, "size": 30000, "fields": ["_id"]}
For elasticsearch 5.x, you can use the "_source" field.
GET /_search
{
"_source": false,
"query" : {
"term" : { "user" : "kimchy" }
}
}
"fields"
has been deprecated.
(Error: "The field [fields] is no longer supported, please use [stored_fields] to retrieve stored fields or _source filtering if the field is not stored")
This is working!
def select_ids(self, **kwargs):
"""
:param kwargs:params from modules
:return: array of incidents
"""
index = kwargs.get('index')
if not index:
return None
# print("Params", kwargs)
query = self._build_query(**kwargs)
# print("Query", query)
# get results
results = self._db_client.search(body=query, index=index, stored_fields=[], filter_path="hits.hits._id")
print(results)
ids = [_['_id'] for _ in results['hits']['hits']]
return ids
Inspired by @Aleck-Landgraf answer, for me it worked by using directly scan function in standard elasticsearch python API:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan
es = Elasticsearch()
for dobj in scan(es,
query={"query": {"match_all": {}}, "fields" : []},
index="your-index-name", doc_type="your-doc-type"):
print dobj["_id"],