To count the documents in an Elasticsearch index, there are at least two options: a direct count with the _count API, or a query with the _search API that reads the total from the hits.
Both return the same result, but _count uses fewer resources and less bandwidth because it does not fetch documents or compute scores, and it benefits from other internal optimizations. A _search with size set to 0 behaves very similarly.
If you want to count all the records in an index, you can also run a terms aggregation on the "_type" field (on versions that still have mapping types).
The results should be the same. Before comparing the results, be sure to execute an index refresh.
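As a rough sketch of the refresh-then-count step, the following Python uses only the standard library. The base URL http://localhost:9200 and the helper names build_request, send, and count_after_refresh are assumptions made for illustration; they are not part of any Elasticsearch client library.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured cluster


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def count_after_refresh(index, query=None):
    """Refresh the index so recent writes are visible, then call _count."""
    send(build_request("POST", f"/{index}/_refresh"))
    body = {"query": query} if query is not None else None
    return send(build_request("POST", f"/{index}/_count", body))["count"]


# Hypothetical usage against a running cluster:
# count_after_refresh("my-index", {"term": {"field": {"value": "xyz"}}})
```

Refreshing on every count is costly on a busy index, so this is meant for one-off comparisons rather than routine use.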
If _search must be used instead of _count, and you're on Elasticsearch 7.0+, setting size: 0 and track_total_hits: true will provide the same info as _count:
GET my-index/_search
{
"query": { "term": { "field": { "value": "xyz" } } },
"size": 0,
"track_total_hits": true
}
{
"took" : 612,
"timed_out" : false,
"_shards" : {
"total" : 629,
"successful" : 629,
"skipped" : 524,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 29349466,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
See Elasticsearch 7.0 Breaking changes
Old question, but chipping in because on Elasticsearch 7.0 and later the two endpoints report totals differently:
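_search: returns the matching documents along with a hit count that, by default, is only tracked accurately up to 10,000 (the track_total_hits default, which happens to equal the default result window size). A relation of "gte" means the value is a lower bound, e.g.: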
{"took":3,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":10000,"relation":"gte"},"max_score": 0.34027478,"hits":[...]}}
_count: returns the total number of hits for the search query, irrespective of the result window size. No documents are returned, e.g.:
{"count":5703899,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}
So _search may report the total as 10,000 with relation "gte" when more documents match (unless track_total_hits is set to true), while _count returns the actual count for the same query.
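To make the difference concrete, here is a minimal Python sketch that reads the total from a parsed _search response and reports whether it is exact. The function name total_hits is hypothetical; the handling of a plain integer total covers the pre-7.0 response format.

```python
def total_hits(search_response):
    """Return (value, is_exact) from a parsed Elasticsearch _search response."""
    total = search_response["hits"]["total"]
    if isinstance(total, dict):
        # 7.0+ format: {"value": N, "relation": "eq" | "gte"}
        return total["value"], total["relation"] == "eq"
    # Pre-7.0 format: a plain integer, always exact
    return total, True


capped = {"hits": {"total": {"value": 10000, "relation": "gte"}, "hits": []}}
value, is_exact = total_hits(capped)
if not is_exact:
    print(f"at least {value} matches; use _count or track_total_hits for the exact total")
```

Checking the relation field avoids silently treating a capped 10,000 as the real total.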
_count is probably a bit faster, since it doesn't have to execute a full query with ranking and result fetching and can simply return the number of matches.
It would be interesting to know a bit more about how you manage to get different results, though. For that I need more information, such as the exact queries you are sending and whether any indexing is happening on the index at the time.
But if you run _search and _count with a match_all query, both should return the same total. If not, that would be very weird.
curl 'http://localhost:9200/_cat/indices?v'
gives you the document count (docs.count) and other per-index information in a tabular format:
health status index                              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2019.10.09-000001         IS7HBUgRRzO7Rn1puBFUIQ   1   1          0            0       283b           283b
green  open   .kibana_task_manager_1             e4zZcF9wSQGFHB_lzTszrg   1   0          2            0     12.5kb         12.5kb
yellow open   metricbeat-7.4.0-2019.10.09-000001 h_CWzZHcRsakxgyC36-HTg   1   1       6118            0      2.2mb          2.2mb
green  open   .apm-agent-configuration           J6wkUr2CQAC5kF8-eX30jw   1   0          0            0       283b           283b
green  open   .kibana_2                          W2ZETPygS8a83-Xcd6t44Q   1   0       1836           23      1.1mb          1.1mb
green  open   .kibana_1                          IrBlKqO0Swa6_HnVRYEwkQ   1   0          8            0    208.8kb        208.8kb
yellow open   filebeat-7.4.0-2019.10.09-000001   xSd2JdwVR1C9Ahz2SQV9NA   1   1          0            0       283b           283b
green  open   .tasks                             0ZzzrOq0RguMhyIbYH_JKw   1   0          1            0      6.3kb          6.3kb
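If the counts are needed in a script rather than on screen, the table can be parsed by its header row. The Python sketch below is one way to do that; the function name parse_cat_indices is hypothetical, and it assumes every row has a value in every column (rows for closed indices may not, and would need extra handling).

```python
def parse_cat_indices(text):
    """Map index name -> docs.count from `_cat/indices?v` plain-text output."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    counts = {}
    for line in lines[1:]:
        # Pair each whitespace-separated value with its column name
        row = dict(zip(header, line.split()))
        counts[row["index"]] = int(row["docs.count"])
    return counts


sample = """\
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_2 W2ZETPygS8a83-Xcd6t44Q 1 0 1836 23 1.1mb 1.1mb
green open .tasks 0ZzzrOq0RguMhyIbYH_JKw 1 0 1 0 6.3kb 6.3kb
"""
print(parse_cat_indices(sample))
```

The cat APIs also accept format=json, which avoids text parsing altogether when a structured response is preferred.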