I can\'t see any description of when I should use a query or a filter or some combination of the two. What is the difference between them? Can anyone please explain?
Since version 2 of Elasticsearch, filters and queries have been merged and any query clause can be used as either a filter or a query (depending on the context). As with version 1, filters are cached and should be used if scoring does not matter.
Source: https://logz.io/blog/elasticsearch-queries/
Say index myindex
contains three documents:
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hello world!" }'
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hello world! I am Sam." }'
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hi Stack Overflow!" }'
Query: How well a document matches the query
hello sam
(using keyword must
)curl localhost:9200/myindex/_search?pretty -d '
{
"query": { "bool": { "must": { "match": { "msg": "hello sam" }}}}
}'
Document "Hello world! I am Sam."
is assigned a higher score than "Hello world!"
, because the former matches both words in the query. Documents are scored.
"hits" : [
...
"_score" : 0.74487394,
"_source" : {
"name" : "Hello world! I am Sam."
}
...
"_score" : 0.22108285,
"_source" : {
"name" : "Hello world!"
}
...
Filter: Whether a document matches the query
hello sam
(using keyword filter
)curl localhost:9200/myindex/_search?pretty -d '
{
"query": { "bool": { "filter": { "match": { "msg": "hello sam" }}}}
}'
Documents that contain either hello
or sam
are returned. Documents are NOT scored.
"hits" : [
...
"_score" : 0.0,
"_source" : {
"name" : "Hello world!"
}
...
"_score" : 0.0,
"_source" : {
"name" : "Hello world! I am Sam."
}
...
The difference is simple: filters are cached and don't influence the score, therefore faster than queries. Have a look here too. Let's say a query is usually something that the users type and pretty much unpredictable, while filters help users narrowing down the search results , for example using facets.
Filters
-> Does this document match? a binary yes or no answer
Queries
-> Does this document match? How well does it match? uses scoring
This is what official documentation says:
As a general rule, filters should be used instead of queries:
- for binary yes/no searches
- for queries on exact values
As a general rule, queries should be used instead of filters:
- for full text search
- where the result depends on a relevance score
Basically, a query is used when you want to perform a search on your documents with scoring. And filters are used to narrow down the set of results obtained by using query. Filters are boolean.
For example say you have an index of restaurants something like zomato. Now you want to search for restaurants that serve 'pizza', which is basically your search keyword.
So you will use query to find all the documents containing "pizza" and some results will obtained.
Say now you want list of restaurant that serves pizza and has rating of atleast 4.0.
So what you will have to do is use the keyword "pizza" in your query and apply the filter for rating as 4.0.
What happens is that filters are usually applied on the results obtained by querying your index.