I really don't understand why in core types link it says in the attributes descriptions (for a number, for example):
By default in elasticsearch, the _source (the document one indexed) is stored. This means when you search, you can get the actual document source back. Moreover, elasticsearch will automatically extract fields / objects from the _source and return them if you explicitly ask for it (as well as possibly use it in other components, like highlighting).
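As a minimal sketch of asking for specific fields out of the _source, the request body below uses source filtering. The field names title and author.name are hypothetical, and the body is only built and printed here, not sent to a cluster.

```python
import json


def source_filtered_search(fields, query=None):
    """Build a search body that returns only `fields`, extracted from _source."""
    return {
        "query": query or {"match_all": {}},
        # Elasticsearch loads the _source once and returns only these paths.
        "_source": list(fields),
    }


body = source_filtered_search(["title", "author.name"])
print(json.dumps(body, indent=2))
```

No field in this request needs to be mapped as stored; the values come from the stored _source.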
You can specify that a specific field is also stored. This means that the data for that field will be stored on its own. Meaning that if you ask for field1 (which is stored), elasticsearch will identify that it's stored, and load it from the index instead of getting it from the _source (assuming _source is enabled).
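A hedged sketch of what this looks like in practice: the mapping below marks field1 as stored, and the search body asks for it via stored_fields. This assumes a recent Elasticsearch version, where store is a boolean and the search parameter is stored_fields; older releases used "yes"/"no" and a fields parameter. The helper names and field2 are hypothetical, and nothing is sent to a cluster.

```python
import json


def mapping_with_stored_fields(stored, unstored):
    """Build index mappings in which only the `stored` fields set store=true."""
    properties = {}
    for name in stored:
        properties[name] = {"type": "text", "store": True}
    for name in unstored:
        properties[name] = {"type": "text"}  # store is left at its default (false)
    return {"mappings": {"properties": properties}}


def stored_fields_search(fields):
    """Build a search body that loads `fields` from stored fields, not _source."""
    return {"query": {"match_all": {}}, "stored_fields": list(fields)}


mapping = mapping_with_stored_fields(stored=["field1"], unstored=["field2"])
search = stored_fields_search(["field1"])
print(json.dumps(mapping, indent=2))
print(json.dumps(search, indent=2))
```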
When do you want to enable storing specific fields? Most times, you don't. Fetching the _source is fast and extracting it is fast as well. If you have very large documents, where the cost of storing the _source, or the cost of parsing the _source is high, you can explicitly map some fields to be stored instead.
Note, there is a cost of retrieving each stored field. So, for example, if you have a json with 10 fields with reasonable size, and you map all of them as stored, and ask for all of them, this means loading each one (more disk seeks), compared to just loading the _source (which is one field, possibly compressed).
I thought that if "store" is set to "no" it would mean I could not retrieve the specific field, but had to get the whole _source and parse it on the client side.
That's exactly what elasticsearch does for you when a field is not stored (default) and the _source field is enabled (default too).
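To make the "parse the _source and pull out the field" step concrete, here is a simplified illustration. It is not Elasticsearch's actual implementation: it only handles nested objects via dotted paths, ignores arrays, and the function names and sample document are hypothetical.

```python
def extract_field(source, path):
    """Return the value at a dotted path in a parsed _source dict, or None."""
    current = source
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_fields(source, paths):
    """Collect the requested paths that are present in the document."""
    result = {}
    for path in paths:
        value = extract_field(source, path)
        if value is not None:
            result[path] = value
    return result


doc = {"title": "Stored fields", "views": 42, "author": {"name": "kim"}}
picked = extract_fields(doc, ["views", "author.name", "missing"])
print(picked)
```

The whole document is loaded once, and every requested field is read from that single parsed object, which is why no per-field storage is needed.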
You usually send a field to elasticsearch because you either want to search on it, or retrieve it. But it's true that if you don't store the field explicitly and you don't disable the source, you can still retrieve the field using the _source. This means that in some cases it might actually make sense to have a field that is neither indexed nor stored.
When you store a field, that's done in the underlying Lucene. Lucene is an inverted index that allows for fast full-text search and gives back document ids given text queries. Beyond the inverted index, Lucene also has a storage area where field values can be kept so that they can be retrieved given a document id. You usually store in Lucene the fields that you want to return as search results. Elasticsearch doesn't require you to store every field that you want to return, because by default it stores every document that you send to it, so it's always able to return everything you sent to it as a search result.
In just a few cases it might be useful to store fields explicitly in Lucene: when the _source field is disabled, or when we want to avoid parsing it, even if the parsing is done automatically by elasticsearch.
Keep in mind though that retrieving many stored fields from Lucene might require one disk seek per field, whereas retrieving only the _source from Lucene and parsing it to extract the needed fields takes a single disk seek and is faster in most cases.