I read notes about Lucene being limited to 2GB documents. Are there any additional limitations on the size of documents that can be indexed in Elasticsearch?
I think things have changed slightly over the years with Elasticsearch. The 7.x documentation referenced here - General Recommendations - says:
Given that the default http.max_content_length is set to 100MB, Elasticsearch will refuse to index any document that is larger than that. You might decide to increase that particular setting, but Lucene still has a limit of about 2GB.
So it would seem that Elasticsearch has a default limit of ~100MB, while Lucene's hard limit is 2GB, as the other answer stated.
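If you do decide to raise that setting, it is a static node-level setting, so it goes in elasticsearch.yml rather than the cluster settings API. A sketch (the 500mb value here is just an illustration; anything you pick is still bounded by Lucene's ~2GB ceiling, and larger requests consume correspondingly more heap):

```yaml
# elasticsearch.yml: raise the maximum HTTP request body size.
# Default is 100mb; Lucene still caps individual documents at ~2GB.
http.max_content_length: 500mb
```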
Lucene internally uses a byte buffer addressed with 32-bit signed integers. By definition this limits the size of a document, so 2GB is the theoretical maximum.
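The arithmetic behind that ceiling is just the range of a signed 32-bit offset:

```python
# A signed 32-bit integer can address at most 2^31 - 1 byte offsets,
# which is where the ~2GB document ceiling comes from.
MAX_SIGNED_32_BIT = 2**31 - 1          # 2147483647 bytes
limit_in_gib = MAX_SIGNED_32_BIT / 2**30

print(MAX_SIGNED_32_BIT)   # 2147483647
print(limit_in_gib)        # just under 2.0 GiB
```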
In Elasticsearch: there is a maximum HTTP request size in the ES GitHub code, and it is set against Integer.MAX_VALUE, i.e. 2^31-1. So, basically, 2GB is the maximum document size for bulk indexing over HTTP. Note also that ES does not process an HTTP request until it completes.
Good practices - for further study, refer to these links:
Performance considerations for elasticsearch indexing
Document maximum size for bulk indexing over HTTP