Im trying to upload a 800GB file to elasticsearch but i keep getting a memory error that tells me the data binary is out of memory. I have 64GB of RAM on my system and 3TB o
800GB is a quite a lot to send in one shot, ES has to put all the content into memory in order to process it, so that's probably too big for the amount of memory you have.
One way around this is to split your file into several and send each one after another. You can achieve it with a small shell script like the one below.
#!/bin/sh
# split the main file into files containing 10,000 lines max
split -l 10000 -a 10 carrier.json /tmp/carrier_bulk
# send each split file
BULK_FILES=/tmp/carrier_bulk*
for f in $BULK_FILES; do
curl -s -XPOST http://localhost:9200/_bulk --data-binary @$f
done
UPDATE
If you want to interpret the ES response you can do so easily by piping the response to a small python one-liner like this:
curl -s -XPOST $ES_HOST/_bulk --data-binary @$f | python -c 'import json,sys;obj=json.load(sys.stdin);print " <- Took %s ms with errors: %s" % (obj["took"], obj["errors"])';