Elasticsearch data binary ran out of memory

旧时难觅i 2021-01-16 08:10

I'm trying to upload an 800GB file to Elasticsearch, but I keep getting a memory error telling me that the data binary is out of memory. I have 64GB of RAM on my system and 3TB o

1 Answer
  • 2021-01-16 08:28

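    800GB is quite a lot to send in one shot. With --data-binary @file, curl first reads the entire file into memory, which is where the "data binary ... out of memory" message comes from, and ES would also have to hold the whole request in memory to process it. On top of that, ES rejects request bodies larger than http.max_content_length, which defaults to 100MB, so a single 800GB request cannot succeed with 64GB of RAM or with the default settings.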

    One way around this is to split your file into several smaller files and send them one after another. A small shell script like the one below does the job.

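    #!/bin/sh
    
    # split the main file into files containing at most 10,000 lines
    # (an even count keeps each action line together with its document line,
    # assuming every action is followed by exactly one document line)
    split -l 10000 -a 10 carrier.json /tmp/carrier_bulk
    
    # send each split file; ES 6.0+ rejects _bulk requests without a Content-Type header
    for f in /tmp/carrier_bulk*; do
        curl -s -XPOST http://localhost:9200/_bulk \
            -H 'Content-Type: application/x-ndjson' \
            --data-binary "@$f"
    done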
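    If your file also contains delete actions (which have no document line), or you would rather cap each request by size than by line count, the same idea can be sketched in Python. This is a minimal sketch using only the standard library, not an official client. It assumes the file is in newline-delimited bulk format, reads it one line at a time so memory stays bounded by one chunk, and never separates an action from its document. The 10MB budget (MAX_BYTES) and the script name bulk_split.py are hypothetical choices.

```python
import json
import sys
import urllib.request
from typing import Iterable, Iterator, List

# Bulk actions that are followed by a document line; "delete" has none.
ACTIONS_WITH_SOURCE = {"index", "create", "update"}

# Hypothetical per-request budget: an arbitrary starting point that stays far
# below Elasticsearch's default http.max_content_length of 100MB.
MAX_BYTES = 10 * 1024 * 1024


def bulk_entries(lines: Iterable[str]) -> Iterator[str]:
    """Group NDJSON bulk lines into whole entries (action line + optional document line)."""
    it = iter(lines)
    for action_line in it:
        if not action_line.strip():
            continue  # tolerate blank lines between entries
        [op] = json.loads(action_line).keys()  # an action line has exactly one key
        entry = action_line.rstrip("\n") + "\n"
        if op in ACTIONS_WITH_SOURCE:
            source_line = next(it, None)
            if source_line is None:
                raise ValueError(f"'{op}' action has no document line")
            entry += source_line.rstrip("\n") + "\n"
        yield entry


def bulk_chunks(lines: Iterable[str], max_bytes: int) -> Iterator[str]:
    """Yield bulk request bodies of at most max_bytes without splitting an entry.

    An entry that is larger than max_bytes on its own becomes a chunk by itself.
    """
    chunk: List[str] = []
    size = 0
    for entry in bulk_entries(lines):
        n = len(entry.encode("utf-8"))
        if chunk and size + n > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(entry)
        size += n
    if chunk:
        yield "".join(chunk)


def send_bulk(body: str, url: str = "http://localhost:9200/_bulk") -> dict:
    """POST one bulk body and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python3 bulk_split.py carrier.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for body in bulk_chunks(f, MAX_BYTES):
            result = send_bulk(body)
            print(f"took {result['took']} ms, errors: {result['errors']}")
```

    Run it as python3 bulk_split.py carrier.json. Splitting on whole entries rather than a fixed line count means a delete action can never leave the following chunk starting with an orphaned document line.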
    

    UPDATE

    If you want to interpret the ES response, you can easily do so by piping it to a small Python one-liner like this:

    curl -s -XPOST "$ES_HOST/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary "@$f" | python3 -c 'import json,sys; obj=json.load(sys.stdin); print("    <- Took %s ms with errors: %s" % (obj["took"], obj["errors"]))'
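    The one-liner above only tells you whether something failed. The _bulk response also lists one result per action under "items", and failed ones carry an "error" object with a "reason", so a slightly longer sketch can print which documents were rejected. The script name bulk_report.py and the "-" argument for reading from stdin are assumptions of this sketch.

```python
import json
import sys
from typing import Any, Dict, List


def failed_items(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the items of a _bulk response that carry an "error" object."""
    failures: List[Dict[str, Any]] = []
    if not response.get("errors"):
        return failures  # ES reports that no item failed
    for item in response.get("items", []):
        [(op, result)] = item.items()  # each item has one key: the action name
        if "error" in result:
            err = result["error"]
            failures.append({
                "op": op,
                "id": result.get("_id"),
                "status": result.get("status"),
                "reason": err.get("reason") if isinstance(err, dict) else err,
            })
    return failures


def summarize(response: Dict[str, Any]) -> str:
    """Format a one-line summary followed by one line per failed item."""
    bad = failed_items(response)
    lines = [f"    <- Took {response['took']} ms with {len(bad)} failed item(s)"]
    for f in bad:
        lines.append(f"       {f['op']} {f['id']}: {f['status']} {f['reason']}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] == "-":
    # usage: curl ... | python3 bulk_report.py -
    print(summarize(json.load(sys.stdin)))
```

    Pipe each curl call into python3 bulk_report.py - instead of the one-liner to see the ID, HTTP status and reason of every rejected document.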
    