Question
EDITED: Based on comments from @opster elasticsearch ninja, I edited the original question to keep it focused on the low disk watermark error in ES.
For more general server optimization on a small machine, see: Debugging Elasticsearch and tuning on small server, single node
For the follow-up discussion on the original question and considerations related to debugging ES failures, see also: https://chat.stackoverflow.com/rooms/213776/discussion-between-opster-elasticsearch-ninja-and-user305883
Problem: I noticed that Elasticsearch is failing frequently, and I need to restart the server manually.
This question may relate to: High disk watermark exceeded even when there is not much data in my index
I want a better understanding of what Elasticsearch does when disk space runs low, how to optimise its configuration, and only then set it up to restart automatically when the system fails.
Could you help me understand how to read the Elasticsearch journal and decide how to fix the problems accordingly, suggesting best practices for tuning server operations on a small machine?
My priority is that the system does not crash; somewhat lower performance is acceptable, and there is no budget to increase the server size.
Hardware
I am running Elasticsearch on a single small server (2 GB RAM), with 3 indices (500 MB, 20 MB and 65 MB of store size) and several GB free on disk (SSD). I would like to allow the use of virtual memory rather than consuming RAM.
Below is what I did:
What does the journal say?
journalctl | grep elasticsearch
to explore failures related to ES:
May 13 05:44:15 ubuntu systemd[1]: elasticsearch.service: Main process exited, code=killed, status=9/KILL
May 13 05:44:15 ubuntu systemd[1]: elasticsearch.service: Unit entered failed state.
May 13 05:44:15 ubuntu systemd[1]: elasticsearch.service: Failed with result 'signal'.
Here I can see ES was killed.
EDITED: I found this was caused by a Java out-of-memory error; see the error under service elasticsearch status below.
Readers may also find it useful to run:
java -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize|PermSize|ThreadStackSize'
to check the current memory assignment.
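Since the journal shows status=9/KILL, one extra check worth doing (my assumption, not part of the original post) is whether the kernel OOM killer terminated the JVM:
dmesg -T | grep -iE 'killed process|out of memory'   # kernel OOM-killer messages, with timestamps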
What does the ES log say?
check:
/var/log/elasticsearch
[2020-05-09T14:17:48,766][WARN ][o.e.c.r.a.DiskThresholdMonitor] [my_clustername-master] high disk watermark [90%] exceeded on [Ynm6YG-MQyevaDqT2n9OeA][awesome3-master][/var/lib/elasticsearch/nodes/0] free: 1.7gb[7.6%], shards will be relocated away from this node
[2020-05-09T14:17:48,766][INFO ][o.e.c.r.a.DiskThresholdMonitor] [my_clustername-master] rerouting shards: [high disk watermark exceeded on one or more nodes]
What does "shards will be relocated away from this node" mean if I only have one server and one instance running?
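Not from the original post, but a quick way to see which watermark thresholds are currently in effect is the cluster settings API (assuming the node listens on localhost:9200):
curl -s "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true" | grep watermark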
service elasticsearch status
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2020-05-09 13:47:02 UTC; 32min ago
Docs: http://www.elastic.co
Process: 22691 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=0/SUCCES
Main PID: 22694 (java)
CGroup: /system.slice/elasticsearch.service
└─22694 /usr/bin/java -Xms512m -Xmx512m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+U
What does my configuration say?
I am using the default configuration of `/etc/elasticsearch/elasticsearch.yml`
and don't have any options configured for the watermarks, like in https://stackoverflow.com/a/52006486/305883
Should I include them? What would they do?
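For reference, this is a minimal sketch of what those watermark options look like in /etc/elasticsearch/elasticsearch.yml. The percentages below are illustrative values for a small disk, not a recommendation from the linked answer; the shipped defaults are 85% (low), 90% (high) and 95% (flood stage):
# illustrative thresholds for a small disk; adjust to your free space
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 93%
cluster.routing.allocation.disk.watermark.high: 95%
cluster.routing.allocation.disk.watermark.flood_stage: 97%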
Please note I have uncommented bootstrap.memory_lock: true
because I only have 2 GB of RAM.
Even if Elasticsearch performs poorly when memory is swapping, my priority is that it does not fail and the site stays up and running.
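For context, the 512 MB heap visible in the service status above is set in /etc/elasticsearch/jvm.options; a sketch of the relevant lines (these values simply match the -Xms512m -Xmx512m shown in the process command line, they are not new advice):
# /etc/elasticsearch/jvm.options
# keep min and max heap equal to avoid resize pauses;
# on a 2 GB machine the heap must stay well below total RAM
-Xms512m
-Xmx512m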
Running on a single-node machine: how to handle unassigned replicas?
I understand that replicas cannot be assigned on the same node as their primary. As a consequence, does it make sense to have replicas on a single node? If a primary shard fails, will the replicas come to the rescue, or will they be unused anyway?
I wonder if I should delete them to free space, or whether it is better not to.
Answer 1:
Please read the detailed explanation in Opster's guide on what the low disk watermark is and how to fix it temporarily and permanently.
Explanation of your question:
"Shards will be relocated away from this node" if I only have one server and one instance working?
Elasticsearch considers the available disk space before deciding whether to allocate new shards, relocate shards away, or put all indices into read-only mode, depending on which threshold is breached. The reason is that Elasticsearch indices consist of shards that are persisted on data nodes, and low disk space can cause the issues above.
In your case, as you have just one data node, all the indices on that node will be put into read-only mode, and even if you free up space they will not return to write mode until you explicitly call the API mentioned in Opster's guide.
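The call in question is the index settings API; a minimal sketch, assuming the node is reachable on localhost:9200, which clears the read_only_allow_delete block applied at the flood-stage watermark:
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'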
Edit: On a single node it is better to disable replicas, as Elasticsearch will not allocate a replica of a shard to the same data node that holds its primary. It doesn't make sense to have replicas in a single-node Elasticsearch cluster, and doing so will unnecessarily mark your index and cluster health as yellow (missing replicas).
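A sketch of how replicas can be dropped on all existing indices (same localhost:9200 assumption as above); newly created indices would need the same setting, e.g. via an index template:
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d '{"index.number_of_replicas": 0}'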
Source: https://stackoverflow.com/questions/61698562/optimise-server-operations-with-elasticsearch-addressing-low-disk-watermarks