Is using a load balancer with ElasticSearch unnecessary?

2020-12-22 23:43

I have a cluster of 3 ElasticSearch nodes running on AWS EC2. These nodes are set up using OpsWorks/Chef. My intent is to design this cluster to be very resilient and elastic.

4 Answers
  • 2020-12-23 00:06

    You don't need a load balancer; ES is already providing that functionality. You'd just be adding another component, which could misbehave and which would add an unnecessary network hop.

    ES will shard your data (by default into 5 shards) and try to distribute the shards evenly among your instances. In your case two instances would hold two shards each and one would hold just one, but you might want to change the shard count to 6 for an even distribution.

    By default replication is set to "number_of_replicas":1, so one replica of each shard. Assuming you are using 6 shards, it could look something like this (R is a replicated shard):

    • node0: 1, 4, R3, R6
    • node1: 2, 6, R1, R5
    • node2: 3, 5, R2, R4

    Assuming node1 dies, the cluster would change to the following setup:

    • node0: 1, 4, 6, R3 + new replicas R5, R2
    • node2: 3, 5, 2, R4 + new replicas R1, R6
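
    If you do go with six shards, here is a minimal sketch of creating such an index with the official Python client (the host and index name are placeholders, and client APIs differ slightly between versions):

    ```python
    from elasticsearch import Elasticsearch

    # Connect to any node; the cluster routes requests to the right shards itself.
    es = Elasticsearch(["http://node0:9200"])

    # Hypothetical index with 6 primary shards and 1 replica per shard,
    # matching the layout sketched above.
    es.indices.create(
        index="my-index",
        body={
            "settings": {
                "number_of_shards": 6,
                "number_of_replicas": 1,
            }
        },
    )
    ```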

    Depending on your connection settings, you can either connect to one instance (transport client) or join the cluster (node client). With the node client you'll avoid double hops, since you'll always connect to the correct shard / index. With the transport client, your requests will be routed to the correct instance.

    So there's nothing for you to load balance yourself; you'd just add overhead. The auto-clustering is probably ES's greatest strength.

  • 2020-12-23 00:08

    You're quite right to want to design for 'failover', and in AWS, here's how I recommend you do it.

    1) Limit the nodes in your cluster that can be elected master. For the rest, set node.client: true. Base the number of master-electable nodes on how many you want available for failover (a config sketch follows these steps).

    2) Create an ELB that includes only the master-electable nodes.

    3) In Route 53, create a CNAME for your cluster, with the value set to the DNS name of your ELB.
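
    For step 1, a rough sketch of the relevant elasticsearch.yml settings (option names have changed between Elasticsearch versions, so treat this as illustrative only):

    ```yaml
    # Master-electable nodes (the ones registered with the ELB)
    node.master: true
    node.data: true

    # All other nodes: node.client: true is the legacy shorthand for a
    # client-only node (not master-electable, holds no data)
    node.client: true
    ```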

  • 2020-12-23 00:14

    It is a little unclear what to do when the master node fails. After digging around, here are a few options:

    1. If you're using a client to connect to the Elasticsearch cluster, you can specify an array of DNS names in the connection configuration. The client will try them in sequence: if the first one fails, the second one will be attempted (see the sketch after this list).
    2. Use round-robin DNS. Similar to option one, but without the client logic. Both of these options can introduce a delay, because a request to a dead host has to fail before it is routed to the next available one; neither DNS nor the client knows ahead of time that a particular host is down.
    3. Use some sort of load balancer. With health checks, the LB knows about misbehaving hosts ahead of time and avoids those delays.
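
    As an illustration of option 1, a sketch using the official Python client with a list of hosts (the hostnames are made up, and the sniffing parameter names vary between client versions):

    ```python
    from elasticsearch import Elasticsearch

    # The client round-robins requests across these hosts and retries on
    # another host when a connection fails.
    es = Elasticsearch(
        [
            "es-node-0.example.com:9200",
            "es-node-1.example.com:9200",
            "es-node-2.example.com:9200",
        ],
        sniff_on_start=True,            # discover the rest of the cluster at startup
        sniff_on_connection_fail=True,  # refresh the node list when a connection dies
        sniffer_timeout=60,             # and re-sniff periodically as well
    )

    print(es.cluster.health())
    ```
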
  • 2020-12-23 00:26

    I believe load balancing an Elasticsearch cluster is a good idea (in the sense of designing a fault-tolerant system that is resilient to a single node failure).

    To architect your cluster you'll need some background on the two primary functions of Elasticsearch: (1) writing and updating documents, and (2) querying documents.

    Writing / indexing documents in Elasticsearch:

    1. When a new document comes into Elasticsearch to be indexed, Elasticsearch determines the "primary shard" the document should be assigned to using the "Shard Routing Algorithm" (a simplified sketch follows this list).
    2. The Lucene process associated with that shard "maps" the fields in the document.
    3. The Lucene process adds the document to the shard's Lucene "inverted index".
    4. Any "replica shard(s)" then receive the document; each replica shard "maps" the document and adds it to its own Lucene "inverted index".

    Querying documents in Elasticsearch:

    1. By default, when a query is sent to Elasticsearch, the query hits a node; this becomes the "query node" or the "gateway query node" for that query (a basic example follows this list).
    2. The node broadcasts the query to one copy of every shard in the index (primary or replica).
    3. Each shard runs the query against its local Lucene inverted index.
    4. Each shard returns its top 10 - 20 results to the "gateway query node".
    5. The "gateway query node" then performs a merge-sort on the combined results returned from the shards.
    6. Once the merge-sort is finished, the "gateway query node" returns the results to the client.
      • The merge-sort is CPU and memory intensive.
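
    For context, a basic search issued through the Python client (the host, index, and field names are made up); whichever node receives the request acts as the "gateway query node" and does the final merge-sort before returning the hits:

    ```python
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://client-node:9200"])  # hypothetical client node

    # The receiving node fans this out to one copy of every shard, merge-sorts
    # the per-shard results, and returns the overall top 10 hits.
    response = es.search(
        index="my-index",
        body={
            "query": {"match": {"title": "elasticsearch"}},
            "size": 10,
        },
    )

    for hit in response["hits"]["hits"]:
        print(hit["_score"], hit["_source"])
    ```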

    Architecting a Load Balancer for Writes / Indexing / Updates

    Elasticsearch manages the location of shards on nodes itself. The "master node" keeps and updates the "shard routing table" and provides a copy of it to the other nodes in the cluster.

    Generally, you don't want your master node doing much more than cluster health checks, updating routing tables, and managing shards.

    It's probably best to point the load balancer for writes at the "data nodes" (data nodes are the nodes that hold shards) and let the data nodes use their shard routing tables to get the writes to the correct shards.

    Architecting for Queries

    Elasticsearch has a special node type, the "client node", which holds no data and cannot become a "master node". The client node's function is to perform the resource-heavy merge-sort at the end of the query.

    For AWS you'd probably use a c3 or c4 instance type as a "client node".

    Best practice is to point the load balancer for queries to client nodes.

    Cheers!

    References:

    1. Elasticsearch Node Types
    2. Elasticsearch: Shard Routing Algorithm
    3. Elasticsearch: Replica Shards
    4. Elasticsearch: Cluster State i.e. the Shard Routing Table
    5. ElasticHQ - Introduction to Elasticsearch Video
    6. Elasticsearch: Shard numbers and Cluster Scaling