ElasticSearch: Unassigned Shards, how to fix?


I have an ES cluster with 4 nodes:

number_of_replicas: 1
search01 - master: false, data: false
search02 - master: true, data: true
search03 - master: false,

24 answers
  • 2020-12-04 05:39

    I got stuck today with the same shard allocation issue. The script that W. Andrew Loe III proposed in his answer didn't work for me, so I modified it a little and it finally worked:

    #!/usr/bin/env bash
    
    # The script performs force relocation of all unassigned shards, 
    # of all indices to a specified node (NODE variable)
    
    ES_HOST="<elasticsearch host>"
    NODE="<node name>"
    
    curl ${ES_HOST}:9200/_cat/shards > shards
    grep "UNASSIGNED" shards > unassigned_shards
    
    while read LINE; do
      IFS=" " read -r -a ARRAY <<< "$LINE"
      INDEX=${ARRAY[0]}
      SHARD=${ARRAY[1]}
    
      echo "Relocating:"
      echo "Index: ${INDEX}"
      echo "Shard: ${SHARD}"
      echo "To node: ${NODE}"
    
      curl -s -XPOST "${ES_HOST}:9200/_cluster/reroute" -d "{
        \"commands\": [
           {
             \"allocate\": {
               \"index\": \"${INDEX}\",
               \"shard\": ${SHARD},
               \"node\": \"${NODE}\",
               \"allow_primary\": true
             }
           }
         ]
      }"; echo
      echo "------------------------------"
    done <unassigned_shards
    
    rm shards
    rm unassigned_shards
    
    exit 0
    

    Now, I'm no Bash guru, but the script really worked for my case. Note that you'll need to specify appropriate values for the ES_HOST and NODE variables.
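
    A note if your cluster is newer than the one this script was written for (an assumption, not part of the original answer): from Elasticsearch 5.x the bare "allocate" reroute command was split into "allocate_replica", "allocate_stale_primary" and "allocate_empty_primary", and from 6.x the request needs a Content-Type header. A sketch of the adjusted call, reusing the same ES_HOST/INDEX/SHARD/NODE variables as the script above; "allocate_empty_primary" discards any existing data in that shard, so treat it as a last resort:

    # "accept_data_loss": true is required, because an empty primary wipes the shard's contents
    curl -s -H 'Content-Type: application/json' -XPOST "${ES_HOST}:9200/_cluster/reroute" -d "{
      \"commands\": [
        {
          \"allocate_empty_primary\": {
            \"index\": \"${INDEX}\",
            \"shard\": ${SHARD},
            \"node\": \"${NODE}\",
            \"accept_data_loss\": true
          }
        }
      ]
    }"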

  • 2020-12-04 05:41

    Another possible reason for unassigned shards is that your cluster is running more than one version of the Elasticsearch binary.

    shard replication from the more recent version to the previous versions will not work

    This can be a root cause for unassigned shards.

    Elastic Documentation - Rolling Upgrade Process
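
    If you suspect this, a quick way to check is to list every node's version (the host is a placeholder; any node in the cluster will do):

    curl -s 'localhost:9200/_cat/nodes?v&h=name,version,master'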

  • 2020-12-04 05:42

    I had the same problem but the root cause was a difference in version numbers (1.4.2 on two nodes (with problems) and 1.4.4 on two nodes (ok)). The first and second answers (setting "index.routing.allocation.disable_allocation" to false and setting "cluster.routing.allocation.enable" to "all") did not work.

    However, the answer by @Wilfred Hughes (setting "cluster.routing.allocation.enable" to "all" as a transient setting) failed with the following message:

    [NO(target node version [1.4.2] is older than source node version [1.4.4])]

    After updating the old nodes to 1.4.4, they started to resync with the other healthy nodes.
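
    For reference, the transient setting mentioned above can be applied roughly like this (the host is a placeholder; the Content-Type header is not needed on 1.x but does no harm):

    curl -s -H 'Content-Type: application/json' -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient": {
        "cluster.routing.allocation.enable": "all"
      }
    }'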

  • 2020-12-04 05:42

    I had two indices with unassigned shards that didn't seem to be self-healing. I eventually resolved this by temporarily adding an extra data-node[1]. After the indices became healthy and everything stabilized to green, I removed the extra node and the system was able to rebalance (again) and settle on a healthy state.

    It's a good idea to avoid killing multiple data nodes at once (which is how I got into this state). Likely, I had failed to preserve any copies/replicas for at least one of the shards. Luckily, Kubernetes kept the disk storage around, and reused it when I relaunched the data-node.


    ...Some time has passed...

    Well, this time just adding a node didn't seem to be working (after waiting several minutes for something to happen), so I started poking around in the REST API.

    GET /_cluster/allocation/explain
    

    This showed my new node with "decision": "YES".

    By the way, all of the pre-existing nodes had "decision": "NO" due to "the node is above the low watermark cluster setting". So this was probably a different case than the one I had addressed previously.

    Then I made the following simple POST[2] with no body, which kicked things into gear...

    POST /_cluster/reroute
    

    Other notes:

    • Very helpful: https://datadoghq.com/blog/elasticsearch-unassigned-shards

    • Something else that may work: set cluster_concurrent_rebalance to 0, then back to null; a sketch of this follows the footnotes below.


    [1] Pretty easy to do in Kubernetes if you have enough headroom: just scale out the stateful set via the dashboard.

    [2] Using the Kibana "Dev Tools" interface, I didn't have to bother with SSH/exec shells.
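
    A sketch of the cluster_concurrent_rebalance toggle mentioned in the notes above, written as curl calls (the JSON bodies can be pasted into the Kibana Dev Tools console just as well; the host is a placeholder):

    # Temporarily forbid concurrent shard rebalancing...
    curl -s -H 'Content-Type: application/json' -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.allocation.cluster_concurrent_rebalance": 0 }
    }'

    # ...then set it back to null to restore the default
    curl -s -H 'Content-Type: application/json' -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.allocation.cluster_concurrent_rebalance": null }
    }'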

  • 2020-12-04 05:43

    For me, this was resolved by running this from the dev console: "POST /_cluster/reroute?retry_failed"

    I started by looking at the index list to see which indices were red, and then ran

    "GET /_cat/shards?h=[INDEXNAME],shard,prirep,state,unassigned.reason"

    which showed shards stuck in the ALLOCATION_FAILED state, so running the retry above caused them to retry the allocation.
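
    For completeness, the same two calls as curl commands (the host is a placeholder, and the _cat/shards column list is spelled out rather than using the [INDEXNAME] placeholder above):

    # Which shards are unassigned, and why?
    curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason'

    # Retry allocations that failed too many times (ALLOCATION_FAILED)
    curl -s -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'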

  • 2020-12-04 05:43

    I tried several of the suggestions above and unfortunately none of them worked. We have a "Log" index in our lower environment where apps write their errors, on a single-node cluster. What solved it for me was checking the node's YML configuration file and seeing that it still had the default setting "gateway.expected_nodes: 2", which was overriding any other settings we had. Whenever we created an index on this node, it would try to spread 3 out of 5 shards to the phantom 2nd node. These would therefore appear as unassigned, and they could never be moved to the 1st and only node.

    The solution was editing the config, changing the setting "gateway.expected_nodes" to 1, so it would quit looking for its never-to-be-found brother in the cluster, and restarting the Elastic service instance. Also, I had to delete the index, and create a new one. After creating the index, the shards all showed up on the 1st and only node, and none were unassigned.

    # Set how many nodes are expected in this cluster. Once these N nodes
    # are up (and recover_after_nodes is met), begin recovery process immediately
    # (without waiting for recover_after_time to expire):
    #
    # gateway.expected_nodes: 2
    gateway.expected_nodes: 1
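
    Not part of the original fix, but if you do delete and recreate the index on a single-node cluster, creating it with zero replicas avoids another common source of unassigned shards. A sketch, with "logs" standing in for the real index name and the host as a placeholder:

    # Delete the broken index, then recreate it with settings suited to a single node
    curl -s -XDELETE 'localhost:9200/logs'
    curl -s -H 'Content-Type: application/json' -XPUT 'localhost:9200/logs' -d '{
      "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 0
      }
    }'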
    