ElasticSearch - How to merge indexes into one index?

小蘑菇 2021-01-31 00:06

My cluster has an index for each day going back a few months, with 5 shards per index (the default), and I can't run queries across the whole cluster because there are too many shards.

2 Answers
  • 2021-01-31 00:34

    You can use the _reindex API.

    POST _reindex
    {
        "conflicts": "proceed",
        "source": {
            "index": ["twitter", "blog"],
            "type": ["tweet", "post"]
        },
        "dest": {
            "index": "all_together"
        }
    }
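
    Note that the type filter in source only applies to older clusters that still use mapping types; on recent versions (7.x and later) omit that line. For large indices you may want to run the reindex in the background and poll it via the task management API. A sketch, where the task id is a placeholder returned by the first call:

    POST _reindex?wait_for_completion=false
    {
        "conflicts": "proceed",
        "source": {
            "index": ["twitter", "blog"]
        },
        "dest": {
            "index": "all_together"
        }
    }

    GET _tasks/<task_id_from_response>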
    
  • 2021-01-31 00:47

    This is a common problem that only becomes visible after a few months of running the ELK stack with Filebeat creating a new index every day. There are a few options to fix the performance issue here.

    _forcemerge

    First, you can use _forcemerge to limit the number of segments inside each Lucene index. The operation won't reduce or merge the indices themselves, but it will improve Elasticsearch's performance.

    curl -XPOST 'localhost:9200/logstash-2017.07*/_forcemerge?max_num_segments=1'
    

    This will run through all of that month's indices and force-merge their segments. When done for every month, it should improve Elasticsearch performance a lot. In my case, CPU usage went down from 100% to 2.7%.
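
    To verify the effect, you can check the number of segments per shard before and after with the _cat API, e.g.:

    curl -XGET 'localhost:9200/_cat/segments/logstash-2017.07*?v'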

    Unfortunately this won't solve the shards problem.

    _reindex

    Please read the _reindex documentation and back up your data before continuing.
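
    One way to take that backup is the snapshot API. A minimal sketch, assuming a shared-filesystem repository; the repository name my_backup and the path /mnt/backups/my_backup are hypothetical, and the path must also be whitelisted via path.repo in elasticsearch.yml:

    # register a filesystem snapshot repository (hypothetical name and path)
    curl -XPUT 'localhost:9200/_snapshot/my_backup?pretty' -H 'Content-Type: application/json' -d'
    {
        "type": "fs",
        "settings": {
            "location": "/mnt/backups/my_backup"
        }
    }
    '
    # take a snapshot of the cluster and wait for it to finish
    curl -XPUT 'localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true&pretty'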

    As tomas mentioned, if you want to limit the number of shards or indices, there is no other option than to use _reindex to merge several indices into one. This can take a while depending on the number and size of the indices you have.

    Destination index

    You can create the destination index beforehand and specify the number of shards it should contain. This ensures your final index has the number of shards you need.

    curl -XPUT 'localhost:9200/new-logstash-2017.07.01?pretty' -H 'Content-Type: application/json' -d'
    {
        "settings" : {
            "index" : {
                "number_of_shards" : 1 
            }
        }
    }
    '
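
    You can confirm the new index got the expected shard count with the _cat API:

    curl -XGET 'localhost:9200/_cat/shards/new-logstash-2017.07.01?v'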
    

    Limiting number of shards

    If you want to limit the number of shards per index, you can run _reindex one-to-one. In this case no entries should be dropped, as the result is an exact copy, just with a smaller number of shards.

    curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
    {
        "conflicts": "proceed",
        "source": {
            "index": "logstash-2017.07.01"
        },
        "dest": {
            "index": "logstash-v2-2017.07.01",
            "op_type": "create"
        }
    }
    '
    

    After this operation you can remove the old index and use the new one. Unfortunately, if you want to keep the old name, you need to _reindex one more time into that name, as sketched below. If you decide to do that,

    DON'T FORGET TO SPECIFY THE NUMBER OF SHARDS FOR THE NEW INDEX! By default it will fall back to 5.
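
    A sketch of that round trip, reusing the logstash-v2-2017.07.01 index created above: delete the original index, recreate it with one shard, then reindex back into the old name.

    # remove the original 5-shard index (its data now lives in logstash-v2-2017.07.01)
    curl -XDELETE 'localhost:9200/logstash-2017.07.01?pretty'
    # recreate it with a single shard
    curl -XPUT 'localhost:9200/logstash-2017.07.01?pretty' -H 'Content-Type: application/json' -d'
    {
        "settings" : {
            "index" : {
                "number_of_shards" : 1
            }
        }
    }
    '
    # copy the data back under the old name
    curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
    {
        "source": {
            "index": "logstash-v2-2017.07.01"
        },
        "dest": {
            "index": "logstash-2017.07.01",
            "op_type": "create"
        }
    }
    '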

    Merging multiple indices and limiting number of shards

    curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
    {
        "conflicts": "proceed",
        "source": {
            "index": "logstash-2017.07*"
        },
        "dest": {
            "index": "logstash-2017.07",
            "op_type": "create"
        }
    }
    '
    

    When done you should have all entries from logstash-2017.07.01 to logstash-2017.07.31 merged into logstash-2017.07. Note that the old indices must be deleted manually.
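
    Deleting them could look like this; note the trailing dot in the pattern logstash-2017.07.* so that the new monthly index logstash-2017.07 itself is not matched. (Clusters with action.destructive_requires_name enabled reject wildcard deletes, in which case list the indices explicitly.)

    curl -XDELETE 'localhost:9200/logstash-2017.07.*?pretty'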

    Depending on which conflicts and op_type options you choose, some entries may be overwritten or skipped.

    Further steps

    Create new indices with one shard

    You can set up an index template that will be applied every time a new logstash index is created.

    curl -XPUT 'localhost:9200/_template/template_logstash?pretty' -H 'Content-Type: application/json' -d'
    {
        "template" : "logstash-*",
        "settings" : {
            "number_of_shards" : 1
        }
    }
    '
    

    This ensures every new index whose name matches logstash-* is created with only one shard. (On Elasticsearch 6.0+ the template field is called index_patterns.)

    Group logs by month

    If you don't stream too many logs, you can set up Logstash to group logs by month.

    # file: /etc/logstash/conf.d/30-output.conf
    
    output {
        elasticsearch {
            hosts => ["localhost"]
            manage_template => false
            index => "%{[@metadata][beat]}-%{+YYYY.MM}"
            document_type => "%{[@metadata][type]}"
        }
    }
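
    After restarting Logstash you can confirm that a single monthly index is being written to. Assuming Filebeat is the shipper, the index name would be e.g. filebeat-2017.07:

    curl -XGET 'localhost:9200/_cat/indices/filebeat-*?v'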
    

    Final thoughts

    It's not easy to fix an initial misconfiguration! Good luck with optimising your Elasticsearch!
