Extract from ElasticSearch, into Kafka, continuously any new ES updates using logstash

Submitted by 你离开我真会死。 on 2019-12-25 04:40:13

Question


I have an ES cluster with multiple indices that all receive updates at random intervals. I have a Logstash instance that extracts data from ES and passes it into Kafka.

What would be a good way to run this every minute and pick up any updates in ES?

Conf:

 input {
   elasticsearch {
     hosts => [ "hostname1.com:5432", "hostname2.com" ]
     index => "myindex-*"
     query => "*"
     size => 10000
     scroll => "5m"
   }
 }
 output {
   kafka {
     bootstrap_servers => "abc-kafka.com:1234"
     topic_id => "my.topic.test"
   }
 }

I would like to use each document's @timestamp in a query and save it in a temp file, then rerun the query on a schedule and get only the latest updates/inserts (something like what the jdbc input plugin of Logstash supports).

Any ideas?

Thank you in advance


Answer 1:


Someone asked the same thing a few months ago but that issue didn't get much traffic. You can +1 it, maybe.

In the meantime, you could modify the query in your elasticsearch input to be like this:

query => '{"query":{"range":{"timestamp":{"gt": "now-1m"}}}}'

That is, you query all documents whose timestamp field (an arbitrary name; change it to match yours) falls within the past minute.

Then you need to set up a cron job that starts your Logstash process every minute. Because of the latency between the moment cron fires, the moment Logstash actually starts running, and the moment the query reaches the ES server, 1m might not be a wide enough window and you risk missing some documents. You need to test this and find out which window works best.

According to this recent blog post, another way could be to record the last time Logstash ran in an environment variable LAST_RUN and use that variable in the query:

query => '{"query":{"range":{"timestamp":{"gt": "${LAST_RUN}"}}}}'

In this scenario, you'd create a shell script, run by cron, that basically does this:

  1. run logstash -f your_config_file.conf
  2. when done, set LAST_RUN=$(date +"%FT%T")
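
The two steps above can be sketched as a small wrapper script. This is only a sketch under a few assumptions, not a tested production setup: a variable set inside a cron-launched script does not survive until the next run, so the value is kept in a state file here; the timestamp is captured before Logstash starts rather than after (a deviation from step 2, so that documents indexed during the run are not skipped); and the names STATE_FILE, CONFIG_FILE and LOGSTASH_CMD, along with their default values, are hypothetical.

```shell
#!/bin/sh
# Sketch: run Logstash with LAST_RUN read from a state file, then update the file.
# STATE_FILE, CONFIG_FILE and LOGSTASH_CMD are hypothetical names; adjust to your setup.
STATE_FILE="${STATE_FILE:-/var/tmp/es_to_kafka.last_run}"
CONFIG_FILE="${CONFIG_FILE:-your_config_file.conf}"
LOGSTASH_CMD="${LOGSTASH_CMD:-logstash}"

# Print the stored timestamp, or the Unix epoch on the very first run.
read_last_run() {
  if [ -s "$STATE_FILE" ]; then
    cat "$STATE_FILE"
  else
    echo "1970-01-01T00:00:00"
  fi
}

run_once() {
  # Capture the start time first, in UTC (Elasticsearch reads zone-less
  # date strings as UTC). Documents indexed while Logstash is running may
  # be read again by the next run instead of being missed.
  started_at=$(date -u +"%FT%T")
  LAST_RUN=$(read_last_run)
  export LAST_RUN   # picked up by ${LAST_RUN} in the Logstash config
  # Only advance the stored timestamp if Logstash exited successfully.
  if "$LOGSTASH_CMD" -f "$CONFIG_FILE"; then
    echo "$started_at" > "$STATE_FILE"
  else
    return 1
  fi
}
```

To use it, call run_once as the last line of the script and point a crontab entry such as `* * * * * /path/to/run_es_to_kafka.sh` (a hypothetical path) at it. Because a failed run leaves the state file untouched, the next run retries from the same timestamp; the trade-off is that some documents can be sent to Kafka more than once.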


Source: https://stackoverflow.com/questions/35886921/extract-from-elasticsearch-into-kafka-continuously-any-new-es-updates-using-lo
