How to deduplicate documents while indexing into Elasticsearch from Logstash

别跟我提以往 2020-12-31 16:52

I'm using Logstash 1.4.1 together with Elasticsearch 1.0.1 and would like to replace already-indexed documents based on a calculated checksum. I'm currently using the "fingerprint" filter to compute that checksum.
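For reference, the filter section would look roughly like this (a sketch only: the source field, hash method, and key below are illustrative assumptions, not details from the question):

    filter {
      fingerprint {
        # Hash the raw message into a "fingerprint" field.
        # "source", "method", and "key" here are assumed values.
        source => "message"
        target => "fingerprint"
        method => "SHA1"          # the SHA* methods are keyed (HMAC)
        key    => "any-static-key"
      }
    }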

2 Answers
  • 2020-12-31 17:16

    Assuming the fingerprint is being set as the _id, you may be hitting an issue with Logstash's daily index management: if @timestamp is not taken from your data, two copies of the same document can land in different daily indexes, so the _id overwrite never happens.

    Ensure that your timestamp is set from the input data, so the document is guaranteed to go to the correct daily index:

    http://logstash.net/docs/1.4.2/filters/date

    If my guess is correct, you will see that your duplicate documents have different @timestamp values and live in different daily indexes.
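    A minimal sketch of the date filter (the source field name "timestamp" and its format are assumptions about your input; adjust both to match your data):

        filter {
          date {
            # Parse the event's own time (here an Apache-style timestamp,
            # assumed) into @timestamp, so the event is routed to the daily
            # index matching the data rather than the processing time.
            match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
          }
        }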

  • 2020-12-31 17:32

    I would use the document_id parameter in the elasticsearch output section of your Logstash config:

    document_id

    Value type is string
    Default value is nil
    

    The document ID for the index. Useful for overwriting existing entries in Elasticsearch with the same ID.

    https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-document_id

    I believe the entry should be something like this:

    document_id => "%{fingerprint}"
    

    It uses Logstash's sprintf format to substitute the contents of a field into a string:

    https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#sprintf
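
    Putting it together, a sketch of the output section for a Logstash 1.4-era config (the host and index values below are placeholders, and "fingerprint" is assumed to be the field holding your checksum):

        output {
          elasticsearch {
            host        => "localhost"
            index       => "logstash-%{+YYYY.MM.dd}"
            # Reusing the checksum as the _id means a re-processed event
            # overwrites the earlier document with the same fingerprint
            # instead of creating a duplicate.
            document_id => "%{fingerprint}"
          }
        }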
