Document count is same but index size is growing every logstash run

回眸只為那壹抹淺笑 submitted on 2021-01-29 13:36:37

Question


I'm using Logstash to send the data stored in a MySQL database to Elasticsearch.

But each time Logstash runs, the number of documents stays the same while the index size keeps increasing.

First run: count 333 | store size 206 KB

Now: count 333 | store size 1.6 MB

input {
    jdbc {
        jdbc_connection_string => "jdbc:mysql://***rds.amazonaws.com:3306/"
        jdbc_user => "***"
        jdbc_password => "***"
        jdbc_driver_library => "***\mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        statement => "SELECT id, title, url FROM tableName"
        # cron syntax: run the query every 2 minutes
        schedule => "*/2 * * * *"
    }
}
filter {
    json {
        source => "texts"
        target => "texts"
    }
    mutate { remove_field => [ "@version", "@timestamp" ] }
}
output {
    stdout {
        codec => json_lines
    }
    amazon_es {
        hosts => ["***es.amazonaws.com"]
        # use the MySQL id as the Elasticsearch document _id
        document_id => "%{id}"
        index => "texts"
        region => "***"
        aws_access_key_id => '***'
        aws_secret_access_key => '***'
    }
}

Answer 1:


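Apparently you're sending the same data over and over: the jdbc input re-runs the same SELECT every two minutes and re-indexes every row. In ES, each time you update a document (i.e. index a document whose ID already exists, here via document_id => "%{id}"), the older version is only marked as deleted. It stays in the index and keeps using disk space until the underlying index segments get merged.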
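To make the mechanism concrete, here is a toy Python model of an index that receives the same IDs on every run. It is a simplification for illustration, not how Lucene actually stores data: each run writes a new segment, an older copy of an ID is only flagged as deleted, and the stored entries pile up until a merge rewrites the data. The names ToyIndex, bulk_index and force_merge are hypothetical.

```python
"""Toy model of why re-indexing the same IDs grows an index.

Assumption: this is NOT the real Lucene/Elasticsearch implementation; it only
mimics the observable counters (docs.count, docs.deleted) and stored data.
"""
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    # Each segment is an append-only list of doc IDs (one entry per stored copy).
    segments: list = field(default_factory=list)
    # (segment_no, position) pairs whose copy has been superseded.
    deleted: set = field(default_factory=set)
    # doc_id -> (segment_no, position) of its current live copy.
    live: dict = field(default_factory=dict)

    def bulk_index(self, doc_ids):
        """Write one new segment; older copies of the same IDs are only flagged."""
        seg_no = len(self.segments)
        self.segments.append([])
        for doc_id in doc_ids:
            if doc_id in self.live:
                # Old copy is flagged as deleted but still occupies space.
                self.deleted.add(self.live[doc_id])
            pos = len(self.segments[seg_no])
            self.segments[seg_no].append(doc_id)
            self.live[doc_id] = (seg_no, pos)

    def force_merge(self):
        """Rewrite all live copies into one segment, dropping deleted ones."""
        merged = list(self.live)
        self.segments = [merged]
        self.deleted = set()
        self.live = {doc_id: (0, i) for i, doc_id in enumerate(merged)}

    def stats(self):
        stored = sum(len(seg) for seg in self.segments)
        return {
            "docs.count": len(self.live),
            "docs.deleted": len(self.deleted),
            "stored_entries": stored,
        }
```

For example, eight runs of bulk_index(range(1, 334)) leave docs.count at 333 but docs.deleted at 2331 and 2664 stored entries; after force_merge() only 333 entries remain. In Elasticsearch itself, merges normally happen automatically in the background, so the space is reclaimed eventually without any manual step.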

Between runs, you can issue the following command:

curl -XGET '***es.amazonaws.com/_cat/indices?v'

In the response, check the docs.deleted column: you'll see the number of deleted documents increase after every run, while docs.count stays the same.
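If you'd rather track this from a script than by eye, the cat API can also return JSON (add format=json, plus bytes=b so that store.size is a plain byte count). The sketch below assumes you saved the response body of a request such as _cat/indices/texts?format=json&bytes=b before and after a Logstash run; the function names parse_snapshot and diff_snapshots are hypothetical.

```python
"""Compare two _cat/indices snapshots taken before and after a Logstash run.

Assumption: each snapshot is the JSON body returned by
_cat/indices/<index>?format=json&bytes=b (the cat API returns values as strings).
"""
import json


def parse_snapshot(body: str) -> dict:
    """Map index name -> (docs.count, docs.deleted, store.size in bytes)."""
    rows = json.loads(body)
    return {
        row["index"]: (
            int(row["docs.count"]),
            int(row["docs.deleted"]),
            int(row["store.size"]),
        )
        for row in rows
    }


def diff_snapshots(before: str, after: str) -> dict:
    """Return per-index deltas for indices present in both snapshots."""
    b, a = parse_snapshot(before), parse_snapshot(after)
    deltas = {}
    for name in a.keys() & b.keys():
        deltas[name] = {
            "docs.count": a[name][0] - b[name][0],
            "docs.deleted": a[name][1] - b[name][1],
            "store.size": a[name][2] - b[name][2],
        }
    return deltas
```

If the explanation above holds, the delta for texts should show docs.count unchanged (0) while docs.deleted and store.size grow.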



Source: https://stackoverflow.com/questions/57883302/document-count-is-same-but-index-size-is-growing-every-logstash-run
