Question
I'm sending data from a MySQL database to Elasticsearch using Logstash.
Each time Logstash runs, the document count stays the same, but the index size grows.
first run: count 333 | size in bytes: 206 KB
now: count 333 | size in bytes: 1.6 MB
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://***rds.amazonaws.com:3306/"
    jdbc_user => "***"
    jdbc_password => "***"
    jdbc_driver_library => "***\mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT id, title, url FROM tableName"
    schedule => "*/2 * * * *"
  }
}
filter {
  json {
    source => "texts"
    target => "texts"
  }
  mutate { remove_field => [ "@version", "@timestamp" ] }
}
output {
  stdout {
    codec => json_lines
  }
  amazon_es {
    hosts => ["***es.amazonaws.com"]
    document_id => "%{id}"
    index => "texts"
    region => "***"
    aws_access_key_id => '***'
    aws_secret_access_key => '***'
  }
}
Answer 1:
Apparently you're always sending the same data over and over. In ES, each time you update a document (i.e., index it again with the same ID), the older version is marked as deleted but stays in the index for a while (until the underlying index segments get merged).
Between runs, you can issue the following command:
curl -XGET ***es.amazonaws.com/_cat/indices?v
In the response, check the docs.deleted column and you'll see that the number of deleted documents increases.
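To automate that check, here is a minimal sketch (the hostname in the comment is a placeholder for your AWS ES domain, and the helper name is hypothetical) that reads the JSON form of the cat-indices output and reports docs.deleted per index:

```python
import json
from urllib.request import urlopen  # used in the commented live-cluster fetch below

def deleted_docs(cat_indices_rows):
    """Map index name -> number of soft-deleted documents,
    given the parsed output of GET /_cat/indices?format=json."""
    return {row["index"]: int(row["docs.deleted"]) for row in cat_indices_rows}

# Against a live cluster you would fetch the rows like this
# (the URL is a placeholder, not the asker's real endpoint):
# with urlopen("https://your-domain.es.amazonaws.com/_cat/indices?format=json") as r:
#     rows = json.load(r)

# Sample rows in the shape _cat/indices?format=json returns:
rows = [
    {"index": "texts", "docs.count": "333", "docs.deleted": "666",
     "store.size": "1.6mb"},
]
print(deleted_docs(rows))  # {'texts': 666}
```

The deleted documents are reclaimed automatically when segments merge; if you want to reclaim the space sooner, Elasticsearch exposes a force-merge endpoint (POST /texts/_forcemerge?only_expunge_deletes=true), though routinely forcing merges on a growing index is generally discouraged.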
Source: https://stackoverflow.com/questions/57883302/document-count-is-same-but-index-size-is-growing-every-logstash-run