Logstash input jdbc is duplicating results

Submitted by 一个人想着一个人 on 2019-11-30 05:26:54

sql_last_start is now sql_last_value. The special parameter sql_last_start has been renamed to sql_last_value for clarity, because it is no longer limited to datetime values and can track other column types as well. A solution could now look something like this:

input {
  jdbc {
    type => "A"
    jdbc_driver_library => "C:\DEV\elasticsearch-1.7.1\plugins\elasticsearch-jdbc-1.7.1.0\lib\jtds-1.3.1.jar"
    jdbc_driver_class => "Java::net.sourceforge.jtds.jdbc.Driver"
    jdbc_connection_string => "jdbc:jtds:sqlserver://dev_data_base_server:1433/dbApp1;domain=CORPDOMAIN;useNTLMv2=true"
    jdbc_user => "user"
    jdbc_password => "pass"
    schedule => "5 * * * *"
    # track the value of a column rather than the time of the last run
    use_column_value => true
    tracking_column => "date"
    # newer plugin versions default tracking_column_type to "numeric"; uncomment for a date column
    # tracking_column_type => "timestamp"
    statement => "SELECT id, date, content, status FROM test_table WHERE date > :sql_last_value"
    # clean_run => true resets sql_last_value to its initial value (0 for a numeric column, 1970-01-01 for a timestamp); the default is false
    clean_run => false
  }
  jdbc {
    # for type B ...
  }
}

I have tested this with a SQL Server database.

For the first run, use clean_run => true to avoid a datatype error: during development, sql_last_value may still hold a value of a different datatype from an earlier run.

By default, the jdbc input executes the configured SQL statement as is on every scheduled run. In your case, the statement selects everything in test_table, so each run loads every row again. You need to restrict the statement to rows added since the last time the jdbc input ran by using the predefined sql_last_start parameter in your SQL query.

input {
  jdbc {
    type => "A"
    jdbc_driver_library => "C:\DEV\elasticsearch-1.7.1\plugins\elasticsearch-jdbc-1.7.1.0\lib\jtds-1.3.1.jar"
    jdbc_driver_class => "Java::net.sourceforge.jtds.jdbc.Driver"
    jdbc_connection_string => "jdbc:jtds:sqlserver://dev_data_base_server:1433/dbApp1;domain=CORPDOMAIN;useNTLMv2=true"
    jdbc_user => "user"
    jdbc_password => "pass"
    schedule => "5 * * * *"
    statement => "SELECT id, date, content, status from test_table WHERE date > :sql_last_start"
  }

  jdbc {
    type => "B"
    jdbc_driver_library => "C:\DEV\elasticsearch-1.7.1\plugins\elasticsearch-jdbc-1.7.1.0\lib\jtds-1.3.1.jar"
    jdbc_driver_class => "Java::net.sourceforge.jtds.jdbc.Driver"
    jdbc_connection_string => "jdbc:jtds:sqlserver://dev_data_base_server:1433/dbApp2;domain=CORPDOMAIN;useNTLMv2=true"
    jdbc_user => "user"
    jdbc_password => "pass"
    schedule => "5 * * * *"
    statement => "SELECT id, date, content, status from test_table WHERE date > :sql_last_start"
  }
}
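
To see why the WHERE clause stops the duplication, here is a minimal Python sketch. It is a simplified model, not the plugin's actual implementation: it assumes an in-memory SQLite table standing in for test_table, and the helper names fetch_all and fetch_since are made up for illustration. The bookmark here is simply the newest date value returned, a stand-in for whatever value the jdbc input substitutes for the parameter.

```python
import sqlite3

def fetch_all(conn):
    # Unfiltered statement: every scheduled run returns the whole table again.
    return conn.execute(
        "SELECT id, date, content FROM test_table ORDER BY date"
    ).fetchall()

def fetch_since(conn, last_value):
    # Filtered statement: only rows strictly newer than the bookmark.
    rows = conn.execute(
        "SELECT id, date, content FROM test_table WHERE date > ? ORDER BY date",
        (last_value,),
    ).fetchall()
    # Advance the bookmark to the newest date seen; keep it if nothing came back.
    new_last = rows[-1][1] if rows else last_value
    return rows, new_last

# Hypothetical stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER PRIMARY KEY, date TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [(1, "2015-09-01 10:00:00", "first"), (2, "2015-09-01 11:00:00", "second")],
)

# Two unfiltered runs emit every row twice.
unfiltered_events = fetch_all(conn) + fetch_all(conn)

# Two filtered runs emit each row once.
bookmark = "1970-01-01 00:00:00"
run1, bookmark = fetch_since(conn, bookmark)
run2, bookmark = fetch_since(conn, bookmark)

# A row added later is picked up on its own by the next run.
conn.execute("INSERT INTO test_table VALUES (3, '2015-09-01 12:00:00', 'third')")
run3, bookmark = fetch_since(conn, bookmark)

print(len(unfiltered_events), len(run1), len(run2), len(run3))
```

This prints 4 2 0 1: the two unfiltered runs produce four events from two rows, while the filtered runs emit each row once and then return only the row added later.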

Also, if the same record does get loaded twice from your DB and you don't want duplicates created in your ES server, you can use the record ID as the document ID in your elasticsearch output. That way the document is updated in ES instead of being duplicated.

output {

    if [type] == "A" {
        elasticsearch {
            host => "localhost"
            protocol => http
            index => "logstash-servera-%{+YYYY.MM.dd}"
            document_id => "%{id}"       # same id as in DB
        }    
    }
    if [type] == "B" {
        elasticsearch {
            host => "localhost"
            protocol => http
            index => "logstash-serverb-%{+YYYY.MM.dd}"
            document_id => "%{id}"       # same id as in DB
        }    
    }

  stdout { codec => rubydebug }
}
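
The effect of document_id can be sketched in Python as follows. This is a toy model that assumes an index behaves like a key-value map from document ID to document; index_auto_id and index_with_id are hypothetical helpers, not Elasticsearch client calls.

```python
import uuid

def index_auto_id(index, event):
    # No document_id: a fresh ID is generated for every event, so repeats pile up.
    index[uuid.uuid4().hex] = event

def index_with_id(index, event):
    # document_id taken from the DB id: a repeated row overwrites the earlier copy.
    index[str(event["id"])] = event

events = [
    {"id": 1, "content": "first"},
    {"id": 2, "content": "second"},
    {"id": 1, "content": "first"},  # same DB row loaded a second time
]

auto_index, keyed_index = {}, {}
for event in events:
    index_auto_id(auto_index, event)
    index_with_id(keyed_index, event)

print(len(auto_index), len(keyed_index))
```

This prints 3 2: with generated IDs the repeated row becomes a third document, while keying on the DB id leaves two. Note that a document ID is only unique within one index, so with a date-based index name like the ones above, a row that is re-loaded on a different day can still end up in two indices.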