What is the best way to sync data from MySQL to Elasticsearch

Submitted by 只谈情不闲聊 on 2020-06-13 05:36:10

Question


I have ES 2.2 with the JDBC importer for Elasticsearch (elasticsearch-jdbc-2.2.0.0) installed, and I have been able to insert data. However, I am not able to update ES when data changes in MySQL, i.e. to keep ES in sync with MySQL. How do I do the sync? I executed the following shell script once and the data was inserted properly, but the scheduler didn't work: it is not executing every minute to capture changes in MySQL (the schemes table). Is there something wrong in my script, or is there a workaround available?

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
bin=${DIR}/bin
lib=${DIR}/lib
echo $lib
echo $bin

echo '{
"type" : "jdbc",
"autocommit" : true,
"schedule" : "0 0-59 0-23 ? * *",
"jdbc" : {
"driver": "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://XXX:3306/blahblah",
"user" : "abc",
"password" : "xyz",
"sql" : "select * from schemes",
"elasticsearch" : {
"cluster" : "mycluster",
"host" : "localhost",
"port" : 9300
},
"max_bulk_actions" : 20000,
"max_concurrent_bulk_requests" : 10,
"index" : "movies",
"type":"scheme"
}
}
' | java -cp "${lib}/*" -Dlog4j.configurationFile=${bin}/log4j2.xml org.xbib.tools.Runner org.xbib.tools.JDBCImporter

Answer 1:


I would suggest using the Logstash JDBC input plugin to sync MySQL data to Elasticsearch.
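As a rough illustration of that suggestion, here is a minimal sketch of a Logstash pipeline, written out from a shell script in the same style as the question. The file name mysql-sync.conf, the driver jar path, the HTTP port 9200, and the id and updated_date columns are assumptions, not taken from the question. The option names are those of the logstash-input-jdbc and elasticsearch output plugins; older versions of the JDBC input used :sql_last_start instead of :sql_last_value.

```shell
#!/usr/bin/env bash
# Write a Logstash pipeline that polls MySQL once a minute.
# CONF, the jar path, port 9200, and the id / updated_date columns
# are illustrative assumptions.
CONF="${CONF:-./mysql-sync.conf}"

cat > "$CONF" <<'EOF'
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://XXX:3306/blahblah"
    jdbc_user => "abc"
    jdbc_password => "xyz"
    # cron-style schedule: run every minute
    schedule => "* * * * *"
    # only fetch rows changed since the previous run
    statement => "SELECT * FROM schemes WHERE updated_date > :sql_last_value"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "movies"
    document_type => "scheme"
    # reuse the MySQL primary key so an updated row overwrites its document
    document_id => "%{id}"
  }
}
EOF

echo "wrote $CONF; start it with: bin/logstash -f $CONF"
```

Setting document_id to the table's primary key is what turns repeated imports into updates rather than duplicate documents.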

From the comments, where the asker asked how to sync deleted records from MySQL to Elasticsearch:

There may be other, more efficient ways to sync deleted records from MySQL to Elasticsearch :) but I am sharing what I did.

Step 1: Let's take the schemes table as an example. Add one column to maintain the status of each record, for example status = 0 (default) and status = 1 (deleted), and one column for updated_date. When a record is deleted, set status = 1 and set updated_date to the current date.
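Step 1 could be sketched as the following migration, generated from a shell script so it can be piped into the mysql client. The column types, the file name soft_delete.sql, and the id column with value 42 are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Generate a soft-delete migration for the schemes table.
# SQL_FILE, the column types, and the id value are illustrative assumptions.
SQL_FILE="${SQL_FILE:-./soft_delete.sql}"

cat > "$SQL_FILE" <<'EOF'
-- status: 0 = active (default), 1 = deleted
ALTER TABLE schemes
  ADD COLUMN status TINYINT NOT NULL DEFAULT 0,
  ADD COLUMN updated_date TIMESTAMP NOT NULL
      DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

-- Instead of DELETE FROM schemes WHERE id = 42, mark the row as deleted.
UPDATE schemes SET status = 1, updated_date = NOW() WHERE id = 42;
EOF

echo "apply with: mysql -h XXX -u abc -p blahblah < $SQL_FILE"
```

With ON UPDATE CURRENT_TIMESTAMP, ordinary updates also refresh updated_date, so the same column can drive syncing of changed rows as well as deleted ones.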

Step 2: We don't need to sync the whole data set every time. Index the complete data once, then change the MySQL query to fetch only records from the last 24 hours, or whatever time interval fits your use case.

Step 3: Change the query to fetch data from the last 24 hours only:

 select * from schemes where (updated_date >= FROM_UNIXTIME(UNIX_TIMESTAMP(?)-86400,"%Y-%m-%d"))

Records deleted this way will now appear with status = 1 in your Elasticsearch index.

So you can query your active records (status = 0) like this:

{
    "query": {
        "filtered": {
           "filter": {
               "bool": {
                   "must": [
                      {
                          "term": {
                             "status": 0
                          }
                      }
                   ]
               }
           }
        }
    }
}
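To try such a query from the command line, a minimal sketch along the following lines could be used. It treats status = 0 as active, as defined in Step 1, and reuses the movies index and scheme type from the question. The file name active_schemes.json and the localhost:9200 HTTP endpoint are assumptions.

```shell
#!/usr/bin/env bash
# Save the active-records query and print a curl command that runs it.
# QUERY_FILE and the localhost:9200 endpoint are illustrative assumptions.
QUERY_FILE="${QUERY_FILE:-./active_schemes.json}"

cat > "$QUERY_FILE" <<'EOF'
{
  "query": {
    "filtered": {
      "filter": {
        "term": { "status": 0 }
      }
    }
  }
}
EOF

echo "run with: curl -XGET 'http://localhost:9200/movies/scheme/_search?pretty' -d @$QUERY_FILE"
```

The filtered query works on ES 2.2 but was deprecated in 2.0 and removed in 5.0; on newer versions the same filter goes under a bool query's filter clause.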


Source: https://stackoverflow.com/questions/35746052/what-is-the-best-way-to-sync-data-from-mysql-to-elastic-search
