Question
I have ES 2.2 and installed the JDBC importer for Elasticsearch (elasticsearch-jdbc-2.2.0.0).
I have been able to insert data, but not to update ES when the data changes in MySQL, i.e. to keep MySQL and ES in sync. How do I do the sync? I executed the following shell script once; the data got inserted properly, but the scheduler didn't work. It does not run every minute to capture changes in MySQL (the schemes table). Is there something wrong in my script, or is there a workaround?
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
bin=${DIR}/bin
lib=${DIR}/lib
echo $lib
echo $bin
echo '{
    "type" : "jdbc",
    "autocommit" : true,
    "schedule" : "0 0-59 0-23 ? * *",
    "jdbc" : {
        "driver": "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://XXX:3306/blahblah",
        "user" : "abc",
        "password" : "xyz",
        "sql" : "select * from schemes",
        "elasticsearch" : {
            "cluster" : "mycluster",
            "host" : "localhost",
            "port" : 9300
        },
        "max_bulk_actions" : 20000,
        "max_concurrent_bulk_requests" : 10,
        "index" : "movies",
        "type":"scheme"
    }
}
' | java -cp "${lib}/*" -Dlog4j.configurationFile=${bin}/log4j2.xml org.xbib.tools.Runner org.xbib.tools.JDBCImporter
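For reference, the "schedule" value is a Quartz-style cron expression whose six fields start with seconds. The Python sketch below is illustrative only: the helper name describe_quartz_cron is hypothetical and not part of the JDBC importer. It just labels each field, which shows that "0 0-59 0-23 ? * *" is meant to fire at second 0 of every minute of every hour, on any day.

```python
# Field order of a six-field Quartz cron expression. Quartz also allows an
# optional seventh "year" field, which this sketch does not accept.
QUARTZ_FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_quartz_cron(expression: str) -> dict:
    """Map each field of a six-field Quartz cron expression to its name.

    Hypothetical helper for illustration: it only splits and labels the
    fields and does not validate their values.
    """
    parts = expression.split()
    if len(parts) != len(QUARTZ_FIELDS):
        raise ValueError(f"expected 6 fields, got {len(parts)}: {expression!r}")
    return dict(zip(QUARTZ_FIELDS, parts))


if __name__ == "__main__":
    for name, value in describe_quartz_cron("0 0-59 0-23 ? * *").items():
        print(f"{name:>12}: {value}")
```

A "?" in the day-of-month or day-of-week position means "no specific value"; Quartz requires one of those two fields to be "?".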
Answer 1:
I would suggest using the Logstash JDBC input plugin to sync MySQL data to Elasticsearch.
A comment asked how to sync records deleted in MySQL to Elasticsearch.
There may be more efficient ways to do this, but here is what I did.
Step 1:
Take the schemes table as an example. Add one column that holds the status of each record, e.g. status = 0 (default, active) and status = 1 (deleted), and another column, updated_date. When a record is deleted, set status = 1 and set updated_date to the current date instead of removing the row.
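A minimal sketch of Step 1, assuming Python's built-in sqlite3 as a stand-in for MySQL. The status and updated_date columns follow the answer; the id and name columns and the create_schema and soft_delete helpers are hypothetical. Instead of issuing a DELETE, the row is kept and flagged, so a later sync can carry the change to Elasticsearch.

```python
import sqlite3
from datetime import datetime


def create_schema(conn: sqlite3.Connection) -> None:
    # status: 0 = active (default), 1 = deleted; updated_date records the last change.
    conn.execute(
        """
        CREATE TABLE schemes (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            status INTEGER NOT NULL DEFAULT 0,
            updated_date TEXT NOT NULL
        )
        """
    )


def soft_delete(conn: sqlite3.Connection, scheme_id: int, now: datetime) -> None:
    # Flag the row as deleted and bump updated_date instead of removing it,
    # so an incremental query on updated_date will pick up the change.
    conn.execute(
        "UPDATE schemes SET status = 1, updated_date = ? WHERE id = ?",
        (now.strftime("%Y-%m-%d %H:%M:%S"), scheme_id),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute(
        "INSERT INTO schemes (id, name, updated_date) VALUES (1, 'a', '2016-03-01 10:00:00')"
    )
    soft_delete(conn, 1, datetime(2016, 3, 2, 9, 30))
    print(conn.execute("SELECT id, status, updated_date FROM schemes").fetchall())
```

The row count never drops, which is the point: the "deleted" state is just another update that the sync can see.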
Step 2:
We don't need to sync all the data every time. Index the complete data once, then change the MySQL query to fetch only the records updated in the last 24 hours, or whatever interval fits your use case.
Step 3: Change the query to fetch only data from the last 24 hours:
select * from schemes where (updated_date >= FROM_UNIXTIME(UNIX_TIMESTAMP(?)-86400,"%Y-%m-%d"))
After the next sync, deleted records will also have status=1 in your Elasticsearch index.
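The effect of the Step 3 query can be sketched in Python, again assuming sqlite3 as a stand-in for MySQL and assuming that ? is bound to the time of the current run (how that parameter is bound depends on the importer or Logstash configuration, which is not shown here). The helper names cutoff_date and fetch_changed are hypothetical. Because the "%Y-%m-%d" format drops the time of day, the cutoff is midnight of the day 24 hours before the run, so each run picks up between 24 and 48 hours of changes and consecutive runs overlap.

```python
import sqlite3
from datetime import datetime, timedelta


def cutoff_date(run_time: datetime) -> str:
    # Mirrors FROM_UNIXTIME(UNIX_TIMESTAMP(?) - 86400, "%Y-%m-%d"):
    # go back 24 hours, then keep only the date (time zones are ignored here).
    return (run_time - timedelta(seconds=86400)).strftime("%Y-%m-%d")


def fetch_changed(conn: sqlite3.Connection, run_time: datetime) -> list:
    # Incremental fetch: rows whose updated_date is on or after the cutoff date.
    # "YYYY-MM-DD HH:MM:SS" text compares correctly against "YYYY-MM-DD" as strings.
    return conn.execute(
        "SELECT id, status FROM schemes WHERE updated_date >= ? ORDER BY id",
        (cutoff_date(run_time),),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schemes (id INTEGER PRIMARY KEY, status INTEGER, updated_date TEXT)")
    conn.executemany(
        "INSERT INTO schemes VALUES (?, ?, ?)",
        [
            (1, 0, "2016-02-28 12:00:00"),  # unchanged for days: skipped
            (2, 1, "2016-03-01 08:00:00"),  # soft-deleted 26 hours before the run
            (3, 0, "2016-03-02 09:00:00"),  # updated 1 hour before the run
        ],
    )
    run = datetime(2016, 3, 2, 10, 0)
    print(cutoff_date(run), fetch_changed(conn, run))
```

Row 2 is included even though it changed more than 24 hours before the run, which is the date truncation at work.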
You can then query only the active records (status = 0). The filtered query is deprecated as of ES 2.0, so use a bool query with a filter clause:
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "status": 0
          }
        }
      ]
    }
  }
}
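As a sketch of sending such a filter from code, the Python below builds (but does not send) a search request for active records, i.e. status = 0 as defined in Step 1, using only the standard library. It assumes Elasticsearch's HTTP API on the default port 9200 (the 9300 in the question is the transport port) and the movies index from the question; the helper name active_records_request is hypothetical.

```python
import json
import urllib.request


def active_records_request(host: str = "http://localhost:9200", index: str = "movies") -> urllib.request.Request:
    # Bool query with a non-scoring filter clause: only documents with status 0 (active).
    body = {"query": {"bool": {"filter": [{"term": {"status": 0}}]}}}
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = active_records_request()
    # Sending requires a running cluster; uncomment to execute:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["hits"]["total"])
    print(req.get_method(), req.full_url)
```

Putting the term query in a filter clause skips relevance scoring, which is not needed for a simple status check.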
Source: https://stackoverflow.com/questions/35746052/what-is-the-best-way-to-sync-data-from-mysql-to-elastic-search