I'm configuring the JDBC river for Elasticsearch, but I can't find a good configuration example. I've read all the pages in the elasticsearch-river-jdbc GitHub repository.
I have a SQL query, and I need to fetch changes from all table columns every X seconds. How can I tell the JDBC river that a row has changed and should be reindexed?
Data is fetched when the ES server starts, and polling is happening, but changes are not propagated from the DB to ES.
My configuration:
curl -XPUT 'localhost:9200/_river/itemsi/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"driver" : "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://mydb.com:3306/dbname",
"user" : "yyy",
"password" : "xxx",
"sql" : "SELECT ii.id AS _id, ii.id AS myid, ... FROM ... LEFT JOIN .. ON...",
"poll" : "6s",
"strategy" : "simple"
},
"index" : {
"index" : "invoiceitems",
"bulk_size" : 600,
"max_bulk_requests" : 10,
"bulk_timeout" : "5s"
}
}'
Thank you.
Add
"autocommit" : true
in the index settings. That should resolve the problem.
You can use the schedule parameter, which enables repeated runs of the JDBC river plugin.
Example of a schedule parameter:
"schedule" : "0 0-59 0-23 ? * *"
This executes the JDBC river every minute of every hour, on every day of the week, month, and year.
For more details about the schedule parameter, read the documentation: https://github.com/jprante/elasticsearch-river-jdbc
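As a rough sketch of where this parameter might sit, the fragment below adds it to the jdbc block of the river definition from the question. It assumes a plugin version that supports schedule and that the parameter belongs next to sql, as the project README describes; check this against the version you have installed. The elided SQL is left exactly as in the question.

```json
{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://mydb.com:3306/dbname",
    "user" : "yyy",
    "password" : "xxx",
    "sql" : "SELECT ii.id AS _id, ii.id AS myid, ... FROM ... LEFT JOIN .. ON...",
    "schedule" : "0 0-59 0-23 ? * *"
  },
  "index" : {
    "index" : "invoiceitems"
  }
}
```

This would be the body of the same PUT request to localhost:9200/_river/itemsi/_meta shown in the question.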
I can only give you my opinion on this, as I am currently building a solution that indexes a large Informix DB. Here is my current thought process, which hasn't been tested or implemented:
What I plan to do is perform a one-shot index of the core database itself, and then implement triggers that write updated and/or new records into a separate table. Once I have performed the initial index, I will delete that river to stop it from rerunning the primary index. I will then define a second river that polls the table containing the updated and/or new records (every 15 minutes, for instance) and loads them into ES.
The bit I haven't quite figured out yet is updating the records already within ES, as I'm not aware of any functionality within the river plugin that allows you to set the ID of the record from an ID field within the actual DB record, which would let you retrieve and update it in ES. Perhaps writing a standalone program that does exactly what the river plugin does?!
These are all thoughts and speculation at the moment, but as I said, I am currently working on this. If I remember, I'll return here and post my final implementation, if it's ever allowed to get that far.
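One possible shape for that second river is sketched below, untested. It borrows the `AS _id` alias from the query in the question, which the JDBC river uses as the document ID, so a row arriving with an ID that already exists in the same index and type should overwrite the existing document rather than create a new one. The table item_changes and its columns item_id and name are hypothetical, and the connection values are simply copied from the question.

```json
{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://mydb.com:3306/dbname",
    "user" : "yyy",
    "password" : "xxx",
    "sql" : "SELECT c.item_id AS _id, c.item_id AS myid, c.name AS name FROM item_changes c",
    "poll" : "15m"
  },
  "index" : {
    "index" : "invoiceitems"
  }
}
```

Clearing rows from the change table once they have been indexed is not shown here and would need separate handling.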
Source: https://stackoverflow.com/questions/18248067/fetching-changes-from-table-with-elasticsearch-jdbc-river