Fetching changes from table with ElasticSearch JDBC river

Posted by 浪尽此生 on 2019-11-29 21:12:29

Question


I'm configuring the JDBC river for ElasticSearch, but I can't find a good configuration example. I've read all the pages in the elasticsearch-river-jdbc GitHub repository.

I have a SQL query and need to fetch changes from all table columns every X seconds. How can I tell the JDBC river that a row has changed and should be reindexed?

Data is fetched when the ES server starts and polling is happening, but changes are not fetched from the DB into ES.

My configuration:

curl -XPUT 'localhost:9200/_river/itemsi/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://mydb.com:3306/dbname",
    "user" : "yyy",
    "password" : "xxx",
    "sql" : "SELECT ii.id AS _id, ii.id AS myid, ... FROM ... LEFT JOIN .. ON...",
    "poll" : "6s",
    "strategy" : "simple"
    },
"index" : {
    "index" : "invoiceitems",
    "bulk_size" : 600,
    "max_bulk_requests" : 10,
    "bulk_timeout" : "5s"
    }
}'

Thank you.


Answer 1:


Add

"autocommit" : true

to the river settings (the plugin's README lists autocommit among the "jdbc" parameters rather than under "index"). That should resolve the problem.
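
A minimal sketch of where the setting could go, reusing the river definition from the question. It assumes the plugin reads autocommit from the "jdbc" block, which is where the plugin's README lists it; if your plugin version expects it elsewhere, move it accordingly. The names RIVER_META and register_river are made up for illustration, and the elided SQL is left exactly as posted.

```shell
# River definition from the question with autocommit switched on.
# Assumption: autocommit belongs in the "jdbc" block, not in "index".
RIVER_META='{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://mydb.com:3306/dbname",
    "user" : "yyy",
    "password" : "xxx",
    "sql" : "SELECT ii.id AS _id, ii.id AS myid, ... FROM ... LEFT JOIN .. ON...",
    "poll" : "6s",
    "strategy" : "simple",
    "autocommit" : true
  },
  "index" : {
    "index" : "invoiceitems",
    "bulk_size" : 600,
    "max_bulk_requests" : 10,
    "bulk_timeout" : "5s"
  }
}'

# Hypothetical helper: re-register the river with the updated definition.
register_river() {
  curl -XPUT 'localhost:9200/_river/itemsi/_meta' -d "$RIVER_META"
}
```

Running register_river against the node from the question would overwrite the existing river definition with this one.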




Answer 2:


You can use the schedule parameter, which makes the JDBC river plugin run repeatedly.

Example of a schedule parameter:

"schedule" : "0 0-59 0-23 ? * *"

This runs the JDBC river once a minute, every hour of every day.

For more details about the schedule parameter, read the documentation: https://github.com/jprante/elasticsearch-river-jdbc
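
As a rough sketch, the schedule could be added to the river definition from the question like this. It assumes the parameter sits in the "jdbc" block and that the expression uses Quartz-style cron fields, as the plugin documentation describes. Dropping "poll" in favour of "schedule" is an assumption too, and SCHEDULED_RIVER_META and register_scheduled_river are illustrative names only.

```shell
# Cron fields, read in Quartz order (assumed):
#   0     second        -> at second 0
#   0-59  minute        -> every minute
#   0-23  hour          -> every hour
#   ?     day of month  -> no specific value
#   *     month         -> every month
#   *     day of week   -> every day
SCHEDULED_RIVER_META='{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://mydb.com:3306/dbname",
    "user" : "yyy",
    "password" : "xxx",
    "sql" : "SELECT ii.id AS _id, ii.id AS myid, ... FROM ... LEFT JOIN .. ON...",
    "strategy" : "simple",
    "schedule" : "0 0-59 0-23 ? * *"
  },
  "index" : {
    "index" : "invoiceitems"
  }
}'

# Hypothetical helper: register the scheduled river on the local node.
register_scheduled_river() {
  curl -XPUT 'localhost:9200/_river/itemsi/_meta' -d "$SCHEDULED_RIVER_META"
}
```

Whether "poll" can stay alongside "schedule" depends on the plugin version, so check the README for the release you run.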




Answer 3:


I can only give you my opinion on this, as I am currently building a solution that performs a large index of an Informix DB. Here is my current thinking, which hasn't been tested or implemented:

My plan is to perform a one-shot index of the core database, then implement triggers that write updated and/or new records into a separate table. Once the initial index is done, I will delete that river to stop it from rerunning the primary index. After that, I will define a second river that polls the table of updated and/or new records, every 15 minutes for instance, and loads them into ES.

The bit I haven't figured out yet is how to update records that are already in ES. I'm not aware of any functionality in the river plugin that lets you set the ID of a document from an ID field in the DB record, which would let you retrieve and update it in ES. Perhaps the answer is to write a standalone program that does exactly what the river plugin does?
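
The plan above might be sketched roughly as follows. This is untested and rests on several assumptions: the table changed_items, its columns, the river name item_changes, and the helper names are all hypothetical. The "AS _id" alias is borrowed from the query in the question, which suggests the document ID can be taken from a column, so a changed row would overwrite the existing document with the same ID.

```shell
# Hypothetical river that polls a trigger-maintained table of changed rows.
# Table, column, and river names are placeholders, not from the original post.
CHANGES_RIVER_META='{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://mydb.com:3306/dbname",
    "user" : "yyy",
    "password" : "xxx",
    "sql" : "SELECT id AS _id, id AS myid, name FROM changed_items",
    "poll" : "15m",
    "strategy" : "simple"
  },
  "index" : {
    "index" : "invoiceitems"
  }
}'

# Step 1 (after the one-shot index has finished): delete the initial river.
delete_initial_river() {
  curl -XDELETE 'localhost:9200/_river/itemsi/'
}

# Step 2: register the river that picks up new and updated rows.
register_changes_river() {
  curl -XPUT 'localhost:9200/_river/item_changes/_meta' -d "$CHANGES_RIVER_META"
}
```

The triggers that fill changed_items, and the cleanup of rows that have already been indexed, are left out here because they depend on the database in use.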

These are all thoughts and speculation at the moment, but as I said, I am currently working on this. If I remember, I'll come back and post my final implementation, if it's ever allowed to get that far.



Source: https://stackoverflow.com/questions/18248067/fetching-changes-from-table-with-elasticsearch-jdbc-river
