Question
I'm considering a daily script that does the following, to recover from any situation where updates to the ES server failed (I don't yet have a high-availability setup, and even with one it's probably good practice whenever data is duplicated between the DB and ES). Before putting this script together, I thought I'd check whether I'm going about this the right way, and whether there are any libraries or techniques I should use.
The script will simply retrieve all IDs from the database and all IDs from Elasticsearch where created_at < current_time (a snapshot of the current time, since it's a moving target as the script runs). It will then add documents to and remove documents from Elasticsearch based on the differences between these two ID sets.
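A minimal sketch of the comparison step I have in mind, in Python. `diff_ids` is a hypothetical helper name, and fetching the two ID lists (both filtered by the same created_at snapshot) is assumed to happen elsewhere:

```python
from typing import Hashable, Iterable, Set, Tuple


def diff_ids(
    db_ids: Iterable[Hashable], es_ids: Iterable[Hashable]
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Return (to_index, to_delete) for bringing ES in line with the DB.

    to_index:  IDs present in the database but missing from Elasticsearch.
    to_delete: IDs present in Elasticsearch but no longer in the database.
    """
    db_set = set(db_ids)
    es_set = set(es_ids)
    to_index = db_set - es_set
    to_delete = es_set - db_set
    return to_index, to_delete


if __name__ == "__main__":
    # Illustrative IDs only; real ones would come from the DB and from ES.
    to_index, to_delete = diff_ids([1, 2, 3], [2, 3, 4])
    print(sorted(to_index), sorted(to_delete))
```

Note that this only detects missing and orphaned documents. It would not notice a document whose ID exists on both sides but whose content is stale.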
Does this sound like a reasonable approach?
Answer 1:
To answer my question, this is not the best approach.
A simpler, if more resource-intensive, approach is to rebuild the entire index periodically. Of course, rebuilding in place is difficult in production, as it would cause minutes or hours of downtime, so the trick is to build a new index alongside the old one and then switch to using it. In Elasticsearch you can't rename an index, but you can have the application query an alias and repoint that alias at the new index.
There's a discussion of the approach here and a rake task for Tire users here.
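As a rough sketch of the switch itself: Elasticsearch's `_aliases` endpoint accepts a list of `remove` and `add` actions and applies them in a single request, so the alias never points at nothing. The Python helper below only builds that request body; the function name and the index and alias names are hypothetical, and sending the body (a POST to `_aliases`) is left to whatever client you use.

```python
import json
from typing import Dict, List, Optional


def build_alias_swap(
    alias: str, new_index: str, old_index: Optional[str] = None
) -> Dict[str, List[dict]]:
    """Build the body for a POST to the `_aliases` endpoint.

    Moves `alias` from `old_index` (if given) to `new_index` in one request.
    """
    actions: List[dict] = []
    if old_index is not None:
        # Detach the alias from the index being retired.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    # Attach the alias to the freshly rebuilt index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    # Hypothetical names: the app always queries "products".
    body = build_alias_swap("products", "products_v2", "products_v1")
    print(json.dumps(body, indent=2))
```

Once the alias points at the new index, the old index can be deleted at leisure.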
Answer 2:
Please have a look at the jdbc-river plugin. It is fairly stable and can be used to sync data between the database and ES.
Source: https://stackoverflow.com/questions/11952558/ensuring-elasticsearch-is-in-sync-with-database