Solr safe dataimport and core swap on high-traffic website

Submitted by 本秂侑毒 on 2019-12-03 12:21:31

Good (and hard) question!

A full-import is a very heavy operation; in general it's better to run delta queries that only bring your index up to date with the latest changes in your RDBMS. I see why you swap the master when you need a full-import: you keep the live core up to date with delta-imports while the full-import, which takes two hours, runs on the new core. That sounds good, as long as the full-import isn't run that frequently.

Regarding the replication, I would make sure that there isn't any replication in progress before swapping the master core. For more details about how replication works you can have a look at the Solr wiki if you haven't done it yet.

Furthermore, I would make sure that there isn't any delta-import running on the live core before swapping the master core.
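As a rough sketch of those two checks, something like the following could gate the swap. It assumes the DataImportHandler is mounted at `/dataimport` and the replication handler at `/replication` on each core, that the DIH status response carries `"status": "idle"` when nothing is running, and that a slave's `command=details` response exposes `details.slave.isReplicating`; check the field names against your Solr version. The base URL, core names, and slave list are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr"  # assumed base URL of the master


def get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def dih_is_idle(status_response):
    """True when a DIH `command=status` response reports no running import."""
    return status_response.get("status") == "idle"


def slave_is_replicating(details_response):
    """True when a slave's `command=details` response reports an active fetch.

    Assumes the flag sits at details.slave.isReplicating (often a string).
    """
    slave = details_response.get("details", {}).get("slave", {})
    return str(slave.get("isReplicating", "false")).lower() == "true"


def swap_url(base, core, other):
    """Build the CoreAdmin SWAP request URL."""
    query = urlencode({"action": "SWAP", "core": core, "other": other})
    return f"{base}/admin/cores?{query}"


def swap_if_quiet(live, rebuild, slave_bases, fetch=get_json):
    """Swap the two cores only if no import or replication is in flight."""
    for core in (live, rebuild):
        status = fetch(f"{SOLR}/{core}/dataimport?command=status&wt=json")
        if not dih_is_idle(status):
            return False
    for base in slave_bases:
        details = fetch(f"{base}/{live}/replication?command=details&wt=json")
        if slave_is_replicating(details):
            return False
    fetch(swap_url(SOLR, live, rebuild))
    return True
```

Note that there is still a small window between the checks and the swap in which a cron-triggered delta-import could start, so it is probably worth pausing the cron while this runs.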

We have a slightly modified situation at our end. There are two DataImportHandlers: one for full import, the other for delta import. The delta import is triggered by a cron job every 3 hours and takes minutes to complete. The full import of about 10 million documents takes ~48 hours (insane!). A large part of this is network latency, since a huge amount of data is fetched from a MySQL table for every document. These two tables reside on two different MySQL servers and cannot be joined.

We have a 'live' core, which is the one receiving delta imports. We introduce another 'rebuild' core and perform a full index on it, which takes ~48 hours to finish. During this time, we keep track of all the documents that have been updated in or deleted from the 'live' core, and then run a delta import on the 'rebuild' core to bring both to the same state. On a normal day, once both cores are in the same state, we would swap them and serve from the rebuild core. (Who will monitor that the rebuild core has finished its full index and has applied the delta patches as well?)
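To make the catch-up step concrete, here is a minimal sketch of how the tracked changes could be collapsed before replaying them on the 'rebuild' core. It assumes a hypothetical change log of `(doc_id, op)` pairs recorded in the order they hit the 'live' core; the log format and the function name are illustrative only, not something Solr provides.

```python
def collapse_changes(events):
    """Reduce an ordered change log to the final operation per document id.

    `events` is an iterable of (doc_id, op) pairs, where op is "update" or
    "delete", in the order the changes were applied to the live core.
    Returns (ids_to_reindex, ids_to_delete), each sorted.
    """
    final = {}
    for doc_id, op in events:
        if op not in ("update", "delete"):
            raise ValueError(f"unknown operation: {op!r}")
        final[doc_id] = op  # a later event overrides an earlier one
    to_update = sorted(d for d, op in final.items() if op == "update")
    to_delete = sorted(d for d, op in final.items() if op == "delete")
    return to_update, to_delete


# Example: document 1 was updated and later deleted, document 3 the reverse.
log = [(1, "update"), (2, "update"), (1, "delete"), (3, "delete"), (3, "update")]
to_update, to_delete = collapse_changes(log)
```

Only the last operation per document matters, so 48 hours of changes shrink to one re-index or delete per affected id.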

Sometimes we want both the 'live' and 'rebuild' cores serving at the same time for A/B testing. In that case, both cores would receive delta imports for consistency, and both would be serving. Based on the outcome, we would keep one and remove the other by swapping.

To make this whole setup operationally stable, we plan to introduce a monitor process that checks whether the 'rebuild' core is still indexing or has finished. Once it has finished, the monitor process would update it with the delta documents and activate the delta-indexing cron for both cores. Upon completion of the A/B phase, one of the cores would be unloaded and the other swapped in. The extra crons would then be disabled.
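One way to keep such a monitor easy to reason about is to model it as a small state machine that is ticked periodically. The sketch below is only an illustration of the plan described above: the phase names and action labels are made up, and the actual Solr and cron calls are left out so that the transition logic can be tested on its own.

```python
from enum import Enum


class Phase(Enum):
    FULL_IMPORT = "full_import"  # rebuild core is running the full import
    CATCH_UP = "catch_up"        # applying deltas missed during the rebuild
    AB_TEST = "ab_test"          # both cores serve and receive delta imports
    DONE = "done"                # one core kept, the other unloaded


def next_step(phase, rebuild_busy, ab_finished):
    """Return (next_phase, actions) for one monitor tick.

    `rebuild_busy` says whether an import is currently running on the
    rebuild core; `ab_finished` says whether the A/B phase has been closed.
    `actions` is a list of labels for the caller to execute.
    """
    if phase is Phase.FULL_IMPORT:
        if rebuild_busy:
            return Phase.FULL_IMPORT, []
        return Phase.CATCH_UP, ["run delta import on rebuild"]
    if phase is Phase.CATCH_UP:
        if rebuild_busy:
            return Phase.CATCH_UP, []
        return Phase.AB_TEST, ["enable delta cron on live and rebuild"]
    if phase is Phase.AB_TEST:
        if rebuild_busy or not ab_finished:
            return Phase.AB_TEST, []
        return Phase.DONE, ["swap and unload cores", "disable extra crons"]
    return Phase.DONE, []
```

Because the current phase is the only state, it can be persisted (in a file or a database row, say) so that a restarted monitor resumes where it left off rather than repeating a step.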

There are a few more moving parts in this design, and the reliability of the monitor process is critical to smooth operation. Any suggestions or alternatives?
