Endless recovering state of secondary

醉梦人生 2021-02-08 17:47

I built a replica set with one primary, one secondary and one arbiter on MongoDB 3.0.2. The primary and the arbiter are on the same host, and the secondary is on another host.

3 answers
  • 2021-02-08 18:16

    The problem (most likely)

    The last operation on the primary is from "2015-05-15T02:10:56Z", whereas the last operation on the would-be secondary is from "2015-05-14T11:23:51Z", a difference of roughly 15 hours. That gap may well exceed your replication oplog window (the time between the first and the last operation entry in your oplog). Put simply, the primary has performed more operations than the secondary can catch up on.
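
    To make the arithmetic concrete, here is a minimal sketch that computes the gap between the two timestamps quoted above and compares it with an oplog window. The 6-hour and 24-hour windows are assumed values for illustration only (your actual window is not shown in the question), and the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp in the format used by the replica set status output."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def replication_gap(primary_last_op: str, secondary_last_op: str) -> timedelta:
    """Time between the last operation on the primary and on the secondary."""
    return parse_ts(primary_last_op) - parse_ts(secondary_last_op)


def exceeds_oplog_window(gap: timedelta, oplog_window: timedelta) -> bool:
    """True if the secondary is further behind than the oplog can cover."""
    return gap > oplog_window


# Timestamps taken from the question.
gap = replication_gap("2015-05-15T02:10:56Z", "2015-05-14T11:23:51Z")
print(gap)  # 14:47:05

# Assumed oplog windows, for illustration only.
print(exceeds_oplog_window(gap, timedelta(hours=6)))   # True
print(exceeds_oplog_window(gap, timedelta(hours=24)))  # False
```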

    In a bit more detail (though simplified): during an initial sync, the secondary copies the data as it was at a given point in time. Once that snapshot has been copied, the secondary reads the primary's oplog and applies the changes made between that point in time and now. This works as long as the oplog still holds all operations since that point in time. But the oplog has a limited size (it is a so-called capped collection). So if more operations happen on the primary during the initial sync than the oplog can hold, the oldest operations "fade out". The secondary recognises that not all the operations needed to "construct" the same data as the primary are available, and refuses to complete the sync, staying in the RECOVERING state.
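
    The "fade out" effect can be illustrated with a toy model. This is not how MongoDB implements the oplog; it is a sketch that assumes operations are numbered consecutively and that the capped collection simply drops its oldest entry when full. All names are hypothetical.

```python
from collections import deque


class ToyOplog:
    """Toy capped collection: recording beyond capacity drops the oldest entry."""

    def __init__(self, capacity: int):
        self.entries = deque(maxlen=capacity)

    def record(self, op_id: int) -> None:
        self.entries.append(op_id)

    def can_resume_from(self, last_applied: int) -> bool:
        # The secondary can only continue if the operation it applied last
        # is still present, i.e. nothing after it has been dropped.
        return bool(self.entries) and self.entries[0] <= last_applied


def simulate_initial_sync(capacity: int, ops_before_sync: int, ops_during_sync: int) -> bool:
    """Return True if the secondary can catch up after copying the snapshot."""
    oplog = ToyOplog(capacity)
    for op_id in range(1, ops_before_sync + 1):
        oplog.record(op_id)
    snapshot_point = ops_before_sync  # the data copy reflects this operation
    # The primary keeps taking writes while the snapshot is being copied.
    for op_id in range(ops_before_sync + 1, ops_before_sync + ops_during_sync + 1):
        oplog.record(op_id)
    return oplog.can_resume_from(snapshot_point)


# Small oplog: operations 1..5 have faded out, the secondary is stuck.
print(simulate_initial_sync(capacity=5, ops_before_sync=3, ops_during_sync=7))   # False
# Larger oplog: everything since the snapshot is still there.
print(simulate_initial_sync(capacity=20, ops_before_sync=3, ops_during_sync=7))  # True
```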

    The solution(s)

    The problem is a known one and not a bug, but a result of the inner workings of MongoDB and several fail-safe assumptions made by the development team. Hence, there are several ways to deal with the situation. Sadly, since you only have two data-bearing nodes, all of them involve downtime.

    Option 1: Increase the oplog size

    This is my preferred method, since it deals with the problem once and (kind of) for all. It's a bit more complicated than the other solutions, though. From a high-level perspective, these are the steps you take.

    1. Shut down the primary
    2. Create a backup of the oplog using direct access to the data files
    3. Restart the mongod in standalone mode
    4. Copy the current oplog to a temporary collection
    5. Delete the current oplog
    6. Recreate the oplog with the desired size
    7. Copy back the oplog entries from the temporary collection to the shiny new oplog
    8. Restart mongod as part of the replica set

    Do not forget to increase the oplog size on the secondary as well before doing the initial sync, since it may become primary at some time in the future!

    For details, please read "Change the size of the oplog" in the tutorials regarding replica set maintenance.
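
    As for what "the desired size" should be, a rough estimate is sketched below. It assumes the write load stays roughly constant, so that the oplog window scales linearly with the oplog size. The figures in the example (1024 MB covering 6 hours, a target of 48 hours) are made up for illustration; the real values for your set are reported by rs.printReplicationInfo() in the mongo shell. The function name is hypothetical.

```python
import math


def required_oplog_size_mb(current_size_mb: float,
                           current_window_hours: float,
                           desired_window_hours: float) -> int:
    """Estimate the oplog size needed for a desired window.

    Assumes a constant write rate, i.e. the window grows linearly with size.
    """
    if current_window_hours <= 0:
        raise ValueError("current_window_hours must be positive")
    return math.ceil(current_size_mb * desired_window_hours / current_window_hours)


# Illustrative numbers only: a 1024 MB oplog that currently spans 6 hours,
# and a target window of 48 hours.
print(required_oplog_size_mb(1024, 6, 48))  # 8192
```

    Since load is rarely constant, treat the result as a lower bound and add headroom for peak times.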

    Option 2: Shut down the app during sync

    If option 1 is not viable, the only other real solution is to shut down the application causing load on the replica set, restart the sync and wait for it to complete. Depending on the amount of data to be transferred, expect this to take several hours.
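
    To know when it is safe to start the application again, you can watch the member's state in the output of the replSetGetStatus command (rs.status() in the shell). The sketch below only inspects such a status document; the host names and the sample document are hypothetical, and with pymongo the real document could be fetched with client.admin.command("replSetGetStatus").

```python
def member_state(status: dict, host: str) -> str:
    """Return the stateStr of the member named `host` in a replSetGetStatus document."""
    for member in status["members"]:
        if member["name"] == host:
            return member["stateStr"]
    raise KeyError(f"no member named {host!r} in replica set status")


def sync_finished(status: dict, host: str) -> bool:
    """True once the member has reached the SECONDARY state."""
    return member_state(status, host) == "SECONDARY"


# Hypothetical, heavily trimmed status document for illustration.
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "host-a:27017", "stateStr": "PRIMARY"},
        {"name": "host-a:27018", "stateStr": "ARBITER"},
        {"name": "host-b:27017", "stateStr": "RECOVERING"},
    ],
}

print(sync_finished(sample_status, "host-b:27017"))  # False

# Once the sync has completed, the member reports SECONDARY.
sample_status["members"][2]["stateStr"] = "SECONDARY"
print(sync_finished(sample_status, "host-b:27017"))  # True
```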

    A personal note

    The oplog window problem is a well-known one. While replica sets and sharded clusters are easy to set up with MongoDB, a fair amount of knowledge and a bit of experience are needed to maintain them properly. Do not run something as important as a database with a complex setup without knowing the basics. In case Something Bad (tm) happens, it might well lead to a FUBAR situation.

  • 2021-02-08 18:20

    Add a new, fourth node to the replica set. Once it has synced, reset the stale secondary.

  • 2021-02-08 18:40

    Another option (assuming primary has healthy data) is to simply delete the data in the secondary's mongo data folder and restart. This will cause it to sync back up to the primary as if you just added it to the replica set.
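
    A minimal sketch of the "delete the data" step, assuming the mongod on the secondary has already been stopped and that you pass in its actual dbpath. The function name is hypothetical, and the demo runs against a throwaway temporary directory with made-up file names rather than a real data folder.

```python
import shutil
import tempfile
from pathlib import Path


def clear_dbpath(dbpath: str) -> list:
    """Remove everything inside dbpath but keep the directory itself.

    Only run this while the mongod using this dbpath is stopped.
    Returns the sorted names of the removed top-level entries.
    """
    root = Path(dbpath)
    if not root.is_dir():
        raise NotADirectoryError(dbpath)
    removed = []
    for child in root.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()
        removed.append(child.name)
    return sorted(removed)


# Demo on a temporary directory populated with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "local.0").write_text("fake data file")
    (Path(tmp) / "mongod.lock").write_text("")
    (Path(tmp) / "journal").mkdir()
    (Path(tmp) / "journal" / "j._0").write_text("fake journal file")

    removed = clear_dbpath(tmp)
    leftover = list(Path(tmp).iterdir())

print(removed)   # ['journal', 'local.0', 'mongod.lock']
print(leftover)  # []
```

    After restarting mongod with the same replica set options, the member performs a fresh initial sync, so the oplog window caveat from the first answer still applies.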
