It works like this. Imagine the sky full of birds. Take pictures until you got all the birds.
Place the pictures on the table. Map pictures over each other. So you see every bird one time. Do you se every bird? Ok. Then you know, at that time. The system was stable.
Record what all the birds sounds like(messages) and take some more pictures. Then repeat.
If you have a node split. Go back to the latest common stable snapshot. And try** to replay what append after that. :)
It's better described in
"Distributed Snapshots: Determining Global States of Distributed Systems"
K. MANI CHANDY and LESLIE LAMPORT
** I think there are a problem deciding who's clock to go after when trying to replay what happend