Question 18: Challenge snapshot
Ram is an avid runner and has built an app for his runner friends to challenge each other. Any user can send a challenge to any other user using the app. For fault tolerance reasons, he wants to take periodic snapshots of the state of his app. That is, he wants to a consistent view of all the send and received challenges as well as any in-flight challenges.
The challenge (pun intended) he is facing in getting a snapshot is that while each user's app can snapshot their local state, he is not sure how to guarantee a globally consistent snapshot. For example, consider that Ram sends a challenge to Shyam. If Ram takes a snapshot after sending the challenge and Shyam takes a snapshot after receiving the challenge, the snapshot would imply that Ram send Shyam two challenge (out of which one is in-flight in the snapshot).
Can you suggest a mechanism to get a globally consistent snapshot?
Solution
Ram can implement the consistent snapshot algorithm described in this seminal paper.
The high-level idea is that each node takes a snapshot of its own state and sends a special marker message to all other nodes to notify them. When a node receives a marker message, it takes a snapshot if it hasn't already. This ensure that there are no in-flight messages between the two nodes. (This requires ordered messages, which can be implemented independently.) If a node that receives a marker had already taken its snapshot, it considers all the messages received between its snapshot and receiving the marker message as in-flight messages in the snapshot.
Highly recommend reading the paper to get a much better description and explanation of the algorithm!