Question 12: Diverging replicas
Anjali is building a replicated database management system (DBMS). She wants to keep two copies of the data on two servers (a primary server and a secondary server).
She decides to implement replication by copying every page that is modified on the primary server to the secondary server.
As an example, suppose a DBMS client issues the following query:
UPDATE table_name
SET column1 = val1, column7 = val7
WHERE column4 = val4;
If 10 rows satisfy column4 = val4, this query will update two columns in each of those 10 rows. These rows might be spread across 3 different pages, and Anjali's design will then replicate all three pages to the secondary server.
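The page-based design can be sketched as a toy model in Python. This is not a real storage engine: the page size (ROWS_PER_PAGE), the row layout, and the helper names make_table, update, and replicate_pages are hypothetical. The update records which pages it touched, and only those pages are copied, whole, to the secondary.

```python
import copy

ROWS_PER_PAGE = 4  # hypothetical page capacity, kept tiny for illustration

def make_table(num_rows):
    """Store a table as a list of pages; each page is a list of row dicts."""
    rows = [{"id": i, "column1": 0, "column4": i % 7, "column7": 0}
            for i in range(num_rows)]
    return [rows[i:i + ROWS_PER_PAGE] for i in range(0, num_rows, ROWS_PER_PAGE)]

def update(pages, val1, val4, val7):
    """UPDATE ... SET column1 = val1, column7 = val7 WHERE column4 = val4.
    Returns the numbers of the pages that were modified."""
    dirty = set()
    for page_no, page in enumerate(pages):
        for row in page:
            if row["column4"] == val4:
                row["column1"] = val1
                row["column7"] = val7
                dirty.add(page_no)
    return dirty

def replicate_pages(primary, secondary, dirty):
    """Ship a full copy of every modified page from the primary to the secondary."""
    for page_no in dirty:
        secondary[page_no] = copy.deepcopy(primary[page_no])

primary = make_table(20)            # 20 rows -> 5 pages of 4 rows
secondary = copy.deepcopy(primary)  # both replicas start out identical
dirty = update(primary, val1=1, val4=0, val7=7)
replicate_pages(primary, secondary, dirty)
print(sorted(dirty))         # [0, 1, 3]
print(secondary == primary)  # True
```

Here the three matching rows (ids 0, 7, and 14) land on pages 0, 1, and 3, so three entire pages are shipped even though only two columns in three rows changed.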
Anjali thinks she can make replication more efficient. Instead of replicating pages, she decides to replicate the query and let each server execute it independently. So, in the above example, instead of replicating the three pages, the primary server will just send the query to the secondary server, and both servers will then perform the updates independently.
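A minimal sketch of the query-replication design, using two in-memory SQLite databases (Python's standard sqlite3 module) to stand in for the primary and secondary servers. The table layout and the helper names make_replica, execute_replicated, and snapshot are assumptions for illustration; a real system would send the statement over the network rather than loop over two local connections.

```python
import sqlite3

def make_replica():
    """Create an in-memory replica holding the same 20 example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, "
                 "column1 INTEGER, column4 INTEGER, column7 INTEGER)")
    conn.executemany("INSERT INTO table_name VALUES (?, 0, ?, 0)",
                     [(i, i % 7) for i in range(20)])
    conn.commit()
    return conn

def execute_replicated(replicas, sql, params=()):
    """Forward the statement text and its parameters to every replica and run it there."""
    for conn in replicas:
        conn.execute(sql, params)
        conn.commit()

def snapshot(conn):
    """Return the full table contents in a stable order for comparison."""
    return conn.execute("SELECT * FROM table_name ORDER BY id").fetchall()

primary, secondary = make_replica(), make_replica()
execute_replicated([primary, secondary],
                   "UPDATE table_name SET column1 = ?, column7 = ? WHERE column4 = ?",
                   (1, 7, 0))
print(snapshot(primary) == snapshot(secondary))  # True
```

Because this UPDATE depends only on data both replicas already share, re-executing it yields identical states, and only the statement text and its parameters have to be sent instead of three pages.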
Can you think of a scenario in which the two replicas might diverge under Anjali's new replication design?
Hint: What kinds of queries can produce different results on different servers?
Solution
Anjali's replication design can lead to divergent replicas if any query has a non-deterministic input. For example, a query that stores the current time read from the server's clock could lead to divergent replicas: clocks are not guaranteed to be synchronized between two servers, and even if they were, the secondary executes the query slightly later than the primary. Similarly, a query that stores a random value generated on the server could lead to divergent copies.
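The failure mode can be reproduced with the same kind of SQLite setup. In this sketch each replica registers its own hypothetical server_now() function that reads the local clock plus an assumed skew (2.5 seconds here), and the statement also calls SQLite's built-in random(). The helper names, the table layout, and the skew value are illustrative assumptions.

```python
import sqlite3
import time

def make_replica(clock_skew_seconds):
    """Create one in-memory replica whose server_now() reads a skewed clock."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical server clock: this machine's wall clock plus a fixed skew.
    conn.create_function("server_now", 0, lambda: time.time() + clock_skew_seconds)
    conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, "
                 "column1 INTEGER, column4 INTEGER, column7 REAL)")
    conn.executemany("INSERT INTO table_name VALUES (?, 0, ?, 0.0)",
                     [(i, i % 7) for i in range(20)])
    conn.commit()
    return conn

def snapshot(conn):
    """Return the full table contents in a stable order for comparison."""
    return conn.execute("SELECT * FROM table_name ORDER BY id").fetchall()

primary = make_replica(clock_skew_seconds=0.0)
secondary = make_replica(clock_skew_seconds=2.5)  # assumed: secondary's clock runs 2.5 s ahead

# The same non-deterministic statement text is executed on both replicas.
query = "UPDATE table_name SET column1 = random(), column7 = server_now() WHERE column4 = 0"
for conn in (primary, secondary):
    conn.execute(query)
    conn.commit()

print(snapshot(primary) == snapshot(secondary))  # False: the replicas have diverged
```

The random() call alone would also make the replicas diverge, since SQLite draws an independent value on each replica.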