Byzantine fault tolerance

2025-05-13

I have been reading the classic consensus papers (Paxos, FLP) recently and the next one in line is about Byzantine Fault Tolerance (BFT). The goal of this class of algorithms is the same -- have all replicas of a state machine reach consensus (typically about the next input). The difference is in the fault model. BFT algorithms assume the existence of malicious nodes, known as Byzantine failures.

The first paper on this topic is from Lamport, Shostak, and Pease which proved the impossibility of consensus with less than 3F+1 replicas in the presence of F failures. The other seminal paper on the topic is from Miguel Castro and Barbara Liskov, titled "Practical Byzantine Fault Tolerance" which provided an algorithm to reach consensus with 3F+1 replicas in realistic scenarios and with low overhead.

I am not going to go into the details of either of the paper. Instead, I will just talk about two high level ideas: (i) why do we need 3F+1 replicas; and (ii) sketch of how to reach consensus with 3F+1 replicas.

The need for `3F+1` replicas

Consider that we need N total replicas. Given that there can be F failures, we need the algorithm to be able to proceed with only N-F replicas communicating with each other. Now within the N-F replicas that communicate, F might be malicious (Byzantine). So we need the remaining N-F replicas to have a majority of non-faulty replicas for correctness. That is (N-F)/2 > F, or N > 3F.

Note that the algorithm assumes a total of F faults (Byzantine or otherwise) and the above derivation actually considers a scenario with 2F failures -- F replicas don't respond (non-Byzantine failure) and among the ones that respond, F are malicious (Byzantine). This is a stronger setup than the assumed failure model. The reason for the stronger model is because it is being used to derive an impossibility result in an asynchronous system model (recall that in an asynchronous system model, messages can be arbitrarily delayed). In particular, having an algorithm with N <= 3F that tries to reach consensus after hearing from N-F = 2F replicas and assuming that the F non-responding replicas are the faulty ones would be incorrect because the F "non-responding" replicas might actually be functioning correctly and their messages might just be delayed. So the algorithm needs to assume that there could be F failures amongst the replicas it heard from.

Sketch of reaching consensus with `3F+1` replicas

The above derivation of the impossibility result also provides a sketch of the an actual byzantine fault tolerance algorithm. The high-level idea is for each replica to hear from 2F+1 replicas (including itself) that agree on a value before declaring that value to be the agreed upon value.

Meta thoughts

I really liked the two BFT papers found them intuitive and easy to follow. Based on reading Murat Demirbas' blog post about BFT, it appears that BFT algorithms were mostly just of theoretical importance for a long time until blockchains made them practically relevant. This makes me want to read up about blockchains soon...

#computer-science #consensus #distributed-systems #fault-tolerance #state-machine-replication

Byzantine fault tolerance

The need for 3F+1 replicas

Sketch of reaching consensus with 3F+1 replicas

Meta thoughts

The need for `3F+1` replicas

Sketch of reaching consensus with `3F+1` replicas