Question 3: High tail latency for reads from an SSD
Jenny buys a brand-new, state-of-the-art SSD for her gaming desktop. Before using it for gaming, though, she decides to benchmark the SSD to make sure its performance is up to the mark. She chooses a workload with a 50/50 mix of random reads and random writes, lets it run overnight, and analyzes the data in the morning. One thing that stands out is that the tail latency of reads is very high: while the median latency is in the tens of microseconds, the tail latency is in the hundreds of milliseconds.
What might have caused such high tail latency?
Solution
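The reason Jenny is seeing high tail latency for reads is that reads (which take tens of microseconds) can get queued behind much slower operations in the SSD: writes (typically hundreds of microseconds to a few milliseconds per page) and erasures (typically several milliseconds per block). When many of these run back to back, as they can during garbage collection, an unlucky read can end up waiting tens to hundreds of milliseconds.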
SSDs consist of multiple concurrently operating planes. However, within a plane, operations (reads, writes, erasures) are serialized. If a read gets stuck behind a write or an erasure on a plane, the end-to-end latency for that read is much higher than that of a typical read. This manifests as high read tail latency.
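The queueing effect described above can be reproduced with a toy simulation. The following is a minimal sketch, not a model of any real SSD: it assumes a single plane that serves operations strictly first-come-first-served, and the service times, arrival rate, and erase probability are illustrative placeholders rather than measured values. The names `simulate_plane` and `percentile` are made up for this example.

```python
import random
import statistics

# Assumed, illustrative service times in microseconds (not measurements).
SERVICE_US = {"read": 50.0, "write": 1_000.0, "erase": 5_000.0}


def simulate_plane(num_ops=20_000, mean_gap_us=2_000.0, erase_prob=0.02, seed=0):
    """Simulate one plane that serves operations one at a time, in arrival order.

    Returns the end-to-end latency (queueing plus service) of every read,
    in microseconds.
    """
    rng = random.Random(seed)
    now = 0.0            # arrival time of the current operation
    plane_free_at = 0.0  # time at which the plane has drained its backlog
    read_latencies = []
    for _ in range(num_ops):
        now += rng.expovariate(1.0 / mean_gap_us)  # random inter-arrival gap
        r = rng.random()
        if r < erase_prob:
            op = "erase"
        elif r < erase_prob + (1.0 - erase_prob) / 2.0:
            op = "write"
        else:
            op = "read"
        # The operation starts only once everything ahead of it has finished.
        start = max(now, plane_free_at)
        plane_free_at = start + SERVICE_US[op]
        if op == "read":
            read_latencies.append(plane_free_at - now)
    return read_latencies


def percentile(values, pct):
    """Return a simple nearest-rank style percentile of a list of numbers."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(pct / 100.0 * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    latencies = simulate_plane()
    print(f"median read latency: {statistics.median(latencies):.0f} us")
    print(f"p99 read latency:    {percentile(latencies, 99):.0f} us")
```

With these assumed parameters the plane is idle most of the time, so most reads see only their own service time and the median stays near the raw read latency. The reads that arrive while a write or an erasure is in progress, or behind a short backlog of them, are what make up the tail.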
Tiny-Tail Flash is a good read on this topic. Interruptible writes and erasures are another solution to this problem. When I last checked, there were no papers on interruptible erasures, only patents, but a more recent search turned up this paper (I haven't read it yet, though).