Question 10: Uncontended contention

2024-07-08

Mitch is building a distributed system wherein each process reads some data from a database, performs some computations on it and writes it back. Most processes operate on separate parts of the data and do not compete with other processes for the same data. Mitch does not want to use locks because it will only add the overhead of acquiring and releasing locks even though there is no actual contention in the common case (different processes operating on different parts of the data). However, he still needs to ensure that if two processes do happen to work on the same piece of data, only one of them succeeds with the final write. Can you suggest a mechanism to help Mitch with this problem?

Solution coming up in the next post!

Solution for slow scanning:

Gloria can build a separate index for the metadata and store it independently instead of storing the metadata with each page. For example, if she uses a simple array based structure to store the metadata for each page, she can scan that array to get the metadata. Presumably the metadata is small, so metadata for multiple pages (arranged as an array) would fit in a single page. Thus scanning through the metadata would be more efficient because it would involved fewer cache misses (at various levels in the memory-storage hierarchy: reading pages from disk, bringing cache lines to processor cache, TLB lookups).

Another mechanism that Gloria can use is to use larger page sizes (e.g., hugepages). This would reduce the total amount of data that the background thread has to read/process (because there would be fewer pages).

Both the mechanisms can be used in conjunction as well.

#qna