Question 9: Slow scanning
Gloria is building a single-node database management system (DBMS). Her DBMS maintains some metadata for each page at the top of the page. Periodically, a background thread in the DBMS goes through all the pages and reads this metadata and takes some action based on it. She would like to reduce the amount of time it takes for her DBMS to go through all the pages. Can you suggest some mechanisms to help with that?
Solution
Gloria can build a separate index for the metadata and store it independently instead of storing the metadata at the top of each page. For example, if she uses a simple array-based structure to hold the per-page metadata, the background thread can scan that array directly. Presumably the metadata is small, so the metadata for many pages (arranged as an array) fits in a single page. Scanning the metadata then becomes more efficient because it involves fewer cache misses at each level of the memory-storage hierarchy: fewer pages read from disk, fewer cache lines brought into the processor cache, and fewer TLB misses.
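A minimal sketch of this idea, with assumed sizes and field names (PageMeta, NUM_PAGES, and the dirty flag are illustrative, not Gloria's actual schema): the background thread scans a dense side array instead of touching a header inside every data page.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_PAGES (1u << 20)          /* 1M pages, illustrative */

typedef struct {
    uint32_t page_id;
    uint8_t  dirty;                   /* example flag the thread acts on */
    uint8_t  pad[3];
} PageMeta;                           /* 8 bytes: ~512 entries per 4 KiB page */

int main(void) {
    /* Dense array of metadata, stored separately from the data pages. */
    PageMeta *meta = calloc(NUM_PAGES, sizeof *meta);
    if (!meta) return 1;

    /* The scan touches only NUM_PAGES * 8 bytes (~8 MiB) of contiguous
     * memory, instead of pulling in every 4 KiB data page just to read
     * the header at its top. */
    size_t dirty = 0;
    for (size_t i = 0; i < NUM_PAGES; i++)
        if (meta[i].dirty) dirty++;

    printf("dirty pages: %zu\n", dirty);
    free(meta);
    return 0;
}
```

Because the entries are packed contiguously, each cache line and each page brought in from storage carries many metadata records rather than one.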
Another mechanism Gloria can use is larger page sizes (e.g., hugepages). With fewer, larger pages there are fewer per-page metadata entries, so the background thread has less data to read and process; the sketch below illustrates the effect.
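A back-of-the-envelope sketch of how page size changes the amount of metadata the thread must visit; the 1 TiB database size and 64-byte per-page header are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    const uint64_t db_bytes     = 1ULL << 40;            /* 1 TiB of data, assumed */
    const uint64_t header_bytes = 64;                     /* per-page metadata, assumed */
    const uint64_t page_sizes[] = { 4ULL << 10, 2ULL << 20 };  /* 4 KiB vs 2 MiB */

    for (int i = 0; i < 2; i++) {
        uint64_t pages = db_bytes / page_sizes[i];
        /* Total metadata the background thread must read for one full scan. */
        printf("page size %8llu B -> %10llu pages, %6llu MiB of headers\n",
               (unsigned long long)page_sizes[i],
               (unsigned long long)pages,
               (unsigned long long)((pages * header_bytes) >> 20));
    }
    return 0;
}
```

Going from 4 KiB to 2 MiB pages cuts the number of pages (and thus headers) by a factor of 512 in this example.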
Both mechanisms can also be used in conjunction.