After Hours Academic

Question 16: Silent data corruption

Naina has built her own storage system for her extensive music and movie collection, and she keeps two copies of her data on two disks. While she is happy with this level of redundancy, she is concerned about silent data corruption. Her storage system holds a lot of movies, and many of them will not be played for years at a stretch. She worries that the unread data from those movies could get silently corrupted on both disks, rendering the replication useless.

Can you suggest a mechanism that Naina can employ to increase her confidence in the reliability of her storage system?

Solution coming up in the next post!


Solution for too many videos:

Daphne should deduplicate her storage, i.e., instead of storing duplicate copies of the shared videos, she should store a single copy and have multiple references to it.

One way to incorporate deduplication would be to use a two-level key-value design. The first-level key-value store keeps the same key as before, but its value is now a key into the second-level store rather than the video itself. The second-level key-value store uses a key derived from the video's content (e.g., a SHA-256 hash of the video file) and maps it to the actual video. This way, duplicated videos are stored only once, while per-user video listings still work.
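To make the two-level idea concrete, here is a minimal in-memory sketch in Python. It is an illustration under assumptions, not Daphne's actual system: the class name `DedupVideoStore`, its methods, and the choice of `(user, video_name)` as the first-level key are all hypothetical, and plain dictionaries stand in for real key-value stores.

```python
import hashlib


class DedupVideoStore:
    """Toy two-level key-value store (hypothetical names, in-memory only)."""

    def __init__(self):
        # Level 1: (user, video_name) -> content hash.
        # The (user, video_name) key is an assumption made for this sketch.
        self.listings = {}
        # Level 2: content hash -> video bytes.
        self.blobs = {}

    def put(self, user, name, data):
        # Derive the second-level key from the video content itself.
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        # The first-level entry holds a reference, not the video.
        self.listings[(user, name)] = digest
        return digest

    def get(self, user, name):
        # Two lookups: listing -> hash, then hash -> video bytes.
        return self.blobs[self.listings[(user, name)]]

    def list_videos(self, user):
        # Per-user listings come from the first level alone.
        return sorted(n for (u, n) in self.listings if u == user)
```

If two users upload the same bytes under different names, both `put` calls return the same hash and `blobs` ends up with a single entry, while each user still sees their own listing. A real system would also need reference counting (or garbage collection) so that a video is deleted from the second level only when no first-level entry points to it; the sketch leaves that out.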

Venti is one of the seminal papers that described deduplicated storage.

#qna #storage-systems