After Hours Academic

Question 15: Too many videos

Daphne has built a video sharing app. A user can create and upload their videos on the app. Users can also share videos with other users. Her app allows each user to view all the videos accessible to them, the ones that the user created as well as the ones that someone else has shared with the user. She uses a key-value store in the backend to implement this library. Whenever a new video shows up for a user (either self created or shared), the app stores the video in the key-value store using the key {userID}-{monotonically-increasing-per-user-video-count}.

She noticed that the storage used by the key-value store is bloating up. She suspects that it is because of storing duplicate videos for each user for shared videos. Can you suggest a mechanism to avoid the storage bloat?

Solution coming up in the next post!


Solution for fastest finger first:

Saurabh's implementation of finding the student who finished the quiz quickest is incorrect because the timestamp is generated on the student's machines. Time between any two machines is never guaranteed to be synchronized. Using unsynchronized clocks means that an earlier timestamp does not actually guarantee that the event actually happened before other events with a later timestamp (from a different clock).

One of the approaches that Saurabh can use is to generate a timestamp at the central server (because that would be coming from a single clock). However, this approach is susceptible to message delivery delays (messages from the students' machines to the central server).

In general, there is no way to know the ordering between two events that happen on two different machines in a distributed system without any sort of communication between the machines. Lamport's classic paper is an excellent read on this topic.

#qna #storage-systems