After Hours Academic

Question 19: Scalable file creation

Anant and Radhika are arguing about whether file creations within a directory can scale with the number of threads. That is, can n threads each create a file (all in the same directory but with different names) in the same time that it takes one thread to create a file? Anant argues that since all the files are being created in the same directory, the threads need to serialize access to the directory and hence file creation cannot scale. Radhika thinks that file creation can scale if they can come up with a data structure to represent the directory that allows for concurrent file creations.

Who do you think is correct?

Solution

File creation in the same directory can indeed scale, so Radhika is correct. One way to do it is to store the directory as a hash table whose keys are the file names and whose values are the inode numbers of the corresponding files. With key-level locking, creations of files with different names do not wait on each other and can therefore scale. This seems counterintuitive because we conventionally think of a file system directory as a single shared resource. However, that is a design/implementation choice and not a fundamental property.
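To make the idea concrete, here is a minimal in-memory sketch of such a directory. Everything in it is an assumption for illustration: the `Directory` class, the `create` and `lookup` methods, and the `create_concurrently` helper are hypothetical names, and per-bucket locks stand in for true key-level locks. It only shows the locking structure. CPython's global interpreter lock means this code will not actually run faster with more threads, and a real file system would also have to allocate inodes and persist the directory.

```python
import threading


class Directory:
    """Hypothetical in-memory directory mapping file name -> inode number."""

    def __init__(self, num_buckets=64):
        # One small dict and one lock per bucket, so creations that hash to
        # different buckets never contend on the same lock.
        self._buckets = [{} for _ in range(num_buckets)]
        self._locks = [threading.Lock() for _ in range(num_buckets)]

    def _index(self, name):
        # Pick the bucket (and therefore the lock) from the file name alone.
        return hash(name) % len(self._buckets)

    def create(self, name, inode):
        """Add an entry; return False if the name already exists."""
        i = self._index(name)
        with self._locks[i]:
            if name in self._buckets[i]:
                return False
            self._buckets[i][name] = inode
            return True

    def lookup(self, name):
        """Return the inode number for name, or None if it is absent."""
        i = self._index(name)
        with self._locks[i]:
            return self._buckets[i].get(name)


def create_concurrently(directory, n):
    """Have n threads each create one distinctly named file."""
    results = [None] * n

    def worker(k):
        # Inode numbers are supplied by the caller in this sketch, so the
        # threads do not share an allocator.
        results[k] = directory.create(f"file-{k}", inode=1000 + k)

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Two threads block each other only when their file names fall into the same bucket, and no lock covers the whole directory. Creating the same name twice still serializes on that name's bucket, which is what we want: the second `create` has to observe the first.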

I lifted this example from the scalable commutativity paper. This paper is an excellent read and provides a way to reason about what operations can or cannot scale purely based on the interface (and not the implementation).

#computer-science #file-systems #qna #storage-systems