After Hours Academic

Question 2: Log-structured versus copy-on-write

Bill argues that log-structured storage systems are just a special case of copy-on-write storage systems. He says that the only difference is where the new data is written. Is he correct?

Solution coming up in the next post!


Solution to minimizing backup storage cost:

Both Jeff and Raj propose good strategies and would have similar backup storage costs, but Raj's solution would have slightly lower costs.

Jeff's strategy of incremental backups skips files that haven't changed between two backups, so it never stores redundant copies of unchanged files.

However, even for files that have changed (which Jeff's strategy will back up), a lot of the content might not have changed. As an example, think of a text file that has been appended to. The file has changed, so Jeff's strategy backs it up in full, even though most of its content is unchanged.

On the other hand, Raj's strategy of using deduplicated storage would find duplicated content across backups and across files. So if a file has not changed between two backups, its content would not be stored again. This achieves the same storage cost savings as Jeff's strategy for unchanged files. Additionally, even if a file has changed, a deduplicated storage system would identify the unchanged content in the file and not store it again. This is where Raj's strategy would lead to even lower storage costs than Jeff's.
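
To make the comparison concrete, here is a rough sketch in Python that counts the bytes each strategy would store for two backups. It is a toy model, not how any real backup product works: the function names, the file names, the 1000-byte fixed-size chunks, and the SHA-256 fingerprints are all assumptions made for illustration. Real deduplication systems often use variable-size, content-defined chunking instead.

```python
import hashlib

CHUNK_SIZE = 1000  # bytes; an arbitrary size chosen for this illustration


def incremental_backup_cost(snapshots):
    """Bytes stored when every backup keeps a full copy of each changed file."""
    stored = 0
    previous = {}
    for snapshot in snapshots:
        for name, data in snapshot.items():
            # A file is stored in full if it is new or differs from last time.
            if previous.get(name) != data:
                stored += len(data)
        previous = snapshot
    return stored


def dedup_backup_cost(snapshots, chunk_size=CHUNK_SIZE):
    """Bytes stored when only chunks never seen before are kept."""
    stored = 0
    seen = set()  # fingerprints of every chunk stored so far
    for snapshot in snapshots:
        for data in snapshot.values():
            for start in range(0, len(data), chunk_size):
                chunk = data[start:start + chunk_size]
                fingerprint = hashlib.sha256(chunk).digest()
                if fingerprint not in seen:
                    seen.add(fingerprint)
                    stored += len(chunk)
    return stored


def make_log(lines):
    """Build a text file of 10-byte lines so every chunk has distinct content."""
    return "".join(f"line {i:04d}\n" for i in range(lines)).encode()


# Day 1: a 4000-byte log and a small file. Day 2: the log grows by 1000 bytes.
day1 = {"notes.txt": make_log(400), "static.txt": b"unchanged file contents"}
day2 = {"notes.txt": make_log(500), "static.txt": b"unchanged file contents"}

print("incremental:", incremental_backup_cost([day1, day2]))  # 9023
print("deduplicated:", dedup_backup_cost([day1, day2]))       # 5023
```

Both strategies skip the unchanged file on day 2. The gap comes from the appended log: the incremental model stores all 5000 bytes again, while the deduplicated model stores only the 1000 new bytes. Fixed-size chunking works well here only because the change is an append. An insertion near the start of the file would shift every later chunk boundary, which is one reason real systems lean on content-defined chunking.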

You can check out a couple of interesting related papers here and here.

#copy-on-write #log-structured #qna #storage-systems