Question 1: Minimizing backup storage cost
Jeff wants to back up the data on his machine, but he is concerned about the storage cost of the backups. He plans to do an initial full backup followed by daily incremental backups. His friend Raj argues that he should instead do daily full backups and store them in a deduplicated storage system. Whose solution would lead to a lower backup storage cost?
Solution
Both Jeff and Raj propose good strategies and would have similar backup storage costs, but Raj's solution would have slightly lower costs.
Jeff's strategy of incremental backups skips any file that hasn't changed since the previous backup, so it never stores a second copy of an unchanged file.
However, even for files that have changed (which Jeff's strategy will back up), a lot of the content might not have changed. As an example, think of a text file that has been appended to. The file has changed, so Jeff's strategy backs up the whole file again, even though most of its content is unchanged.
On the other hand, Raj's strategy of using a deduplicated storage system would find duplicated content across backups and across files. So if a file has not changed between two backups, its content would not be stored again. This achieves the same storage cost savings as Jeff's strategy. Additionally, even if a file has changed, a deduplicated storage system would identify the unchanged content in the file and not store it again. This is where Raj's strategy would lead to even lower storage costs than Jeff's.
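The comparison above can be made concrete with a small simulation. This is only a rough sketch under simplifying assumptions, not how any particular backup tool works: the store is assumed to deduplicate fixed-size chunks identified by their SHA-256 hash (real systems often use content-defined chunking instead), metadata and index overhead are ignored, and the names `incremental_cost`, `dedup_cost`, `CHUNK_SIZE` and the sample files are made up for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to trace


def incremental_cost(snapshots):
    """Bytes stored by one full backup followed by file-level incrementals."""
    total = 0
    previous = {}
    for snapshot in snapshots:
        for name, data in snapshot.items():
            # A file is stored again in full if it is new or changed at all.
            if previous.get(name) != data:
                total += len(data)
        previous = snapshot
    return total


def dedup_cost(snapshots):
    """Bytes stored by daily full backups on a chunk-level deduplicated store."""
    seen = set()  # hashes of chunks already held by the store
    total = 0
    for snapshot in snapshots:
        for data in snapshot.values():
            for start in range(0, len(data), CHUNK_SIZE):
                chunk = data[start:start + CHUNK_SIZE]
                digest = hashlib.sha256(chunk).digest()
                # Only chunks the store has never seen cost extra space.
                if digest not in seen:
                    seen.add(digest)
                    total += len(chunk)
    return total


# Hypothetical data: day 1 appends 4 bytes to log.txt, photo.jpg is untouched.
day0 = {"log.txt": b"AAAABBBB", "photo.jpg": b"CCCCDDDD"}
day1 = {"log.txt": b"AAAABBBBEEEE", "photo.jpg": b"CCCCDDDD"}
snapshots = [day0, day1]

print("incremental:", incremental_cost(snapshots))  # 16 + 12 = 28 bytes
print("deduplicated:", dedup_cost(snapshots))       # 16 + 4 = 20 bytes
```

Both strategies skip the unchanged `photo.jpg` on the second day. The difference comes from `log.txt`: the incremental backup stores all 12 bytes of the changed file again, while the deduplicated store only adds the 4 appended bytes.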
You can check out a couple of interesting related papers here and here.