After Hours Academic

Question 8: Gaming duration

Phil is building a gaming app. One of the things he needs to track are a user's stats from all their previous gaming sessions. Stats include things like duration of play, score, level, etc. Phil wants to be able to show summary of the statistics for each user (e.g., average duration) as well as across all users of the app. He is trying to decide whether to store the stats in a row-based DBMS (e.g., Postgres, MySQL) or a column-based DBMS (e.g., AWS Redshift). What would you suggest?

Solution

Phil should use a columnar database. Columnar databases arrange data on disk in a column-first format. That is, all entries from a column are stored contiguously, followed by all entries from another column, and so on. This is in contrast to row-based databases that store data in a row-first format (all entries from one row after the another).

Column-first storage makes reads for all entries in a column faster (because they are contiguous on disk). This is the access pattern that Phil would have for summarizing the stats (e.g., average gaming duration).

AWS Redshift documentation has a good primer on columnar storage. If you want a more thorough and academic treatment, this paper is a good starting point.

#computer-science #databases #qna