DDIA note part 3

Storage and Retrieval

Posted by Jqy on December 18, 2019


How we can store the data we’re given,and how we find it again when asked?

Data Structures That Power Your Database

Assume there is a very simple database,composed of key and value,and the underlying storage format is just a text file where each line contains a k-v pair.Each time when we write is just to append data to the end of the file,so the old data would not be overwritten.

AntiIntuitively,this database has pretty performance in some scenes,as appending to a file is very efficient.In many pratical databases internally use a log,which is an append-only data file. On the other hand,we have terrible performance if there are large number of records in the database,for example,when looking up a key.

In order to efficiently accelerate the process of looking up,we need a data structure:index,to keep some additional metadata on the side.

Maintaining the index only affect the performance of queries,and for write,it’s hard to beat the performance of simply appending to a file.

Therefore,this is an important trade-off in storage system,and databases don’t use index everything by default,but require users,to choose indexes manually.

Hash Indexes

The simplest possible indexing strategy:keep an in-memory hash map where every key is mapped to a byte offset in the data file.Whenever you append a new kv pair,you also update the hash map.


If there are a lot of writes,but there are not too many distinct keys——you have a large number of writes per key,it’s feasible to keep all keys in memory.

As we only ever append to a file,how do we avoid eventually running out of a disk space?On solution is to break the log into segments of a certain size,when it reaches this size, make subsequent writes to a new segment file.

We can perform compaction on these segments,throwing away duplicate key and only keep the most recent update for each key.


We can also merge several segments together at the same time as performing the compaction.ddia_0303

Segments are nevermodified after they have been written, so the merged segment is written to a new file.The merging and compaction of segments can be done in a background thread, and while it is going on, we can still continue to serve read requests using the old segment files.After the merging process is complete, we switch read requests to using the new merged segment instead of the old segments—and then the old segment files can simply be deleted.

Lots of detail goes into making this simple idea work in practice:

  • File format:
  • Deleting records:
  • Crash recovery:
  • Partially written records:
  • Concurrency control