DDIA note part 5


Posted by Jqy on December 24, 2019


Replication means keeping a copy of the same data on multiple machines that are connected via a network. The main difficulty in replication lies in handling changes to replicated data.
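To make that difficulty concrete, here is a minimal sketch, not any particular database's protocol, in which one node accepts writes and forwards each change to its replicas. `Node`, `write`, and the `online` flag are hypothetical names chosen for illustration. A replica that is unreachable when a change is sent silently falls behind, and nothing in this naive scheme ever brings it back in sync.

```python
# Hypothetical naive replication: one node applies a write, then forwards
# the change to each replica it can reach. No retries, no catch-up.
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node holding a full copy of a key-value dataset."""
    name: str
    data: dict = field(default_factory=dict)
    online: bool = True  # stands in for "reachable over the network"

    def apply(self, key, value):
        self.data[key] = value


def write(primary, replicas, key, value):
    """Apply a change locally, then forward it to every reachable replica.

    Returns the names of replicas that missed the change.
    """
    primary.apply(key, value)
    missed = []
    for replica in replicas:
        if replica.online:
            replica.apply(key, value)
        else:
            missed.append(replica.name)
    return missed


if __name__ == "__main__":
    a, b, c = Node("a"), Node("b"), Node("c")
    write(a, [b, c], "user:1", "Alice")
    c.online = False                              # simulate a crash or network fault
    print(write(a, [b, c], "user:1", "Alicia"))   # ['c']
    c.online = True
    print(a.data == b.data, a.data == c.data)     # True False
```

After replica `c` comes back it still holds the old value, so the copies have diverged. Detecting and repairing that is what real replication schemes have to handle.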

A Simple Introduction to Distributed Data

There are several reasons why you might want to distribute a database across multiple machines:

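  • Scalability: The data volume, read load, or write load grows bigger than a single machine can handle, so the load is spread across multiple machines.
  • Fault tolerance/high availability: The application keeps working even if a machine goes down, because when one fails, another one can take over.
  • Latency: Servers in several locations let each user be served from a datacenter that is geographically close to them.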
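The latency and fault-tolerance points can be combined in a small sketch: a client prefers the replica closest to it and falls back to the next-closest one when that replica is down. The region names, latency numbers, and the `pick_replica` helper are all hypothetical; real systems usually rely on DNS, load balancers, or client drivers for this.

```python
# Hypothetical replica selection: lowest latency first (latency),
# skipping replicas known to be down (fault tolerance).
from typing import Dict, Optional, Set


def pick_replica(latency_ms: Dict[str, float], down: Set[str]) -> Optional[str]:
    """Return the lowest-latency replica that is not known to be down."""
    for replica in sorted(latency_ms, key=latency_ms.get):
        if replica not in down:
            return replica
    return None  # every replica is unavailable


if __name__ == "__main__":
    # Made-up round-trip times as seen from one user in Europe.
    latency = {"us-east": 90.0, "eu-west": 12.0, "ap-south": 150.0}
    print(pick_replica(latency, down=set()))          # eu-west
    print(pick_replica(latency, down={"eu-west"}))    # us-east
```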

Scaling to Higher Load

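A simple way is to buy a more powerful machine, which is called vertical scaling (scaling up). The problem with this shared-memory architecture is that the cost grows faster than linearly: a machine with twice as many CPUs, twice as much RAM, and twice as much disk typically costs significantly more than twice as much. Another approach is the shared-disk architecture, which uses several machines with independent CPUs and RAM but stores data on an array of disks shared between the machines over a fast network. It is used for some data warehousing workloads, but contention and the overhead of locking limit its scalability.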

Shared-Nothing Architectures

In this approach (sometimes called horizontal scaling or scaling out), each machine or virtual machine running the database software is called a node. Any coordination between nodes is done at the software level, using a conventional network.

There are two common ways to distribute data across multiple nodes:

  • Replication: Keeping a copy of the same data on several different nodes, potentially in different locations, which provides redundancy.
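  • Partitioning: Splitting a big database into smaller subsets called partitions, so that different partitions can be assigned to different nodes (also known as sharding).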
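Replication and partitioning are separate mechanisms, but they are often combined: each key belongs to one partition, and each partition is stored on several nodes. The sketch below assumes four hypothetical nodes, four partitions, and two copies per partition, and places keys with hash-mod-N purely for illustration. It uses `zlib.crc32` rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would give a different placement on every run.

```python
# Hypothetical layout: keys -> partitions (partitioning),
# partitions -> several nodes (replication).
import zlib
from typing import Dict, List

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names
NUM_PARTITIONS = 4
REPLICATION_FACTOR = 2


def partition_for(key: str) -> int:
    """Map a key to a partition using a stable hash (crc32, not hash())."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def replicas_for(partition: int) -> List[str]:
    """Place a partition's copies on consecutive nodes, wrapping around."""
    return [NODES[(partition + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def placement(keys: List[str]) -> Dict[str, List[str]]:
    """Return which nodes hold a copy of each key."""
    return {key: replicas_for(partition_for(key)) for key in keys}


if __name__ == "__main__":
    for key, nodes in placement(["alice", "bob", "carol"]).items():
        print(key, "-> partition", partition_for(key), "on", nodes)
```

Hash-mod-N keeps the example short, but it reassigns most keys whenever the number of partitions changes, so real systems generally use other partitioning schemes.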