Replication means keeping a copy of the same data on multiple machines that are connected via a network. The main difficulty in replication lies in handling changes to the replicated data.
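To make that difficulty concrete, here is a minimal sketch in Python. The `Node` class, the `replicate_write` helper, and the `unreachable` parameter are hypothetical names invented for illustration, not part of any real database. It only shows what happens when a change reaches some copies but not others.

```python
class Node:
    """A hypothetical in-memory replica that holds its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


def replicate_write(nodes, key, value, unreachable=()):
    """Apply a write to every reachable node.

    Returns the names of the nodes that missed the write. `unreachable`
    is an assumption used here to simulate a network fault.
    """
    missed = []
    for node in nodes:
        if node.name in unreachable:
            missed.append(node.name)
        else:
            node.apply(key, value)
    return missed


nodes = [Node("a"), Node("b"), Node("c")]

# The first write reaches every replica, so all copies agree.
replicate_write(nodes, "user:1", "Alice")

# The second write cannot reach node "c", so its copy is now stale.
missed = replicate_write(nodes, "user:1", "Alicia", unreachable={"c"})

values = {node.name: node.data["user:1"] for node in nodes}
```

After the second write, nodes "a" and "b" hold the new value while "c" still holds the old one. Unchanging data would never hit this problem. The hard part of replication is bringing every copy up to date after each change.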
A Brief Introduction to Distributed Data
There are several reasons why you might distribute a database across multiple machines:
- Scalability: The data volume grows beyond what a single machine can handle.
- Fault tolerance/high availability: When one machine fails, another can take over.
- Latency: Keep data geographically close to your users.
Scaling to Higher Load
A simple way is to buy a more powerful machine, which is called vertical scaling. The problem with this shared-memory architecture is that the cost grows faster than linearly. Another approach is the shared-disk architecture, which uses several machines with independent CPUs and RAM but stores the data on an array of disks shared between them. It is used for some data warehousing workloads.
By contrast, in a shared-nothing architecture (horizontal scaling), each machine or virtual machine running the database software is called a node. Any coordination between nodes is done at the software level, using a conventional network.
Two common ways data is distributed across multiple nodes:
- Replication: Keeping a copy of the same data on several different nodes, potentially in different locations, which provides redundancy.