Tasks on NoSQL and NewSQL systems

1. Context

This document contains thoughts on self-study tasks for a text on NoSQL and NewSQL systems.

2. Quorum Replication with N=3

2.1. W=2, R=1

Suppose W=2 and R=1, and a read operation arrives after a write operation has completed.

There are 3 replicas, and the write operation was acknowledged by 2 of them. Given R=1, the read operation may be answered either by one of the updated replicas or by the third one, which may not have applied the write yet. Since W + R = 3 does not exceed N = 3, the read quorum need not overlap the write quorum. Thus, the read may or may not return an outdated value.
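The overlap argument can be checked by brute force. The following is a minimal sketch, not taken from the text: the function name `stale_read_possible` is hypothetical, and replicas are modelled simply as the integers 0 to N-1. It enumerates every possible write quorum and read quorum and reports whether some pair fails to intersect.

```python
from itertools import combinations


def stale_read_possible(n: int, w: int, r: int) -> bool:
    """Return True if some read quorum of size r misses every replica
    of some write quorum of size w, among n replicas."""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        for read_set in combinations(replicas, r):
            # No common replica: the read sees only replicas the write skipped.
            if not set(write_set) & set(read_set):
                return True
    return False


print(stale_read_possible(3, 2, 1))  # True: the read may hit the third replica
print(stale_read_possible(3, 2, 2))  # False: every read overlaps every write
```

Under these assumptions the result agrees with the rule of thumb that stale reads are possible exactly when W + R does not exceed N.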

2.2. W=2, R=2

How does the situation change for R=2?

Now at least one updated replica will be read, because W + R = 4 exceeds N = 3 and every read quorum therefore overlaps every write quorum. Thus, outdated return values can be avoided. To decide which value to return, vector clocks can be used. The corresponding vector timestamps indicate how many operations were initiated at which replica and which version is stored at which replica: for each write, the coordinator generates a new timestamp, which is stored with the new value at the replicas. Each value that is read comes with its timestamp, allowing the coordinator to return the value with the largest timestamp. (Vector timestamps also make it possible to detect conflicting values, i.e. values whose timestamps are incomparable, which might require additional compensating actions.)
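The selection step at the coordinator can be sketched as follows. This is an illustration under stated assumptions, not the mechanism of any particular system: a vector timestamp is modelled as a dictionary from replica name to counter, missing entries count as 0, and the names `dominates` and `resolve` are hypothetical.

```python
from typing import Dict, List, Tuple

VectorClock = Dict[str, int]


def dominates(a: VectorClock, b: VectorClock) -> bool:
    """True if a is component-wise >= b (missing entries count as 0)."""
    return all(a.get(k, 0) >= b.get(k, 0) for k in set(a) | set(b))


def resolve(
    responses: List[Tuple[str, VectorClock]]
) -> List[Tuple[str, VectorClock]]:
    """Drop every response whose timestamp is strictly older than another one.

    One survivor means a unique newest value; several survivors mean
    concurrent (conflicting) versions that need further handling.
    """
    survivors = []
    for value, clock in responses:
        strictly_older = any(
            dominates(other, clock) and not dominates(clock, other)
            for _, other in responses
        )
        if not strictly_older and (value, clock) not in survivors:
            survivors.append((value, clock))
    return survivors


# One updated replica and one stale replica answer the read (R=2).
print(resolve([("old", {"A": 1}), ("new", {"A": 2})]))
# [('new', {'A': 2})]

# Two writes coordinated independently: neither timestamp dominates.
print(resolve([("x", {"A": 2}), ("y", {"A": 1, "B": 1})]))
# [('x', {'A': 2}), ('y', {'A': 1, 'B': 1})]
```

In the first call the stale value is filtered out; in the second both values survive, which is the conflict case mentioned in the parenthesis above.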

3. CAP

While a network partition is ongoing and conflicting updates may occur, one has to choose between availability and consistency. In that situation, an otherwise strongly consistent system might fall back to eventual consistency in order to stay available.

4. F1

Redundancy is key for availability. F1 is based on synchronous replication to multiple data centers with redundant network infrastructure. The system remains available as long as a majority quorum exists; minorities shut down.
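The majority rule can be illustrated with a small sketch. It is not F1's actual protocol; the function name `available_partitions` and the replica counts are hypothetical and only show which side of a partition would keep serving requests under a strict-majority rule.

```python
from typing import List


def available_partitions(partition_sizes: List[int]) -> List[bool]:
    """For each partition, report whether it holds a strict majority
    of all replicas and may therefore keep serving requests."""
    total = sum(partition_sizes)
    return [size > total / 2 for size in partition_sizes]


# Five replicas split 3 / 2: only the larger side stays available.
print(available_partitions([3, 2]))     # [True, False]

# Five replicas split 2 / 2 / 1: no side has a majority.
print(available_partitions([2, 2, 1]))  # [False, False, False]
```

Because at most one partition can hold a strict majority, two sides can never accept conflicting writes at the same time, at the cost of availability in the minority partitions.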