Explanation of MongoDB Write Concern and Read Concern

Posted on Jan 11, 2024 by Amo Chen - 8 min read

Understanding MongoDB Write Concern and Read Concern is essential when working with MongoDB cluster environments. Lack of familiarity with these two crucial concepts may lead to unexpected operations and even result in bugs.

This article aims to introduce MongoDB Write Concern and Read Concern in an easy-to-understand manner.

Environment

  • MongoDB 6.0

Distributed Transactions

Before diving into MongoDB’s Write Concern and Read Concern, it’s necessary to understand what a “Distributed Transaction” is.

In MongoDB, an operation on a single document is atomic.

In ACID terms, this means an operation on a single document either fully succeeds or fully fails; it is never partially applied.

Note: If you’re unfamiliar with ACID, please refer to Backend Engineer Interview: ACID.

This raises a question: what can we do when one logical operation has to modify two documents, given that atomicity is guaranteed only per document?

Take a top-up process as an example. When a user tops up, we might operate on two pieces of data in the MongoDB database. One records the top-up, while the other updates the user’s current total top-up amount:

// JavaScript
// Assumes `client` is a connected MongoClient and this code runs inside an async function.

const db = client.db("mydb");
const userId = "xyzdef";
const amount = 100;
const time = new Date().getTime();

// insert the record
await db.collection("deposit_records").insertOne(
  { userId, amount, time },
);

// update the total amount (a separate, independent write)
await db.collection("users").findOneAndUpdate(
  { userId },
  { $inc: { amount } },
);

If the second operation fails, for example because MongoDB becomes unavailable between the two writes, the user ends up with a top-up record but an unchanged total amount, because MongoDB guarantees atomicity only for single-document operations.

To address such problems, MongoDB 4.0 introduced Multi-document Transactions on replica sets, and MongoDB 4.2 extended them to sharded clusters as Distributed Transactions. They make operations across multiple documents, collections, and databases atomic.

This means you can use Transactions to ensure that the previously mentioned top-up process is either entirely successful or fails in its entirety, avoiding partial success.
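To make the difference concrete, here is a minimal in-memory sketch in plain JavaScript, with no MongoDB involved. The names createStore, topUpWithoutTransaction, and topUpAllOrNothing are made up for illustration, and the snapshot-and-restore trick only mimics the all-or-nothing outcome; it is not how MongoDB implements transactions.

```javascript
// In-memory stand-in for the two collections (hypothetical, illustration only).
function createStore() {
  return { depositRecords: [], users: { xyzdef: { amount: 0 } } };
}

// Two independent writes: if the second one fails, the first one stays.
function topUpWithoutTransaction(store, userId, amount, failOnUpdate) {
  store.depositRecords.push({ userId, amount });
  if (failOnUpdate) throw new Error('update failed');
  store.users[userId].amount += amount;
}

// Mimics the all-or-nothing outcome by restoring a copy taken before the writes.
function topUpAllOrNothing(store, userId, amount, failOnUpdate) {
  const snapshot = JSON.parse(JSON.stringify(store));
  try {
    topUpWithoutTransaction(store, userId, amount, failOnUpdate);
  } catch (err) {
    Object.assign(store, snapshot); // undo both writes
    throw err;
  }
}

const partial = createStore();
try { topUpWithoutTransaction(partial, 'xyzdef', 100, true); } catch (err) { /* ignored */ }
console.log(partial.depositRecords.length, partial.users.xyzdef.amount); // 1 0

const atomic = createStore();
try { topUpAllOrNothing(atomic, 'xyzdef', 100, true); } catch (err) { /* ignored */ }
console.log(atomic.depositRecords.length, atomic.users.xyzdef.amount); // 0 0
```

The first store ends up with a top-up record but no balance change, which is exactly the inconsistent state described above; the second store is left untouched.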

Here’s sample JavaScript code using MongoDB Transactions; the call to session.withTransaction is the key part:

const { MongoClient } = require('mongodb');

// connection URL
const url = 'mongodb://mongo01:27021,mongo02:27022,mongo03:27023/?replicaSet=rs0&readPreference=secondary';
const client = new MongoClient(url);

// database Name
const dbName = 'mydb';

async function main() {
  await client.connect();
  const db = client.db(dbName);

  const userId = 'xyzdef';
  const amount = 100;
  const time = new Date().getTime();

  const session = client.startSession();
  const transactionOptions = {
    readPreference: 'primary',
    readConcern: { level: 'local' },
    writeConcern: { w: 'majority' },
  };
  try {
    await session.withTransaction(async () => {
      // Important: you must pass the session to each operation
      await db.collection('deposit_records').insertOne({ userId, amount, time }, { session });
      await db.collection('users').findOneAndUpdate({ userId }, { $inc: { amount } }, { session });
    }, transactionOptions);
  } finally {
    // the client itself is closed once, in the .finally() at the bottom
    await session.endSession();
  }
  return 'done.';
}

main()
  .then(console.log)
  .catch(console.error)
  .finally(() => client.close());

However, Multi-document Transactions are only available on replica sets and sharded clusters; a standalone MongoDB instance does not support them.

Note: For differences between replication and sharding, refer to What Is Replication In MongoDB?. In brief, replication involves copying data to replicas, while sharding distributes data across different nodes, with each node only holding a portion of the data; collectively, they form the entire dataset.

Write Concern

Having a preliminary understanding of the origin and function of Multi-document Transactions can make Write Concern easier to comprehend.

Write concern describes the level of acknowledgment requested from MongoDB for write operations to a standalone mongod, replica sets, or sharded clusters.

Simply put, Write Concern is the rule MongoDB follows to decide when to acknowledge a write operation.

For example, with a single MongoDB instance, when we issue a request to write a piece of data, we can expect an almost immediate response indicating a successful write. In a cluster of several MongoDB nodes, however, when should MongoDB report success? After any one node has written the data, or only after more than 50% of the nodes have written it?

This acknowledgment rule is what Write Concern configures.

A diagram might illustrate this more clearly:

[Figure: write-concern.png]

In the above diagram, {w: 2, wtimeout: 500} is a MongoDB Write Concern setting. w: 2 means MongoDB acknowledges the write once two nodes (the primary and one secondary) have applied it, while wtimeout: 500 makes the operation return an error if that acknowledgment has not arrived within 500 milliseconds, so the write does not block indefinitely.

There are three fields available for Write Concern settings:

{ w: <value>, j: <boolean>, wtimeout: <number> }

Besides a number, w can be the string "majority" (e.g., { w: "majority" }), in which case MongoDB calculates how many acknowledgments are needed from the voting members of the cluster. Starting in MongoDB 5.0, { w: "majority" } is the implicit default for most deployments. For more details, see Implicit Default Write Concern.

The j field requests a stricter acknowledgment: MongoDB responds only after the write has been recorded in the on-disk journal, not merely held in memory.

wtimeout sets the time limit in milliseconds and applies only when w is greater than 1 (>1).
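The rule behind w can be sketched as a small function. This is a simplified model that assumes every member is a data-bearing voting member (no arbiters) and ignores custom write concern names; requiredAcks and isAcknowledged are hypothetical helper names, not part of any MongoDB API.

```javascript
// How many members must acknowledge a write for a given `w`?
// Simplified: assumes all `memberCount` members are data-bearing voting members.
function requiredAcks(w, memberCount) {
  if (w === 'majority') return Math.floor(memberCount / 2) + 1;
  if (Number.isInteger(w) && w >= 0) return w;
  throw new TypeError(`unsupported w value: ${w}`);
}

// Has the write been applied by enough members to be acknowledged?
function isAcknowledged(w, ackedMembers, memberCount) {
  return ackedMembers >= requiredAcks(w, memberCount);
}

console.log(requiredAcks('majority', 3)); // 2
console.log(requiredAcks('majority', 5)); // 3
console.log(isAcknowledged(2, 1, 3));     // false: only one member has applied the write
```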
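With the Node.js driver used earlier in this article, a Write Concern can also be passed per operation through the writeConcern option. The following is a minimal sketch, assuming collection is a driver Collection such as db.collection('deposit_records'); insertDeposit and durableWriteConcern are hypothetical names. Recent driver versions may prefer the spellings journal and wtimeoutMS over j and wtimeout, so check the documentation for your driver version.

```javascript
// Wait for a majority of nodes, require the journal write,
// and stop waiting for acknowledgment after 500 ms.
const durableWriteConcern = { w: 'majority', j: true, wtimeout: 500 };

// `collection` is assumed to be a Collection from the `mongodb` Node.js driver.
async function insertDeposit(collection, deposit) {
  return collection.insertOne(deposit, { writeConcern: durableWriteConcern });
}
```

Note that a wtimeout error only means the requested acknowledgments did not arrive in time. MongoDB does not undo data that was already written on some nodes, so the write may still become visible later.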

Read Concern

After understanding Write Concern, Read Concern is another key area to understand in MongoDB.

Of the four ACID properties, Transactions provide Atomicity. Read Concern, in turn, addresses Consistency and Isolation.

The readConcern option allows you to control the consistency and isolation properties of the data read from replica sets and replica set shards.

What does Read Concern offer?

Here are two examples:

  1. Provides a Read Your Own Writes guarantee.
  2. Solves Dirty Read issues.

Read Your Own Writes is a consistency guarantee in a distributed system. In a Primary/Secondary database structure, when we write data (create, update, delete) to the Primary and then immediately read it back from a Secondary, the read may not find the data because the Secondary has not yet synced the write. As shown below:

[Figure: read_your_write.png]

Read Your Own Writes ensures that you can read the data you just wrote.
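The stale read described above can be reproduced with a toy model. The sketch below is plain JavaScript with a hypothetical ToyReplicaSet class and an explicit replicate() step standing in for replication lag; it does not talk to MongoDB.

```javascript
// Toy primary/secondary pair with manual replication (illustration only).
class ToyReplicaSet {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = []; // writes not yet copied to the secondary
  }

  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  // Copies all outstanding writes to the secondary.
  replicate() {
    for (const [key, value] of this.pending) this.secondary.set(key, value);
    this.pending = [];
  }

  read(key, from) {
    return (from === 'primary' ? this.primary : this.secondary).get(key);
  }
}

const rs = new ToyReplicaSet();
rs.write('user:xyzdef', { amount: 100 });
console.log(rs.read('user:xyzdef', 'secondary')); // undefined: the write has not arrived yet
console.log(rs.read('user:xyzdef', 'primary'));   // { amount: 100 }
rs.replicate();
console.log(rs.read('user:xyzdef', 'secondary')); // { amount: 100 }
```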

Now let’s discuss Dirty Read.

Suppose we read a piece of data from the Primary before it has been copied to any Secondary. If that write is later rolled back at the Primary for some reason, such as a failed Transaction, we have read “ghost” data that officially never existed. This is known as a “Dirty Read.” Such reads undermine data consistency and may cause unexpected program results, like miscalculations based on the rolled-back values.

To address these issues, databases provide various levels of settings to offer different degrees of guarantees. MongoDB provides five Read Concern levels:

  1. local
  2. available
  3. majority
  4. linearizable
  5. snapshot

Out of the above, only local, majority, and snapshot can be used with multi-document transactions.
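This restriction is easy to encode as a guard before building transaction options. A small sketch follows; transactionReadConcern is a hypothetical helper, and the object it returns has the same shape as the readConcern field in the transactionOptions shown earlier.

```javascript
const READ_CONCERN_LEVELS = ['local', 'available', 'majority', 'linearizable', 'snapshot'];
const TRANSACTION_LEVELS = new Set(['local', 'majority', 'snapshot']);

// Returns a readConcern object for transaction options,
// or throws if the level cannot be used in a multi-document transaction.
function transactionReadConcern(level) {
  if (!READ_CONCERN_LEVELS.includes(level)) {
    throw new RangeError(`unknown read concern level: ${level}`);
  }
  if (!TRANSACTION_LEVELS.has(level)) {
    throw new RangeError(`read concern "${level}" cannot be used in a multi-document transaction`);
  }
  return { level };
}

console.log(transactionReadConcern('snapshot')); // { level: 'snapshot' }
```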

Here’s how to use Read Concern in practice:

// Node.js driver: pass the Read Concern as a find() option
db.collection("users")
  .find({ userId: 'xyz' }, { readConcern: { level: "majority" } });

Below are brief descriptions of the five Read Concern levels. Due to their complexity, it’s recommended to thoroughly read the official documentation before using them.

Note: Generally, understanding just local and majority is enough to handle most cases. For more, see Causal Consistency and Read and Write Concerns, which includes a chart summarizing the recommended Write and Read Concern settings for various issues.

Note: Starting in MongoDB 5.0, the majority Read Concern is always available and needs no setup. For versions earlier than 5.0, see enableMajorityReadConcern.

Read Concern: local

Local is the default Read Concern level. A query returns the node's most recent data with no guarantee that it has been written to the majority of nodes, so the data may later be rolled back. Dirty reads are therefore possible by default.

Read Concern: available

Very similar to local, but with the lowest read latency. It neither ensures that data has been written to the majority of nodes nor prevents rollback, and on sharded collections it carries a risk of reading orphaned documents.

Orphaned documents are documents that exist concurrently across different shards in a sharded cluster environment. Typically, a document should only exist on a single shard. Reasons they might appear in multiple shards include incomplete data cleanup post-migration or unexpected shutdowns during migration.

If a collection isn’t sharded, available behaves like local.

Read Concern: majority

Majority ensures that the data returned has been acknowledged by the majority of nodes, so it cannot be rolled back. It does not, however, guarantee that you read the latest data.

To use majority, you must be using MongoDB’s WiredTiger storage engine, the default from MongoDB 3.2 onwards.
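The difference between local and majority can be illustrated with a toy model in which each node keeps a list of writes. This is only a sketch under simplifying assumptions: ToyCluster and its methods are hypothetical, replication is triggered by hand, and the rollback step merely imitates a primary losing writes that never reached a majority.

```javascript
// Toy model: each node holds an ordered log of writes (illustration only).
class ToyCluster {
  constructor(nodeCount) {
    // logs[0] belongs to the primary, the rest to secondaries
    this.logs = Array.from({ length: nodeCount }, () => []);
  }

  writeToPrimary(value) {
    this.logs[0].push(value);
  }

  // Copies the primary's current log to one secondary.
  replicateTo(nodeIndex) {
    this.logs[nodeIndex] = [...this.logs[0]];
  }

  // "local"-style read: whatever the primary currently has.
  readLocal() {
    return [...this.logs[0]];
  }

  // "majority"-style read: only writes present on more than half of the nodes.
  readMajority() {
    const needed = Math.floor(this.logs.length / 2) + 1;
    return this.logs[0].filter(
      (value) => this.logs.filter((log) => log.includes(value)).length >= needed
    );
  }

  // The primary drops writes that never reached a majority.
  rollbackPrimary() {
    this.logs[0] = this.readMajority();
  }
}

const toy = new ToyCluster(3);
toy.writeToPrimary('deposit-1');
toy.replicateTo(1);              // deposit-1 is now on 2 of 3 nodes
toy.writeToPrimary('deposit-2'); // deposit-2 exists only on the primary

console.log(toy.readLocal());    // [ 'deposit-1', 'deposit-2' ]
console.log(toy.readMajority()); // [ 'deposit-1' ]
toy.rollbackPrimary();
console.log(toy.readLocal());    // [ 'deposit-1' ]
```

The local-style read returned deposit-2 even though it was later rolled back, which is the dirty read; the majority-style read never exposed it, at the price of not showing the newest write.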

Read Concern: linearizable

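Like majority, linearizable returns only data acknowledged by the majority of nodes, but it additionally guarantees that the read reflects every majority-acknowledged write that completed before the read started. To do so, it may wait for concurrent writes to reach the majority of nodes before returning the query result. Queries can only be sent to the primary node, and the $out and $merge stages are not supported.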

This setting has a higher query cost, as it requires additional confirmation with secondary nodes.
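Because a linearizable read may have to wait on other nodes, it is usually paired with a time limit. Here is a minimal sketch for the Node.js driver, assuming collection is a driver Collection; findUserLinearizable and the 10-second limit are illustrative choices, not values from the article.

```javascript
// `collection` is assumed to be a Collection from the `mongodb` Node.js driver.
// maxTimeMS keeps the read from waiting indefinitely if a majority of nodes is unreachable.
async function findUserLinearizable(collection, userId) {
  return collection.findOne(
    { userId },
    { readConcern: { level: 'linearizable' }, maxTimeMS: 10000 }
  );
}
```

The filter here targets a single user document on purpose: the linearizable guarantee is documented for reads whose filter uniquely identifies one document.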

Read Concern: snapshot

Reads data from a single point-in-time snapshot. This option is available in Multi-document Transactions and, starting in MongoDB 5.0, for find, aggregate, and distinct read operations outside of transactions.

Conclusion

When discussing data consistency in a distributed MongoDB environment, Write Concern and Read Concern are two pivotal concepts. They ensure the reliability and consistency of write and read operations, respectively.

Write Concern controls the level of acknowledgment for data written to MongoDB. On the other hand, Read Concern determines the consistency level for read operations, such as guaranteeing that read operations reflect data confirmed by the majority of nodes.

To sum up, if you are working with a MongoDB cluster, it’s crucial to understand Read Concern and Write Concern!

That’s all!

Enjoy!

References

MongoDB - Transactions

What Is Replication In MongoDB?

MongoDB - Read Concern

The difference between “majority” and “linearizable”

Causal Consistency and Read and Write Concerns