Prepare MongoDB for RDI

Prepare MongoDB databases to work with RDI

This guide describes the steps required to prepare a MongoDB database as a source for Redis Data Integration (RDI) pipelines.

Prerequisites

Note:

The MongoDB connector is not capable of monitoring the changes of a standalone MongoDB server, since standalone servers do not have an oplog. The connector will work if the standalone server is converted to a replica set with one member.

Summary

The following table summarizes the considerations to prepare a MongoDB database for RDI.

| Requirement | Description |
|---|---|
| MongoDB Topology | Replica Set, Sharded Cluster, or MongoDB Atlas |
| User Roles | readAnyDatabase, clusterMonitor |
| Oplog | Sufficient size for snapshot and streaming |
| Pre/Post Images | Enable on collections only if using a custom key |
| Connection String | Must include all hosts, replicaSet (if applicable), authSource, credentials |
| MongoDB Atlas | SSL required, provide root CA as SOURCE_DB_CACERT secret in RDI |
| Network | RDI Collector must reach all MongoDB nodes on required ports |

The numbered steps below prepare a MongoDB database for RDI, and each section explains one step in full detail. You may find it helpful to track your progress as you complete each step.

1. Configure oplog size

The Debezium MongoDB connector relies on the oplog to capture changes from a replica set. The oplog is a fixed-size, capped collection. When it reaches its maximum size, it overwrites the oldest entries. If the connector is stopped and restarted, it attempts to resume from its last recorded position in the oplog. If that position has been overwritten, the connector may fail to start and report an invalid resume token error.

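To prevent this, ensure the oplog retains enough history for Debezium to resume streaming after interruptions. You can do this by increasing the oplog size or, on MongoDB 4.4 and later, by setting a minimum oplog retention period.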

For detailed guidance, see the Debezium oplog configuration documentation.
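
As a rough illustration of the sizing step, the sketch below estimates an oplog size from a write rate and a retention window, then builds the document for MongoDB's replSetResizeOplog admin command. The helper names, the write rate, and the retention window are hypothetical values chosen for the example; measure your own workload before picking numbers.

```javascript
// Hypothetical helper: estimate the oplog size (in MB) needed to retain
// `retentionHours` of history at an average oplog write rate.
function estimateOplogSizeMb(oplogMbPerHour, retentionHours) {
  const MIN_OPLOG_MB = 990; // smallest size replSetResizeOplog accepts
  return Math.max(MIN_OPLOG_MB, Math.ceil(oplogMbPerHour * retentionHours));
}

// Build the command document. minRetentionHours needs MongoDB 4.4 or later,
// so it is only included when a value is supplied.
function buildResizeOplogCommand(sizeMb, minRetentionHours) {
  const command = { replSetResizeOplog: 1, size: sizeMb };
  if (minRetentionHours !== undefined) {
    command.minRetentionHours = minRetentionHours;
  }
  return command;
}

// Assumed workload: 200 MB of oplog per hour, 48 hours of retention.
const sizeMb = estimateOplogSizeMb(200, 48);
const resizeCommand = buildResizeOplogCommand(sizeMb, 48);
console.log(JSON.stringify(resizeCommand));
```

In mongosh, a document like this would be passed to db.adminCommand() on each replica set member, with the size wrapped in Double() so that it is sent as a double.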

2. Create a MongoDB user for RDI

Create a user with the read role on each source database (or the readAnyDatabase role on the admin database) and the clusterMonitor role on the admin database.

Example:

use admin;
db.createUser({
  user: "rdi_user",
  pwd: "rdi_password",
  roles: [
    // You can have multiple read roles, one per database.
    { role: "read", db: "your_database" },
    // Use the role below if you don't want to grant the `read` role for each database.
    // { role: "readAnyDatabase", db: "admin" },
    { role: "clusterMonitor", db: "admin" }
  ]
});
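
After creating the user, it can be useful to confirm that the roles were granted as intended. The sketch below is one possible check, not part of RDI: missingRoles is a hypothetical helper, and the sample input imitates the roles array that db.getUser("rdi_user") returns in mongosh.

```javascript
// Roles RDI needs, per the step above. "your_database" is a placeholder.
const requiredRoles = [
  { role: "read", db: "your_database" },
  { role: "clusterMonitor", db: "admin" },
];

// Hypothetical helper: return the required roles that a user is missing.
// `grantedRoles` has the shape of the `roles` array from db.getUser().
function missingRoles(grantedRoles, required) {
  const has = (role, db) =>
    grantedRoles.some((g) => g.role === role && g.db === db);
  return required.filter(({ role, db }) => {
    // readAnyDatabase on admin covers a per-database read role.
    const covered =
      has(role, db) || (role === "read" && has("readAnyDatabase", "admin"));
    return !covered;
  });
}

// Sample data: a user that was created without clusterMonitor.
const granted = [{ role: "readAnyDatabase", db: "admin" }];
console.log(missingRoles(granted, requiredRoles));
```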

3. Connection string format

The RDI Collector requires a MongoDB connection string that includes all relevant hosts and authentication details.

Example (Replica Set):

mongodb://${SOURCE_DB_USERNAME}:${SOURCE_DB_PASSWORD}@host1:27017,host2:27017,host3:27017/?replicaSet=rs0&authSource=admin

Example (Sharded Cluster):

mongodb://${SOURCE_DB_USERNAME}:${SOURCE_DB_PASSWORD}@host:30000
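
The two examples above follow the same pattern, which the sketch below assembles from its parts. buildMongoUri is a hypothetical helper written for illustration; the host names, replica set name, and authSource value are the placeholders from the examples, and the credentials are left as ${...} placeholders rather than real values.

```javascript
// Hypothetical helper: assemble a connection string for the RDI Collector.
// Credentials stay as ${...} placeholders, matching the examples above.
function buildMongoUri({ hosts, replicaSet, authSource }) {
  // A plain (non-template) string, so ${...} is kept literally.
  const credentials = "${SOURCE_DB_USERNAME}:${SOURCE_DB_PASSWORD}";
  const params = [];
  if (replicaSet) params.push(`replicaSet=${encodeURIComponent(replicaSet)}`);
  if (authSource) params.push(`authSource=${encodeURIComponent(authSource)}`);
  const query = params.length > 0 ? `/?${params.join("&")}` : "";
  return `mongodb://${credentials}@${hosts.join(",")}${query}`;
}

// Replica set: list every member and name the set.
const replicaSetUri = buildMongoUri({
  hosts: ["host1:27017", "host2:27017", "host3:27017"],
  replicaSet: "rs0",
  authSource: "admin",
});

// Sharded cluster: point at the router, with no replicaSet parameter.
const shardedUri = buildMongoUri({ hosts: ["host:30000"] });

console.log(replicaSetUri);
console.log(shardedUri);
```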

4. Enable change streams and pre/post images (only if using a custom key)

Change streams are required only if you are using a custom key in your RDI pipeline. They are available by default on replica sets, sharded clusters, and MongoDB Atlas.

If your RDI pipeline uses a custom key, you must enable pre- and post-images on the relevant collections to capture the document state before and after updates or deletes. This allows RDI to access both the previous and updated versions of documents during change events, ensuring accurate synchronization.

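Use the command below to enable pre- and post-images on a collection (this option requires MongoDB 6.0 or later). Run it once for each collection that uses a custom key: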

db.runCommand({
  collMod: "your_collection",
  changeStreamPreAndPostImages: { enabled: true }
});
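
To see which collections still need the setting, you can inspect the collection options. The sketch below is an assumption-laden illustration: collectionsWithoutPrePostImages is a hypothetical helper, and the sample array stands in for the output of db.getCollectionInfos() in mongosh, which reports the setting under options.changeStreamPreAndPostImages.

```javascript
// Hypothetical helper: list collections that do not have pre- and
// post-images enabled. `collectionInfos` has the shape returned by
// db.getCollectionInfos() in mongosh.
function collectionsWithoutPrePostImages(collectionInfos) {
  return collectionInfos
    .filter((info) => info.type === "collection") // skip views
    .filter(
      (info) => info.options?.changeStreamPreAndPostImages?.enabled !== true
    )
    .map((info) => info.name);
}

// Sample data standing in for real db.getCollectionInfos() output.
const sampleInfos = [
  {
    name: "orders",
    type: "collection",
    options: { changeStreamPreAndPostImages: { enabled: true } },
  },
  { name: "customers", type: "collection", options: {} },
  { name: "orders_view", type: "view", options: {} },
];

const pending = collectionsWithoutPrePostImages(sampleInfos);
console.log(pending);
```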

5. MongoDB Atlas specific requirements

MongoDB Atlas only supports secure connections via SSL. The root CA certificate for MongoDB Atlas must be added as a SOURCE_DB_CACERT secret in RDI.

Example connection string for Atlas:

mongodb+srv://${SOURCE_DB_USERNAME}:${SOURCE_DB_PASSWORD}@cluster0.mongodb.net/?authSource=admin
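
Because Atlas only accepts secure connections, a quick sanity check on the connection string can catch mistakes early. The sketch below is a simplified illustration, not a full URI parser: connectionUsesTls is a hypothetical helper that relies on the mongodb+srv scheme enabling TLS by default and on the standard tls and ssl connection string options.

```javascript
// Hypothetical helper: report whether a connection string asks for TLS.
// Simplified: it only looks at the scheme and the tls/ssl query options.
function connectionUsesTls(uri) {
  const queryStart = uri.indexOf("?");
  const params = new URLSearchParams(
    queryStart >= 0 ? uri.slice(queryStart + 1) : ""
  );
  // An explicit tls or ssl option overrides the scheme default.
  const explicit = params.get("tls") ?? params.get("ssl");
  if (explicit !== null) return explicit === "true";
  // mongodb+srv enables TLS by default; plain mongodb does not.
  return uri.startsWith("mongodb+srv://");
}

const atlasUri =
  "mongodb+srv://${SOURCE_DB_USERNAME}:${SOURCE_DB_PASSWORD}@cluster0.mongodb.net/?authSource=admin";
console.log(connectionUsesTls(atlasUri));
```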

6. Network and security

The RDI Collector must be able to reach all MongoDB nodes on the required ports.

7. Configuration is complete

Once you have followed the steps above, your MongoDB database is ready for Debezium to use.
