MongoDB HA Production | Replica Sets, Sharding, K8s Operators

MongoDB occupies a unique position in the database landscape. Its document model maps naturally to application objects, its flexible schema accommodates evolving data structures without migrations, and its built-in replication and sharding primitives provide a foundation for high availability and horizontal scaling that relational databases require external tooling to achieve. But a foundation is not a finished building. Running MongoDB in production with genuine high availability — where a node failure, a network partition, or an entire region going offline does not result in downtime or data loss — demands deliberate architecture, careful tuning, and rigorous operational discipline.

This guide walks through every layer of MongoDB HA: from the replica set consensus protocol and oplog replication that power automatic failover, through the sharded cluster architecture that enables horizontal scaling, to the Kubernetes operators that automate lifecycle management, and across the deployment options on AWS, Azure, GCP, and bare metal k3s with Rancher. Every section includes concrete configuration, YAML manifests, and operational procedures you can adapt to your environment.

MongoDB Replica Set Architecture

A replica set is MongoDB's fundamental unit of high availability. It is a group of mongod processes that maintain the same data set. One member is the primary, which receives all write operations. The remaining members are secondaries, which replicate data from the primary by tailing its operation log (oplog). If the primary becomes unavailable, the replica set holds an election to choose a new primary from the eligible secondaries — typically within 10 to 12 seconds.

A production replica set should have at least three data-bearing members, ideally spread across different failure domains (availability zones, racks, or data centres). This ensures that the replica set can survive the loss of any single member and still maintain a majority for election purposes. An optional arbiter participates in elections but holds no data — it exists solely to break ties when you have an even number of data-bearing members, though MongoDB best practice is to use an odd number of data-bearing members instead.

Oplog and Replication Mechanics

The oplog is a capped collection (local.oplog.rs) that records every data-modifying operation on the primary in idempotent form. Secondaries continuously tail the primary's oplog and apply operations locally. The oplog size determines how far behind a secondary can fall before it needs a full resync — for production workloads, size the oplog to hold at least 24 to 72 hours of write activity. MongoDB 4.4+ supports dynamic oplog sizing via replSetResizeOplog.

# Check current oplog size and window
rs.printReplicationInfo()

# Resize the oplog to 50 GB
db.adminCommand({ replSetResizeOplog: 1, size: 51200 })

# Check replication lag on secondaries
rs.printSecondaryReplicationInfo()

Elections and the Raft-Based Protocol

MongoDB 4.0+ uses a Raft-inspired consensus protocol for replica set elections. When a secondary detects that the primary is unreachable (default electionTimeoutMillis of 10,000ms), it may call an election. To win, a candidate must receive votes from a majority of the voting members. The member with the most recent oplog entry and the highest priority wins if multiple candidates are eligible. You can influence election outcomes by setting member priorities — a member with priority: 0 can never become primary, which is useful for analytics replicas or members in remote regions.

# Initiate a 3-member replica set
rs.initiate({
  _id: "rs-production",
  members: [
    { _id: 0, host: "mongo-0.mongo-svc:27017", priority: 10 },
    { _id: 1, host: "mongo-1.mongo-svc:27017", priority: 5 },
    { _id: 2, host: "mongo-2.mongo-svc:27017", priority: 5 }
  ],
  settings: {
    electionTimeoutMillis: 10000,
    heartbeatTimeoutSecs: 10,
    chainingAllowed: true
  }
})

# Check replica set status
rs.status()

# Step down the primary (for maintenance)
rs.stepDown(60)   // step down for 60 seconds

# Force reconfiguration (emergency)
rs.reconfig(newConfig, { force: true })

Read Preference and Write Concern

Read Preference controls where the driver sends read operations. The options are:

primary — All reads go to the primary. Strongest consistency but no read scaling.
primaryPreferred — Reads go to the primary unless it is unavailable, then to a secondary.
secondary — All reads go to secondaries. Provides read scaling but may return stale data.
secondaryPreferred — Reads go to secondaries unless none are available.
nearest — Reads go to the member with the lowest network latency regardless of role. Best for geo-distributed deployments.

Write Concern controls how many replica set members must acknowledge a write before the operation returns to the client.

w: 1 — Only the primary must acknowledge. Fastest but risks data loss if the primary fails before replication.
w: "majority" — A majority of data-bearing members must acknowledge. This is the recommended production default. It guarantees the write survives a primary election.
w: <number> — A specific number of members must acknowledge.
j: true — The write must be committed to the on-disk journal before acknowledgement. Combined with w: "majority", this provides the strongest durability guarantee.

# Connection string with write concern and read preference
mongodb://mongo-0:27017,mongo-1:27017,mongo-2:27017/appdb?replicaSet=rs-production&w=majority&j=true&readPreference=secondaryPreferred&readPreferenceTags=region:us-east

# SRV connection string (DNS-based discovery)
mongodb+srv://appuser:password@cluster.example.com/appdb?w=majority&retryWrites=true&readPreference=nearest

MongoDB Sharded Cluster Architecture

A replica set provides high availability but not horizontal write scaling — all writes go to a single primary. When your data set exceeds the capacity of a single server or your write throughput exceeds what one primary can handle, you need sharding. A sharded cluster distributes data across multiple replica sets (shards) using a shard key, enabling horizontal scaling of both storage and write throughput.

A sharded cluster has three component types. mongos routers are stateless query routers that direct client operations to the appropriate shard(s). Deploy at least two for redundancy. Config servers form a replica set that stores the cluster metadata — which chunks live on which shards, the shard key ranges, and balancer state. Shard servers are replica sets that each hold a subset of the sharded data.

Shard Key Selection

The shard key is the most consequential decision in a sharded cluster. It determines how data is distributed across shards and directly impacts query performance, write distribution, and the ability to scale. A good shard key has high cardinality (many distinct values), distributes writes evenly across shards, and supports the most common query patterns with targeted operations rather than scatter-gather.

# Enable sharding on a database
sh.enableSharding("appdb")

# Shard a collection with a hashed shard key (even distribution)
sh.shardCollection("appdb.events", { "event_id": "hashed" })

# Shard with a ranged shard key (supports range queries)
sh.shardCollection("appdb.orders", { "customer_id": 1, "order_date": 1 })

# Check shard distribution
db.orders.getShardDistribution()

# View chunk distribution across shards
use config
db.chunks.aggregate([
  { $group: { _id: "$shard", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])

Common shard key strategies include: hashed keys for uniform write distribution (best when you don't need range queries on the shard key), compound keys that combine a coarse grouping field with a high-cardinality field (e.g., { tenant_id: 1, _id: 1 } for multi-tenant applications), and zone-based keys that align data placement with geographic regions.

Zone Sharding for Multi-Region

Zone sharding constrains specific ranges of the shard key to specific shards, enabling data locality. For example, you can ensure that European customer data lives on shards in the EU region while US customer data lives on shards in the US region.

# Add shards to zones
sh.addShardTag("shard-us-east", "US")
sh.addShardTag("shard-eu-west", "EU")
sh.addShardTag("shard-ap-south", "APAC")

# Define zone ranges
sh.addTagRange("appdb.customers",
  { "region": "US", "customer_id": MinKey },
  { "region": "US", "customer_id": MaxKey },
  "US"
)
sh.addTagRange("appdb.customers",
  { "region": "EU", "customer_id": MinKey },
  { "region": "EU", "customer_id": MaxKey },
  "EU"
)
sh.addTagRange("appdb.customers",
  { "region": "APAC", "customer_id": MinKey },
  { "region": "APAC", "customer_id": MaxKey },
  "APAC"
)

# Verify zone configuration
sh.status()

Multi-Region MongoDB Deployment

Distributing MongoDB across multiple regions serves two purposes: disaster recovery (surviving the loss of an entire region) and latency optimization (serving reads from the nearest replica). MongoDB supports multi-region deployments through replica set members distributed across regions, zone sharding for data locality, and read preference configuration that routes reads to the nearest member.

In a five-member replica set distributed across three regions (2 in primary region, 2 in DR region, 1 in read region), the loss of the primary region still leaves three members available — enough for a majority to elect a new primary. The member in the read region should have priority: 0 to prevent it from becoming primary (high cross-region latency would degrade write performance). Use hidden: true for analytics-dedicated members that should not receive regular application reads.

MongoDB Community Kubernetes Operator

The MongoDB Community Kubernetes Operator deploys and manages MongoDB replica sets on Kubernetes. It is the open-source operator from MongoDB Inc. that handles StatefulSet management, automated replica set configuration, TLS certificate rotation, user management, and rolling upgrades.

# Install the MongoDB Community Operator via Helm
helm repo add mongodb https://mongodb.github.io/helm-charts
helm repo update

helm install community-operator mongodb/community-operator \
  --namespace mongodb \
  --create-namespace \
  --set operator.watchNamespace="*"

MongoDBCommunity CRD Specification

The MongoDBCommunity custom resource defines the desired state of a MongoDB replica set. Below is a production-ready specification.

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: production-mongodb
  namespace: databases
spec:
  members: 3
  type: ReplicaSet
  version: "7.0.12"

  security:
    authentication:
      modes: ["SCRAM"]
    tls:
      enabled: true
      certificateKeySecretRef:
        name: mongodb-tls-cert
      caCertificateSecretRef:
        name: mongodb-ca-cert

  users:
    - name: appuser
      db: admin
      passwordSecretRef:
        name: mongodb-appuser-password
      roles:
        - name: readWrite
          db: appdb
        - name: clusterMonitor
          db: admin
      scramCredentialsSecretName: appuser-scram
    - name: backup-user
      db: admin
      passwordSecretRef:
        name: mongodb-backup-password
      roles:
        - name: backup
          db: admin
        - name: restore
          db: admin
      scramCredentialsSecretName: backup-scram
    - name: monitoring
      db: admin
      passwordSecretRef:
        name: mongodb-monitoring-password
      roles:
        - name: clusterMonitor
          db: admin
      scramCredentialsSecretName: monitoring-scram

  additionalMongodConfig:
    storage.wiredTiger.engineConfig.cacheSizeGB: 4
    storage.wiredTiger.engineConfig.journalCompressor: snappy
    storage.wiredTiger.collectionConfig.blockCompressor: snappy
    net.maxIncomingConnections: 10000
    operationProfiling.mode: slowOp
    operationProfiling.slowOpThresholdMs: 100
    replication.oplogSizeMB: 51200
    setParameter.cursorTimeoutMillis: 600000

  statefulSet:
    spec:
      template:
        spec:
          containers:
            - name: mongod
              resources:
                requests:
                  cpu: "2"
                  memory: 8Gi
                limits:
                  cpu: "4"
                  memory: 16Gi
            - name: mongodb-agent
              resources:
                requests:
                  cpu: 250m
                  memory: 256Mi
                limits:
                  cpu: 500m
                  memory: 512Mi
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - topologyKey: topology.kubernetes.io/zone
                  labelSelector:
                    matchLabels:
                      app: production-mongodb-svc
          tolerations:
            - key: "workload"
              operator: "Equal"
              value: "database"
              effect: "NoSchedule"
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            storageClassName: gp3-csi
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi
        - metadata:
            name: logs-volume
          spec:
            storageClassName: gp3-csi
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi

This specification creates a three-member replica set running MongoDB 7.0 with SCRAM authentication, TLS encryption, separate data and log volumes, pod anti-affinity across availability zones, and WiredTiger tuning appropriate for a node with 16 GB RAM. The operator handles replica set initialization, member configuration, and rolling upgrades when you change the version field.

Percona Server for MongoDB Operator

The Percona Operator for MongoDB (PSMDB Operator) provides a more feature-rich alternative to the Community Operator. It deploys Percona Server for MongoDB (a drop-in replacement for MongoDB with additional enterprise features), manages sharded clusters as well as replica sets, integrates backup via Percona Backup for MongoDB (PBM), and supports point-in-time recovery.

# Install Percona Operator
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

helm install psmdb-operator percona/psmdb-operator \
  --namespace psmdb \
  --create-namespace

# Percona Server for MongoDB Cluster CRD
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: production-psmdb
  namespace: databases
spec:
  crVersion: "1.16.0"
  image: percona/percona-server-mongodb:7.0.12-7
  imagePullPolicy: IfNotPresent

  replsets:
    - name: rs0
      size: 3
      resources:
        requests:
          cpu: "2"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 16Gi
      volumeSpec:
        persistentVolumeClaim:
          storageClassName: gp3-csi
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 100Gi
      nonvoting:
        enabled: false
      arbiter:
        enabled: false
      configuration: |
        storage:
          wiredTiger:
            engineConfig:
              cacheSizeGB: 4
              journalCompressor: snappy
            collectionConfig:
              blockCompressor: snappy
        operationProfiling:
          mode: slowOp
          slowOpThresholdMs: 100
        replication:
          oplogSizeMB: 51200
      affinity:
        antiAffinityTopologyKey: topology.kubernetes.io/zone

  sharding:
    enabled: false

  mongos: {}
  configsrv: {}

  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.5.0
    storages:
      s3-backup:
        type: s3
        s3:
          bucket: company-mongodb-backups
          region: us-east-1
          credentialsSecret: aws-s3-credentials
          prefix: production
          insecureSkipTLSVerify: false
    pitr:
      enabled: true
      oplogOnly: false
      compressionType: gzip
    tasks:
      - name: daily-full
        enabled: true
        schedule: "0 3 * * *"
        keep: 7
        storageName: s3-backup
        compressionType: gzip

  secrets:
    users: mongodb-users-secret

  pmm:
    enabled: true
    image: percona/pmm-client:2
    serverHost: pmm-server.monitoring

The Percona Operator's key advantage is its integrated backup management with Percona Backup for MongoDB (PBM). PBM supports logical and physical backups, incremental backups, and point-in-time recovery from the oplog — all configured declaratively through the CRD.

AWS Deployment: DocumentDB vs Atlas vs Self-Managed on EKS

Amazon DocumentDB is a MongoDB-compatible document database service. It is not MongoDB — it is a proprietary engine that implements the MongoDB wire protocol (compatible up to MongoDB 4.0 API). DocumentDB separates compute from storage using a distributed storage layer similar to Aurora. It provides automatic failover within a region, up to 15 read replicas, and point-in-time recovery. However, it lacks many MongoDB features: change streams have limitations, transactions work differently, and many aggregation pipeline stages are unsupported. Use DocumentDB only if your application uses a subset of MongoDB's API and you value the operational simplicity of a fully managed service.

MongoDB Atlas on AWS is MongoDB's own managed service running on AWS infrastructure. It provides genuine MongoDB with all features, automated HA, continuous backups, point-in-time recovery, auto-scaling, and multi-region clusters. Atlas is the easiest path to production MongoDB but the most expensive option at scale.

Self-Managed on EKS gives you full control over MongoDB version, configuration, and cost. Use the MongoDB Community Operator or Percona Operator with EBS gp3 storage and IAM Roles for Service Accounts (IRSA) for secure S3 backup access.

# EBS StorageClass optimised for MongoDB
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-csi
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"
  throughput: "250"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

# IRSA for backup S3 access
eksctl create iamserviceaccount \
  --name mongodb-backup-sa \
  --namespace databases \
  --cluster my-eks-cluster \
  --attach-policy-arn arn:aws:iam::111122223333:policy/MongoDBBackupS3Policy \
  --approve

Azure Deployment: Cosmos DB vs Atlas vs Self-Managed on AKS

Azure Cosmos DB for MongoDB (vCore) is the closest Azure native offering to real MongoDB. Unlike the older RU-based Cosmos DB API for MongoDB, the vCore model runs actual MongoDB engine instances on dedicated compute, providing high compatibility with MongoDB 6.0+ features including full aggregation pipeline, change streams, and transactions. It offers zone-redundant HA, point-in-time recovery, and automatic backups.

MongoDB Atlas on Azure provides the same fully managed MongoDB experience as on AWS, running on Azure infrastructure with VNET peering, Azure Private Link, and Azure AD integration.

Self-Managed on AKS uses Azure Managed Disks (Premium SSD v2 recommended) with the MongoDB or Percona operator.

# Azure Premium SSD StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium-mongodb
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  cachingMode: None
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

GCP Deployment: Atlas on GCP vs Self-Managed on GKE

MongoDB Atlas on GCP runs on Google Cloud infrastructure with GCP-native integrations: VPC peering, Private Service Connect, and GKE cluster integration. Atlas on GCP supports multi-region clusters spanning GCP regions with automatic failover.

Self-Managed on GKE uses Persistent Disk SSD with the MongoDB or Percona operator. GKE Workload Identity provides secure, keyless authentication for backup to GCS.

# GKE SSD StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-mongodb
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

Bare Metal k3s/Rancher with Longhorn

For data sovereignty, compliance, or cost optimisation, MongoDB runs effectively on bare metal Kubernetes using k3s with Rancher management and Longhorn distributed storage. This architecture eliminates cloud provider dependencies while maintaining the same operator-based management model.

Longhorn Storage for MongoDB

# Install Longhorn
helm repo add longhorn https://charts.longhorn.io
helm repo update

helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set defaultSettings.defaultReplicaCount=3 \
  --set defaultSettings.storageMinimalAvailablePercentage=15

# StorageClass for MongoDB on Longhorn
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-mongo
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  dataLocality: best-effort
  fsType: xfs
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

XFS is recommended over ext4 for MongoDB on Linux. MongoDB's WiredTiger storage engine benefits from XFS's allocation patterns, particularly for the journal and data files. Longhorn's three-way replication provides volume-level redundancy on top of MongoDB's replica set-level redundancy, giving you defence in depth against storage failures.

MetalLB and Headless Services

# MetalLB IP Pool for MongoDB
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: mongo-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.220-192.168.1.225
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: mongo-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - mongo-pool

The MongoDB operator creates a headless Service that gives each pod a stable DNS name (mongo-0.mongo-svc.databases.svc.cluster.local). This is essential for replica set member discovery. If you need external access, MetalLB assigns a routable IP to a LoadBalancer Service in front of the mongos routers (for sharded clusters) or the primary (for replica sets).

Backup Strategies

MongoDB offers multiple backup approaches, each suited to different scenarios.

mongodump / mongorestore

Logical backups that export BSON documents. Portable and human-inspectable, but slow for large datasets and don't support point-in-time recovery on their own.

# Full logical backup with oplog for consistency
mongodump --uri="mongodb://backup-user:password@mongo-0:27017,mongo-1:27017,mongo-2:27017/appdb?replicaSet=rs-production&authSource=admin" \
  --oplog \
  --gzip \
  --out=/backups/$(date +%Y%m%d-%H%M%S)

# Restore
mongorestore --uri="mongodb://admin:password@mongo-0:27017/?replicaSet=rs-production&authSource=admin" \
  --oplogReplay \
  --gzip \
  /backups/20260412-030000/

Percona Backup for MongoDB (PBM)

PBM provides physical backups, incremental backups, and point-in-time recovery. It is the recommended backup tool for self-managed MongoDB in production.

# Configure PBM storage
pbm config --set storage.type=s3 \
  --set storage.s3.bucket=company-mongodb-backups \
  --set storage.s3.region=us-east-1 \
  --set storage.s3.credentials.access-key-id=$AWS_ACCESS_KEY \
  --set storage.s3.credentials.secret-access-key=$AWS_SECRET_KEY

# Full backup
pbm backup --type=logical --compression=gzip

# Physical backup (faster, requires WiredTiger)
pbm backup --type=physical --compression=gzip

# Incremental backup
pbm backup --type=incremental --base-snapshot=2026-04-12T03:00:00Z

# Point-in-time recovery
pbm restore --time="2026-04-12T09:30:00Z"

# List backups
pbm list

# Check backup status
pbm status

Cloud Snapshots

On cloud providers, EBS snapshots (AWS), managed disk snapshots (Azure), and persistent disk snapshots (GCP) provide fast, storage-level backups. Combined with db.fsyncLock() for point-in-time consistency, they offer the fastest backup and restore times for large datasets.

# Kubernetes VolumeSnapshot for MongoDB
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mongodb-snapshot-$(date +%Y%m%d)
  namespace: databases
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-volume-production-mongodb-0

Oplog-Based Point-in-Time Recovery

The oplog enables point-in-time recovery (PITR) when combined with a base backup. PBM continuously archives oplog entries to the backup storage. To restore to a specific point in time, PBM restores the most recent base backup and then replays oplog entries up to the target timestamp. This is configured in the Percona Operator CRD via the pitr section.

Connection String Configuration for HA

Proper connection string configuration is critical for application resilience during failover events.

# Standard connection string with all replica set members
mongodb://appuser:password@mongo-0.mongo-svc:27017,mongo-1.mongo-svc:27017,mongo-2.mongo-svc:27017/appdb?replicaSet=rs-production&w=majority&j=true&readPreference=secondaryPreferred&retryWrites=true&retryReads=true&connectTimeoutMS=10000&socketTimeoutMS=30000&serverSelectionTimeoutMS=15000&maxPoolSize=100&minPoolSize=10

# SRV-based connection string (DNS discovery)
mongodb+srv://appuser:password@mongo-cluster.databases.svc.cluster.local/appdb?w=majority&retryWrites=true&readPreference=nearest

# For sharded clusters (connect through mongos)
mongodb://appuser:password@mongos-0:27017,mongos-1:27017/appdb?w=majority&retryWrites=true&readPreference=nearest

Key connection string parameters for HA: retryWrites=true and retryReads=true enable automatic retry of operations that fail during failover. w=majority ensures writes survive primary elections. serverSelectionTimeoutMS controls how long the driver waits to find a suitable server — set it higher than the expected election time (at least 15 seconds). maxPoolSize limits the connection pool per mongos/replica set member to prevent connection exhaustion.

Index Optimization and Query Performance

Indexes are the primary lever for MongoDB query performance. A missing index on a frequently queried field forces a collection scan that degrades linearly with data size.

# Create compound index for common query pattern
db.orders.createIndex(
  { customer_id: 1, order_date: -1, status: 1 },
  { name: "idx_customer_orders", background: true }
)

# Partial index (only index documents matching a filter)
db.events.createIndex(
  { timestamp: 1 },
  { name: "idx_active_events", partialFilterExpression: { status: "active" } }
)

# TTL index for automatic document expiration
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 86400, name: "idx_session_ttl" }
)

# Text index for search
db.products.createIndex(
  { name: "text", description: "text" },
  { weights: { name: 10, description: 5 }, name: "idx_product_search" }
)

# Analyze query performance
db.orders.find({ customer_id: "c123" }).sort({ order_date: -1 }).explain("executionStats")

# Find unused indexes
db.orders.aggregate([ { $indexStats: {} } ])

Review $indexStats regularly to identify unused indexes that waste storage and slow writes. Use the explain() method to verify that queries use the expected index and check for high totalDocsExamined relative to nReturned, which indicates an inefficient query plan.

WiredTiger Storage Engine Tuning

WiredTiger is MongoDB's default and only production storage engine since MongoDB 4.2. Its performance characteristics are heavily influenced by cache sizing, compression, and journal configuration.

# WiredTiger configuration in mongod.conf
storage:
  dbPath: /data/db
  journal:
    enabled: true
    commitIntervalMs: 100
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4            # ~50% of (RAM - 1GB), max 80%
      journalCompressor: snappy
      directoryForIndexes: true  # separate dir for index files
    collectionConfig:
      blockCompressor: snappy   # or zstd for better ratio
    indexConfig:
      prefixCompression: true

operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100

replication:
  oplogSizeMB: 51200            # 50 GB oplog
  replSetName: rs-production

net:
  maxIncomingConnections: 10000
  compression:
    compressors: snappy,zstd,zlib

setParameter:
  wiredTigerConcurrentReadTransactions: 128
  wiredTigerConcurrentWriteTransactions: 128

The WiredTiger cache should be sized to hold your working set — the data and indexes actively accessed by your queries. If the cache is too small, WiredTiger evicts pages frequently, causing high I/O. If it is too large, it leaves insufficient memory for the OS filesystem cache and other processes. A starting point is 50% of available RAM minus 1 GB (for the OS and other processes), capped at the working set size.

Authentication and TLS Encryption

# Generate TLS certificates for MongoDB
# CA certificate
openssl req -x509 -newkey rsa:4096 -days 3650 -nodes \
  -keyout ca.key -out ca.crt \
  -subj "/CN=MongoDB-CA"

# Server certificate (include all member hostnames in SAN)
openssl req -newkey rsa:4096 -nodes \
  -keyout server.key -out server.csr \
  -subj "/CN=mongo-0.mongo-svc.databases.svc.cluster.local" \
  -addext "subjectAltName=DNS:mongo-0.mongo-svc.databases.svc.cluster.local,DNS:mongo-1.mongo-svc.databases.svc.cluster.local,DNS:mongo-2.mongo-svc.databases.svc.cluster.local,DNS:localhost,IP:127.0.0.1"

openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out server.crt -days 365

# Combine cert and key into PEM
cat server.crt server.key > server.pem

# Create Kubernetes secrets
kubectl create secret tls mongodb-tls-cert \
  --cert=server.crt --key=server.key -n databases

kubectl create secret generic mongodb-ca-cert \
  --from-file=ca.crt=ca.crt -n databases

# MongoDB TLS configuration (mongod.conf)
net:
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/mongodb/tls/server.pem
    CAFile: /etc/mongodb/tls/ca.crt
    allowConnectionsWithoutCertificates: false

security:
  authorization: enabled
  clusterAuthMode: x509

Monitoring with Prometheus and Grafana

MongoDB exposes metrics through the mongodb_exporter (from Percona) that integrates with Prometheus. Key metrics to monitor include connection counts, operation rates, replication lag, WiredTiger cache usage, and query targeting efficiency.

# Deploy mongodb_exporter as a sidecar or standalone
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongodb-exporter
  namespace: databases
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongodb-exporter
  template:
    metadata:
      labels:
        app: mongodb-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9216"
    spec:
      containers:
        - name: exporter
          image: percona/mongodb_exporter:0.40.0
          args:
            - --mongodb.uri=mongodb://monitoring:password@production-mongodb-0.production-mongodb-svc:27017,production-mongodb-1.production-mongodb-svc:27017,production-mongodb-2.production-mongodb-svc:27017/?replicaSet=rs-production&authSource=admin
            - --collect-all
            - --compatible-mode
          ports:
            - containerPort: 9216
          resources:
            requests:
              cpu: 100m
              memory: 128Mi

# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mongodb-metrics
  namespace: databases
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: mongodb-exporter
  endpoints:
    - port: metrics
      interval: 15s
      scrapeTimeout: 10s

# Critical Prometheus alert rules for MongoDB
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mongodb-alerts
  namespace: databases
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: mongodb-health
      rules:
        - alert: MongoDBReplicationLagHigh
          expr: mongodb_mongod_replset_member_replication_lag > 30
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "MongoDB replica {{ $labels.name }} lag exceeds 30s"

        - alert: MongoDBConnectionsHigh
          expr: mongodb_connections{state="current"} / mongodb_connections{state="available"} > 0.8
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "MongoDB connections above 80% capacity"

        - alert: MongoDBWiredTigerCacheEvictions
          expr: rate(mongodb_wiredtiger_cache_evicted_pages_total[5m]) > 100
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "High WiredTiger cache eviction rate"

        - alert: MongoDBReplicaSetNoPrimary
          expr: mongodb_mongod_replset_number_of_members{state="PRIMARY"} == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "MongoDB replica set has no primary"

        - alert: MongoDBQueryTargetingInefficient
          expr: rate(mongodb_mongod_metrics_query_executor_total{state="scanned_objects"}[5m]) / rate(mongodb_mongod_metrics_query_executor_total{state="returned"}[5m]) > 100
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "MongoDB scanning 100x more documents than returned"

Change Streams for Real-Time Applications

Change streams provide a real-time notification mechanism for data changes in MongoDB. They leverage the oplog to push change events to applications, enabling event-driven architectures, real-time dashboards, and data synchronisation pipelines without polling.

// Watch changes on a collection
const pipeline = [
  { $match: { operationType: { $in: ["insert", "update", "replace"] } } },
  { $match: { "fullDocument.status": "active" } }
];

const changeStream = db.collection("orders").watch(pipeline, {
  fullDocument: "updateLookup",  // include full document on updates
  resumeAfter: resumeToken       // resume from last processed event
});

changeStream.on("change", (change) => {
  console.log("Change detected:", change.operationType);
  console.log("Document:", change.fullDocument);
  // Store resume token for crash recovery
  saveResumeToken(change._id);
});

changeStream.on("error", (error) => {
  console.error("Change stream error:", error);
  // Reconnect using saved resume token
});

Change streams require a replica set or sharded cluster (they do not work on standalone mongod instances). They survive primary elections — the driver automatically reconnects and resumes from the last received resume token. For production use, always persist the resume token so your application can recover from restarts without missing events.

Rolling Maintenance and Version Upgrades

MongoDB supports rolling upgrades where you upgrade one replica set member at a time, starting with secondaries and finishing with the primary (which triggers a step-down and election). This allows zero-downtime upgrades for minor and major version changes.

# Rolling upgrade procedure for self-managed replica set
# 1. Upgrade each secondary one at a time
# On secondary (mongo-2):
sudo systemctl stop mongod
sudo yum install -y mongodb-org-7.0.14  # or apt-get
sudo systemctl start mongod
# Wait for the member to reach SECONDARY state
rs.status()

# 2. Step down the primary
rs.stepDown()

# 3. Upgrade the old primary (now a secondary)
sudo systemctl stop mongod
sudo yum install -y mongodb-org-7.0.14
sudo systemctl start mongod

# 4. Verify cluster health
rs.status()
db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })

# 5. Set feature compatibility version (irreversible)
db.adminCommand({ setFeatureCompatibilityVersion: "7.0" })

With the MongoDB Community Operator or Percona Operator on Kubernetes, rolling upgrades are automated — you simply update the version field in the CRD and the operator handles the rolling update of each pod, waiting for each member to become healthy before proceeding.

Disaster Recovery and Failover Testing

A high availability deployment that has never been tested under failure is untested. Regular failover testing validates your architecture, your monitoring alerts, and your team's incident response procedures.

Controlled Failover Tests

# Test 1: Step down the primary
rs.stepDown(120)  // 120-second election hold
// Monitor: election should complete in ~10-12s
// Verify: application reconnects and resumes operations

# Test 2: Kill a secondary pod in Kubernetes
kubectl delete pod production-mongodb-1 -n databases --grace-period=0 --force
// The StatefulSet recreates the pod
// The operator rejoins it to the replica set

# Test 3: Simulate network partition
kubectl exec production-mongodb-0 -n databases -- \
  iptables -A INPUT -p tcp --dport 27017 -j DROP
// The isolated primary should step down (cannot reach majority)
// Remaining members elect a new primary
// Cleanup: remove iptables rule and let member rejoin

# Test 4: Simulate storage failure
kubectl exec production-mongodb-2 -n databases -- \
  chmod 000 /data/db
// mongod should crash; Kubernetes restarts the pod
// The member resyncs from the primary's oplog

What to Measure During Failover

RTO (Recovery Time Objective) — Time from primary failure to new primary accepting writes. Target: under 30 seconds for replica sets.
RPO (Recovery Point Objective) — Data loss during failover. With w: majority, RPO is zero for acknowledged writes. With w: 1, RPO equals the replication lag at the time of failure.
Application error rate — Percentage of requests that fail during the failover window. With retryWrites=true, most write failures are automatically retried by the driver.
Change stream continuity — Verify that change stream consumers resume from their saved token without missing events.

Disaster Recovery Runbook

Single member failure — Automatic recovery via Kubernetes pod restart and oplog catch-up. No manual action required unless the member needs a full resync.
Primary failure — Automatic election promotes a secondary within 10-12 seconds. Verify application connectivity and replication lag on remaining secondaries.
Majority failure — If a majority of members are down, the replica set becomes read-only (no elections possible). Restore members or use rs.reconfig({ force: true }) as a last resort (this can cause data loss).
Complete cluster loss — Deploy a new cluster, restore from the latest PBM backup, and replay oplog to the target point in time. Update connection strings and DNS records.
Regional failover — If the primary region is lost, a secondary in another region is automatically elected primary (if it has sufficient priority and the remaining members form a majority). Update DNS to route traffic to the new primary region.

Production Tuning Recommendations

OS-Level Tuning

# Disable Transparent Huge Pages (critical for MongoDB)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Set readahead to 8-32 sectors for SSD
blockdev --setra 32 /dev/sda

# Increase file descriptor limits
ulimit -n 64000
ulimit -u 64000

# Swappiness
vm.swappiness = 1

# Dirty page ratio
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5

# Network tuning
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_keepalive_time = 120

Resource Sizing Guidelines

WiredTiger cache: 50% of available RAM minus 1 GB, or the size of your working set, whichever is smaller.
Oplog size: Enough to hold 24-72 hours of write operations. Start with 50 GB and monitor with rs.printReplicationInfo().
Storage IOPS: MongoDB is I/O-intensive, especially during compaction and checkpointing. Use NVMe SSD or cloud storage with provisioned IOPS (gp3 with 6000+ IOPS on AWS, Premium SSD v2 on Azure, pd-ssd on GCP).
CPU: MongoDB benefits from multiple cores for concurrent read/write operations, WiredTiger's concurrent transactions, and background tasks (checkpointing, compaction, replication).
Network: Replication traffic can be significant for write-heavy workloads. Ensure low-latency, high-bandwidth networking between replica set members.

Connection Management

# Application-side connection pool configuration
const client = new MongoClient(uri, {
  maxPoolSize: 100,
  minPoolSize: 10,
  maxIdleTimeMS: 60000,
  waitQueueTimeoutMS: 5000,
  connectTimeoutMS: 10000,
  socketTimeoutMS: 30000,
  serverSelectionTimeoutMS: 15000,
  retryWrites: true,
  retryReads: true,
  w: "majority",
  readPreference: "secondaryPreferred",
  compressors: ["snappy", "zstd"]
});

Monitoring Checklist

Replication lag — Alert when any secondary exceeds 30 seconds of lag.
Connection saturation — Alert when current connections exceed 80% of maxIncomingConnections.
WiredTiger cache — Alert when the cache dirty fill ratio exceeds 20% (indicates write pressure exceeding checkpoint throughput).
Oplog window — Alert when the oplog window drops below 12 hours (risk of secondaries needing full resync after maintenance).
Query targeting — Alert when the ratio of scanned documents to returned documents exceeds 100 (missing index).
Disk usage — Alert at 70% and 85% thresholds. MongoDB can use significant temporary disk space during compaction.
Backup freshness — Alert when the last successful backup is older than your RPO window.
Ticket availability — Monitor WiredTiger read and write tickets. Exhaustion causes operation queuing and latency spikes.

Operational Commands Quick Reference

# Replica set status
rs.status()
rs.conf()
rs.printReplicationInfo()
rs.printSecondaryReplicationInfo()

# Cluster health (sharded)
sh.status()
db.adminCommand({ balancerStatus: 1 })
db.adminCommand({ listShards: 1 })

# Server diagnostics
db.serverStatus()
db.currentOp({ "$all": true })
db.adminCommand({ hostInfo: 1 })

# Kill long-running operations
db.killOp(opId)

# Compaction (reclaim disk space)
db.runCommand({ compact: "orders" })

# Profiler (identify slow queries)
db.setProfilingLevel(1, { slowms: 100 })
db.system.profile.find().sort({ ts: -1 }).limit(10)

# Index management
db.orders.getIndexes()
db.orders.createIndex({ field: 1 }, { background: true })
db.orders.dropIndex("index_name")

# Kubernetes-specific
kubectl get mongodbcommunity -n databases
kubectl describe mongodbcommunity production-mongodb -n databases
kubectl logs production-mongodb-0 -n databases -c mongod
kubectl exec -it production-mongodb-0 -n databases -c mongod -- mongosh

Decision Matrix: Choosing Your MongoDB HA Architecture

Single cloud, managed preference — Use MongoDB Atlas. It handles HA, backups, monitoring, and scaling with minimal operational overhead.
AWS with MongoDB-compatible needs only — Consider Amazon DocumentDB if your application uses a basic subset of the MongoDB API and you want a fully managed experience. Otherwise, Atlas or self-managed on EKS.
Azure with full MongoDB features — Use Cosmos DB for MongoDB vCore for a managed experience, or Atlas on Azure for the genuine MongoDB managed service.
Multi-cloud or hybrid — Self-managed with the MongoDB Community Operator or Percona Operator. The Kubernetes abstraction enables consistent deployment across providers.
Bare metal or edge — k3s + Rancher + Longhorn + MongoDB Community Operator or Percona Operator. No cloud dependencies required.
Large-scale horizontal scaling — Sharded cluster with zone sharding for data locality. Use the Percona Operator which supports sharded deployments natively.
Real-time event-driven applications — MongoDB change streams provide built-in CDC. Ensure you use a replica set or sharded cluster (not standalone).

Conclusion

MongoDB's built-in replication and sharding primitives give it an architectural advantage for high availability — replica sets with automatic failover and sharded clusters with horizontal scaling are native capabilities, not bolted-on afterthoughts. But these primitives must be configured correctly and operated with discipline to deliver the availability guarantees that production systems demand.

A production MongoDB deployment requires three data-bearing replica set members across failure domains, w: majority write concern for durability, proper oplog sizing for replication resilience, WiredTiger cache tuning for your working set, and a tested backup strategy with point-in-time recovery capability. Kubernetes operators — whether the MongoDB Community Operator for replica sets or the Percona Operator for full-featured deployments including sharding and integrated backups — automate the lifecycle management that would otherwise require significant operational investment.

The deployment patterns across AWS, Azure, GCP, and bare metal share the same core MongoDB configuration. What changes is the storage class, the backup destination, and the networking layer. This consistency is the value of an operator-based approach: your team learns one tool, one operational model, and one set of runbooks that work everywhere.

Start with a three-member replica set, w: majority writes, a daily PBM backup with continuous oplog archiving, and the core Prometheus alerts for replication lag, connection saturation, and cache pressure. Test your failover on day one — not during your first incident. Expand to sharding, zone-based data locality, and multi-region deployments as your data volume and availability requirements grow. The infrastructure handles the mechanics; your responsibility is to understand the architecture deeply enough to make the right trade-offs for your workload and to test those assumptions relentlessly.

MongoDB Replica Set Architecture

Oplog and Replication Mechanics

# Check current oplog size and window
rs.printReplicationInfo()

# Resize the oplog to 50 GB
db.adminCommand({ replSetResizeOplog: 1, size: 51200 })

# Check replication lag on secondaries
rs.printSecondaryReplicationInfo()

Elections and the Raft-Based Protocol

# Initiate a 3-member replica set
rs.initiate({
  _id: "rs-production",
  members: [
    { _id: 0, host: "mongo-0.mongo-svc:27017", priority: 10 },
    { _id: 1, host: "mongo-1.mongo-svc:27017", priority: 5 },
    { _id: 2, host: "mongo-2.mongo-svc:27017", priority: 5 }
  ],
  settings: {
    electionTimeoutMillis: 10000,
    heartbeatTimeoutSecs: 10,
    chainingAllowed: true
  }
})

# Check replica set status
rs.status()

# Step down the primary (for maintenance)
rs.stepDown(60)   // step down for 60 seconds

# Force reconfiguration (emergency)
rs.reconfig(newConfig, { force: true })

Read Preference and Write Concern

Read Preference controls where the driver sends read operations. The options are:

primary — All reads go to the primary. Strongest consistency but no read scaling.
primaryPreferred — Reads go to the primary unless it is unavailable, then to a secondary.
secondary — All reads go to secondaries. Provides read scaling but may return stale data.
secondaryPreferred — Reads go to secondaries unless none are available.
nearest — Reads go to the member with the lowest network latency regardless of role. Best for geo-distributed deployments.

Write Concern controls how many replica set members must acknowledge a write before the operation returns to the client.

w: 1 — Only the primary must acknowledge. Fastest but risks data loss if the primary fails before replication.
w: "majority" — A majority of data-bearing members must acknowledge. This is the recommended production default. It guarantees the write survives a primary election.
w: <number> — A specific number of members must acknowledge.
j: true — The write must be committed to the on-disk journal before acknowledgement. Combined with w: "majority", this provides the strongest durability guarantee.

# Connection string with write concern and read preference
mongodb://mongo-0:27017,mongo-1:27017,mongo-2:27017/appdb?replicaSet=rs-production&w=majority&j=true&readPreference=secondaryPreferred&readPreferenceTags=region:us-east

# SRV connection string (DNS-based discovery)
mongodb+srv://appuser:password@cluster.example.com/appdb?w=majority&retryWrites=true&readPreference=nearest

MongoDB Sharded Cluster Architecture

Shard Key Selection

# Enable sharding on a database
sh.enableSharding("appdb")

# Shard a collection with a hashed shard key (even distribution)
sh.shardCollection("appdb.events", { "event_id": "hashed" })

# Shard with a ranged shard key (supports range queries)
sh.shardCollection("appdb.orders", { "customer_id": 1, "order_date": 1 })

# Check shard distribution
db.orders.getShardDistribution()

# View chunk distribution across shards
use config
db.chunks.aggregate([
  { $group: { _id: "$shard", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])

Zone Sharding for Multi-Region

# Add shards to zones
sh.addShardTag("shard-us-east", "US")
sh.addShardTag("shard-eu-west", "EU")
sh.addShardTag("shard-ap-south", "APAC")

# Define zone ranges
sh.addTagRange("appdb.customers",
  { "region": "US", "customer_id": MinKey },
  { "region": "US", "customer_id": MaxKey },
  "US"
)
sh.addTagRange("appdb.customers",
  { "region": "EU", "customer_id": MinKey },
  { "region": "EU", "customer_id": MaxKey },
  "EU"
)
sh.addTagRange("appdb.customers",
  { "region": "APAC", "customer_id": MinKey },
  { "region": "APAC", "customer_id": MaxKey },
  "APAC"
)

# Verify zone configuration
sh.status()

Multi-Region MongoDB Deployment

MongoDB Community Kubernetes Operator

# Install the MongoDB Community Operator via Helm
helm repo add mongodb https://mongodb.github.io/helm-charts
helm repo update

helm install community-operator mongodb/community-operator \
  --namespace mongodb \
  --create-namespace \
  --set operator.watchNamespace="*"

MongoDBCommunity CRD Specification

The MongoDBCommunity custom resource defines the desired state of a MongoDB replica set. Below is a production-ready specification.

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: production-mongodb
  namespace: databases
spec:
  members: 3
  type: ReplicaSet
  version: "7.0.12"

  security:
    authentication:
      modes: ["SCRAM"]
    tls:
      enabled: true
      certificateKeySecretRef:
        name: mongodb-tls-cert
      caCertificateSecretRef:
        name: mongodb-ca-cert

  users:
    - name: appuser
      db: admin
      passwordSecretRef:
        name: mongodb-appuser-password
      roles:
        - name: readWrite
          db: appdb
        - name: clusterMonitor
          db: admin
      scramCredentialsSecretName: appuser-scram
    - name: backup-user
      db: admin
      passwordSecretRef:
        name: mongodb-backup-password
      roles:
        - name: backup
          db: admin
        - name: restore
          db: admin
      scramCredentialsSecretName: backup-scram
    - name: monitoring
      db: admin
      passwordSecretRef:
        name: mongodb-monitoring-password
      roles:
        - name: clusterMonitor
          db: admin
      scramCredentialsSecretName: monitoring-scram

  additionalMongodConfig:
    storage.wiredTiger.engineConfig.cacheSizeGB: 4
    storage.wiredTiger.engineConfig.journalCompressor: snappy
    storage.wiredTiger.collectionConfig.blockCompressor: snappy
    net.maxIncomingConnections: 10000
    operationProfiling.mode: slowOp
    operationProfiling.slowOpThresholdMs: 100
    replication.oplogSizeMB: 51200
    setParameter.cursorTimeoutMillis: 600000

  statefulSet:
    spec:
      template:
        spec:
          containers:
            - name: mongod
              resources:
                requests:
                  cpu: "2"
                  memory: 8Gi
                limits:
                  cpu: "4"
                  memory: 16Gi
            - name: mongodb-agent
              resources:
                requests:
                  cpu: 250m
                  memory: 256Mi
                limits:
                  cpu: 500m
                  memory: 512Mi
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - topologyKey: topology.kubernetes.io/zone
                  labelSelector:
                    matchLabels:
                      app: production-mongodb-svc
          tolerations:
            - key: "workload"
              operator: "Equal"
              value: "database"
              effect: "NoSchedule"
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            storageClassName: gp3-csi
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi
        - metadata:
            name: logs-volume
          spec:
            storageClassName: gp3-csi
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi

Percona Server for MongoDB Operator

# Install Percona Operator
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

helm install psmdb-operator percona/psmdb-operator \
  --namespace psmdb \
  --create-namespace

# Percona Server for MongoDB Cluster CRD
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: production-psmdb
  namespace: databases
spec:
  crVersion: "1.16.0"
  image: percona/percona-server-mongodb:7.0.12-7
  imagePullPolicy: IfNotPresent

  replsets:
    - name: rs0
      size: 3
      resources:
        requests:
          cpu: "2"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 16Gi
      volumeSpec:
        persistentVolumeClaim:
          storageClassName: gp3-csi
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 100Gi
      nonvoting:
        enabled: false
      arbiter:
        enabled: false
      configuration: |
        storage:
          wiredTiger:
            engineConfig:
              cacheSizeGB: 4
              journalCompressor: snappy
            collectionConfig:
              blockCompressor: snappy
        operationProfiling:
          mode: slowOp
          slowOpThresholdMs: 100
        replication:
          oplogSizeMB: 51200
      affinity:
        antiAffinityTopologyKey: topology.kubernetes.io/zone

  sharding:
    enabled: false

  mongos: {}
  configsrv: {}

  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.5.0
    storages:
      s3-backup:
        type: s3
        s3:
          bucket: company-mongodb-backups
          region: us-east-1
          credentialsSecret: aws-s3-credentials
          prefix: production
          insecureSkipTLSVerify: false
    pitr:
      enabled: true
      oplogOnly: false
      compressionType: gzip
    tasks:
      - name: daily-full
        enabled: true
        schedule: "0 3 * * *"
        keep: 7
        storageName: s3-backup
        compressionType: gzip

  secrets:
    users: mongodb-users-secret

  pmm:
    enabled: true
    image: percona/pmm-client:2
    serverHost: pmm-server.monitoring

AWS Deployment: DocumentDB vs Atlas vs Self-Managed on EKS

# EBS StorageClass optimised for MongoDB
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-csi
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"
  throughput: "250"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

# IRSA for backup S3 access
eksctl create iamserviceaccount \
  --name mongodb-backup-sa \
  --namespace databases \
  --cluster my-eks-cluster \
  --attach-policy-arn arn:aws:iam::111122223333:policy/MongoDBBackupS3Policy \
  --approve

Azure Deployment: Cosmos DB vs Atlas vs Self-Managed on AKS

MongoDB Atlas on Azure provides the same fully managed MongoDB experience as on AWS, running on Azure infrastructure with VNET peering, Azure Private Link, and Azure AD integration.

Self-Managed on AKS uses Azure Managed Disks (Premium SSD v2 recommended) with the MongoDB or Percona operator.

# Azure Premium SSD StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium-mongodb
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  cachingMode: None
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

GCP Deployment: Atlas on GCP vs Self-Managed on GKE

Self-Managed on GKE uses Persistent Disk SSD with the MongoDB or Percona operator. GKE Workload Identity provides secure, keyless authentication for backup to GCS.

# GKE SSD StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-mongodb
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

Bare Metal k3s/Rancher with Longhorn

Longhorn Storage for MongoDB

# Install Longhorn
helm repo add longhorn https://charts.longhorn.io
helm repo update

helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set defaultSettings.defaultReplicaCount=3 \
  --set defaultSettings.storageMinimalAvailablePercentage=15

# StorageClass for MongoDB on Longhorn
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-mongo
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  dataLocality: best-effort
  fsType: xfs
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

MetalLB and Headless Services

# MetalLB IP Pool for MongoDB
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: mongo-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.220-192.168.1.225
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: mongo-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - mongo-pool

Backup Strategies

MongoDB offers multiple backup approaches, each suited to different scenarios.

mongodump / mongorestore

Logical backups that export BSON documents. Portable and human-inspectable, but slow for large datasets and don't support point-in-time recovery on their own.

# Full logical backup with oplog for consistency
mongodump --uri="mongodb://backup-user:password@mongo-0:27017,mongo-1:27017,mongo-2:27017/appdb?replicaSet=rs-production&authSource=admin" \
  --oplog \
  --gzip \
  --out=/backups/$(date +%Y%m%d-%H%M%S)

# Restore
mongorestore --uri="mongodb://admin:password@mongo-0:27017/?replicaSet=rs-production&authSource=admin" \
  --oplogReplay \
  --gzip \
  /backups/20260412-030000/

Percona Backup for MongoDB (PBM)

PBM provides physical backups, incremental backups, and point-in-time recovery. It is the recommended backup tool for self-managed MongoDB in production.

# Configure PBM storage
pbm config --set storage.type=s3 \
  --set storage.s3.bucket=company-mongodb-backups \
  --set storage.s3.region=us-east-1 \
  --set storage.s3.credentials.access-key-id=$AWS_ACCESS_KEY \
  --set storage.s3.credentials.secret-access-key=$AWS_SECRET_KEY

# Full backup
pbm backup --type=logical --compression=gzip

# Physical backup (faster, requires WiredTiger)
pbm backup --type=physical --compression=gzip

# Incremental backup
pbm backup --type=incremental --base-snapshot=2026-04-12T03:00:00Z

# Point-in-time recovery
pbm restore --time="2026-04-12T09:30:00Z"

# List backups
pbm list

# Check backup status
pbm status

Cloud Snapshots

# Kubernetes VolumeSnapshot for MongoDB
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mongodb-snapshot-$(date +%Y%m%d)
  namespace: databases
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-volume-production-mongodb-0

Oplog-Based Point-in-Time Recovery

Connection String Configuration for HA

Proper connection string configuration is critical for application resilience during failover events.

# Standard connection string with all replica set members
mongodb://appuser:password@mongo-0.mongo-svc:27017,mongo-1.mongo-svc:27017,mongo-2.mongo-svc:27017/appdb?replicaSet=rs-production&w=majority&j=true&readPreference=secondaryPreferred&retryWrites=true&retryReads=true&connectTimeoutMS=10000&socketTimeoutMS=30000&serverSelectionTimeoutMS=15000&maxPoolSize=100&minPoolSize=10

# SRV-based connection string (DNS discovery)
mongodb+srv://appuser:password@mongo-cluster.databases.svc.cluster.local/appdb?w=majority&retryWrites=true&readPreference=nearest

# For sharded clusters (connect through mongos)
mongodb://appuser:password@mongos-0:27017,mongos-1:27017/appdb?w=majority&retryWrites=true&readPreference=nearest

Index Optimization and Query Performance

Indexes are the primary lever for MongoDB query performance. A missing index on a frequently queried field forces a collection scan that degrades linearly with data size.

# Create compound index for common query pattern
db.orders.createIndex(
  { customer_id: 1, order_date: -1, status: 1 },
  { name: "idx_customer_orders", background: true }
)

# Partial index (only index documents matching a filter)
db.events.createIndex(
  { timestamp: 1 },
  { name: "idx_active_events", partialFilterExpression: { status: "active" } }
)

# TTL index for automatic document expiration
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 86400, name: "idx_session_ttl" }
)

# Text index for search
db.products.createIndex(
  { name: "text", description: "text" },
  { weights: { name: 10, description: 5 }, name: "idx_product_search" }
)

# Analyze query performance
db.orders.find({ customer_id: "c123" }).sort({ order_date: -1 }).explain("executionStats")

# Find unused indexes
db.orders.aggregate([ { $indexStats: {} } ])

WiredTiger Storage Engine Tuning

WiredTiger is MongoDB's default and only production storage engine since MongoDB 4.2. Its performance characteristics are heavily influenced by cache sizing, compression, and journal configuration.

# WiredTiger configuration in mongod.conf
storage:
  dbPath: /data/db
  journal:
    enabled: true
    commitIntervalMs: 100
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4            # ~50% of (RAM - 1GB), max 80%
      journalCompressor: snappy
      directoryForIndexes: true  # separate dir for index files
    collectionConfig:
      blockCompressor: snappy   # or zstd for better ratio
    indexConfig:
      prefixCompression: true

operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100

replication:
  oplogSizeMB: 51200            # 50 GB oplog
  replSetName: rs-production

net:
  maxIncomingConnections: 10000
  compression:
    compressors: snappy,zstd,zlib

setParameter:
  wiredTigerConcurrentReadTransactions: 128
  wiredTigerConcurrentWriteTransactions: 128

Authentication and TLS Encryption

# Generate TLS certificates for MongoDB
# CA certificate
openssl req -x509 -newkey rsa:4096 -days 3650 -nodes \
  -keyout ca.key -out ca.crt \
  -subj "/CN=MongoDB-CA"

# Server certificate (include all member hostnames in SAN)
openssl req -newkey rsa:4096 -nodes \
  -keyout server.key -out server.csr \
  -subj "/CN=mongo-0.mongo-svc.databases.svc.cluster.local" \
  -addext "subjectAltName=DNS:mongo-0.mongo-svc.databases.svc.cluster.local,DNS:mongo-1.mongo-svc.databases.svc.cluster.local,DNS:mongo-2.mongo-svc.databases.svc.cluster.local,DNS:localhost,IP:127.0.0.1"

openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out server.crt -days 365

# Combine cert and key into PEM
cat server.crt server.key > server.pem

# Create Kubernetes secrets
kubectl create secret tls mongodb-tls-cert \
  --cert=server.crt --key=server.key -n databases

kubectl create secret generic mongodb-ca-cert \
  --from-file=ca.crt=ca.crt -n databases

# MongoDB TLS configuration (mongod.conf)
net:
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/mongodb/tls/server.pem
    CAFile: /etc/mongodb/tls/ca.crt
    allowConnectionsWithoutCertificates: false

security:
  authorization: enabled
  clusterAuthMode: x509

Monitoring with Prometheus and Grafana

# Deploy mongodb_exporter as a sidecar or standalone
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongodb-exporter
  namespace: databases
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongodb-exporter
  template:
    metadata:
      labels:
        app: mongodb-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9216"
    spec:
      containers:
        - name: exporter
          image: percona/mongodb_exporter:0.40.0
          args:
            - --mongodb.uri=mongodb://monitoring:password@production-mongodb-0.production-mongodb-svc:27017,production-mongodb-1.production-mongodb-svc:27017,production-mongodb-2.production-mongodb-svc:27017/?replicaSet=rs-production&authSource=admin
            - --collect-all
            - --compatible-mode
          ports:
            - containerPort: 9216
          resources:
            requests:
              cpu: 100m
              memory: 128Mi

# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mongodb-metrics
  namespace: databases
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: mongodb-exporter
  endpoints:
    - port: metrics
      interval: 15s
      scrapeTimeout: 10s

# Critical Prometheus alert rules for MongoDB
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mongodb-alerts
  namespace: databases
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: mongodb-health
      rules:
        - alert: MongoDBReplicationLagHigh
          expr: mongodb_mongod_replset_member_replication_lag > 30
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "MongoDB replica {{ $labels.name }} lag exceeds 30s"

        - alert: MongoDBConnectionsHigh
          expr: mongodb_connections{state="current"} / mongodb_connections{state="available"} > 0.8
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "MongoDB connections above 80% capacity"

        - alert: MongoDBWiredTigerCacheEvictions
          expr: rate(mongodb_wiredtiger_cache_evicted_pages_total[5m]) > 100
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "High WiredTiger cache eviction rate"

        - alert: MongoDBReplicaSetNoPrimary
          expr: mongodb_mongod_replset_number_of_members{state="PRIMARY"} == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "MongoDB replica set has no primary"

        - alert: MongoDBQueryTargetingInefficient
          expr: rate(mongodb_mongod_metrics_query_executor_total{state="scanned_objects"}[5m]) / rate(mongodb_mongod_metrics_query_executor_total{state="returned"}[5m]) > 100
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "MongoDB scanning 100x more documents than returned"

Change Streams for Real-Time Applications

// Watch changes on a collection
const pipeline = [
  { $match: { operationType: { $in: ["insert", "update", "replace"] } } },
  { $match: { "fullDocument.status": "active" } }
];

const changeStream = db.collection("orders").watch(pipeline, {
  fullDocument: "updateLookup",  // include full document on updates
  resumeAfter: resumeToken       // resume from last processed event
});

changeStream.on("change", (change) => {
  console.log("Change detected:", change.operationType);
  console.log("Document:", change.fullDocument);
  // Store resume token for crash recovery
  saveResumeToken(change._id);
});

changeStream.on("error", (error) => {
  console.error("Change stream error:", error);
  // Reconnect using saved resume token
});

Rolling Maintenance and Version Upgrades

# Rolling upgrade procedure for self-managed replica set
# 1. Upgrade each secondary one at a time
# On secondary (mongo-2):
sudo systemctl stop mongod
sudo yum install -y mongodb-org-7.0.14  # or apt-get
sudo systemctl start mongod
# Wait for the member to reach SECONDARY state
rs.status()

# 2. Step down the primary
rs.stepDown()

# 3. Upgrade the old primary (now a secondary)
sudo systemctl stop mongod
sudo yum install -y mongodb-org-7.0.14
sudo systemctl start mongod

# 4. Verify cluster health
rs.status()
db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })

# 5. Set feature compatibility version (irreversible)
db.adminCommand({ setFeatureCompatibilityVersion: "7.0" })

Disaster Recovery and Failover Testing

Controlled Failover Tests

# Test 1: Step down the primary
rs.stepDown(120)  // 120-second election hold
// Monitor: election should complete in ~10-12s
// Verify: application reconnects and resumes operations

# Test 2: Kill a secondary pod in Kubernetes
kubectl delete pod production-mongodb-1 -n databases --grace-period=0 --force
// The StatefulSet recreates the pod
// The operator rejoins it to the replica set

# Test 3: Simulate network partition
kubectl exec production-mongodb-0 -n databases -- \
  iptables -A INPUT -p tcp --dport 27017 -j DROP
// The isolated primary should step down (cannot reach majority)
// Remaining members elect a new primary
// Cleanup: remove iptables rule and let member rejoin

# Test 4: Simulate storage failure
kubectl exec production-mongodb-2 -n databases -- \
  chmod 000 /data/db
// mongod should crash; Kubernetes restarts the pod
// The member resyncs from the primary's oplog

What to Measure During Failover

RTO (Recovery Time Objective) — Time from primary failure to new primary accepting writes. Target: under 30 seconds for replica sets.
RPO (Recovery Point Objective) — Data loss during failover. With w: majority, RPO is zero for acknowledged writes. With w: 1, RPO equals the replication lag at the time of failure.
Application error rate — Percentage of requests that fail during the failover window. With retryWrites=true, most write failures are automatically retried by the driver.
Change stream continuity — Verify that change stream consumers resume from their saved token without missing events.

Disaster Recovery Runbook

Single member failure — Automatic recovery via Kubernetes pod restart and oplog catch-up. No manual action required unless the member needs a full resync.
Primary failure — Automatic election promotes a secondary within 10-12 seconds. Verify application connectivity and replication lag on remaining secondaries.
Majority failure — If a majority of members are down, the replica set becomes read-only (no elections possible). Restore members or use rs.reconfig({ force: true }) as a last resort (this can cause data loss).
Complete cluster loss — Deploy a new cluster, restore from the latest PBM backup, and replay oplog to the target point in time. Update connection strings and DNS records.
Regional failover — If the primary region is lost, a secondary in another region is automatically elected primary (if it has sufficient priority and the remaining members form a majority). Update DNS to route traffic to the new primary region.

Production Tuning Recommendations

OS-Level Tuning

# Disable Transparent Huge Pages (critical for MongoDB)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Set readahead to 8-32 sectors for SSD
blockdev --setra 32 /dev/sda

# Increase file descriptor limits
ulimit -n 64000
ulimit -u 64000

# Swappiness
vm.swappiness = 1

# Dirty page ratio
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5

# Network tuning
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_keepalive_time = 120

Resource Sizing Guidelines

WiredTiger cache: 50% of available RAM minus 1 GB, or the size of your working set, whichever is smaller.
Oplog size: Enough to hold 24-72 hours of write operations. Start with 50 GB and monitor with rs.printReplicationInfo().
Storage IOPS: MongoDB is I/O-intensive, especially during compaction and checkpointing. Use NVMe SSD or cloud storage with provisioned IOPS (gp3 with 6000+ IOPS on AWS, Premium SSD v2 on Azure, pd-ssd on GCP).
CPU: MongoDB benefits from multiple cores for concurrent read/write operations, WiredTiger's concurrent transactions, and background tasks (checkpointing, compaction, replication).
Network: Replication traffic can be significant for write-heavy workloads. Ensure low-latency, high-bandwidth networking between replica set members.

Connection Management

# Application-side connection pool configuration
const client = new MongoClient(uri, {
  maxPoolSize: 100,
  minPoolSize: 10,
  maxIdleTimeMS: 60000,
  waitQueueTimeoutMS: 5000,
  connectTimeoutMS: 10000,
  socketTimeoutMS: 30000,
  serverSelectionTimeoutMS: 15000,
  retryWrites: true,
  retryReads: true,
  w: "majority",
  readPreference: "secondaryPreferred",
  compressors: ["snappy", "zstd"]
});

Monitoring Checklist

Replication lag — Alert when any secondary exceeds 30 seconds of lag.
Connection saturation — Alert when current connections exceed 80% of maxIncomingConnections.
WiredTiger cache — Alert when the cache dirty fill ratio exceeds 20% (indicates write pressure exceeding checkpoint throughput).
Oplog window — Alert when the oplog window drops below 12 hours (risk of secondaries needing full resync after maintenance).
Query targeting — Alert when the ratio of scanned documents to returned documents exceeds 100 (missing index).
Disk usage — Alert at 70% and 85% thresholds. MongoDB can use significant temporary disk space during compaction.
Backup freshness — Alert when the last successful backup is older than your RPO window.
Ticket availability — Monitor WiredTiger read and write tickets. Exhaustion causes operation queuing and latency spikes.

Operational Commands Quick Reference

# Replica set status
rs.status()
rs.conf()
rs.printReplicationInfo()
rs.printSecondaryReplicationInfo()

# Cluster health (sharded)
sh.status()
db.adminCommand({ balancerStatus: 1 })
db.adminCommand({ listShards: 1 })

# Server diagnostics
db.serverStatus()
db.currentOp({ "$all": true })
db.adminCommand({ hostInfo: 1 })

# Kill long-running operations
db.killOp(opId)

# Compaction (reclaim disk space)
db.runCommand({ compact: "orders" })

# Profiler (identify slow queries)
db.setProfilingLevel(1, { slowms: 100 })
db.system.profile.find().sort({ ts: -1 }).limit(10)

# Index management
db.orders.getIndexes()
db.orders.createIndex({ field: 1 }, { background: true })
db.orders.dropIndex("index_name")

# Kubernetes-specific
kubectl get mongodbcommunity -n databases
kubectl describe mongodbcommunity production-mongodb -n databases
kubectl logs production-mongodb-0 -n databases -c mongod
kubectl exec -it production-mongodb-0 -n databases -c mongod -- mongosh

Decision Matrix: Choosing Your MongoDB HA Architecture

Single cloud, managed preference — Use MongoDB Atlas. It handles HA, backups, monitoring, and scaling with minimal operational overhead.
AWS with MongoDB-compatible needs only — Consider Amazon DocumentDB if your application uses a basic subset of the MongoDB API and you want a fully managed experience. Otherwise, Atlas or self-managed on EKS.
Azure with full MongoDB features — Use Cosmos DB for MongoDB vCore for a managed experience, or Atlas on Azure for the genuine MongoDB managed service.
Multi-cloud or hybrid — Self-managed with the MongoDB Community Operator or Percona Operator. The Kubernetes abstraction enables consistent deployment across providers.
Bare metal or edge — k3s + Rancher + Longhorn + MongoDB Community Operator or Percona Operator. No cloud dependencies required.
Large-scale horizontal scaling — Sharded cluster with zone sharding for data locality. Use the Percona Operator which supports sharded deployments natively.
Real-time event-driven applications — MongoDB change streams provide built-in CDC. Ensure you use a replica set or sharded cluster (not standalone).

MongoDB High Availability in Production: Replica Sets, Sharding, and Kubernetes Operators

MongoDB Replica Set Architecture

Oplog and Replication Mechanics

Elections and the Raft-Based Protocol

Read Preference and Write Concern

MongoDB Sharded Cluster Architecture

Shard Key Selection

Zone Sharding for Multi-Region

Multi-Region MongoDB Deployment

MongoDB Community Kubernetes Operator

MongoDBCommunity CRD Specification

Percona Server for MongoDB Operator

AWS Deployment: DocumentDB vs Atlas vs Self-Managed on EKS

Azure Deployment: Cosmos DB vs Atlas vs Self-Managed on AKS

GCP Deployment: Atlas on GCP vs Self-Managed on GKE

Bare Metal k3s/Rancher with Longhorn

Longhorn Storage for MongoDB

MetalLB and Headless Services

Backup Strategies

mongodump / mongorestore

Percona Backup for MongoDB (PBM)

Cloud Snapshots

Oplog-Based Point-in-Time Recovery

Connection String Configuration for HA

Index Optimization and Query Performance

WiredTiger Storage Engine Tuning

Authentication and TLS Encryption

Monitoring with Prometheus and Grafana

Change Streams for Real-Time Applications

Rolling Maintenance and Version Upgrades

Disaster Recovery and Failover Testing

Controlled Failover Tests

What to Measure During Failover

Disaster Recovery Runbook

Production Tuning Recommendations

OS-Level Tuning

Resource Sizing Guidelines

Connection Management

Monitoring Checklist

Operational Commands Quick Reference

Decision Matrix: Choosing Your MongoDB HA Architecture

Conclusion

MongoDB High Availability in Production: Replica Sets, Sharding, and Kubernetes Operators

MongoDB Replica Set Architecture

Oplog and Replication Mechanics

Elections and the Raft-Based Protocol

Read Preference and Write Concern

MongoDB Sharded Cluster Architecture

Shard Key Selection

Zone Sharding for Multi-Region

Multi-Region MongoDB Deployment

MongoDB Community Kubernetes Operator

MongoDBCommunity CRD Specification

Percona Server for MongoDB Operator

AWS Deployment: DocumentDB vs Atlas vs Self-Managed on EKS

Azure Deployment: Cosmos DB vs Atlas vs Self-Managed on AKS

GCP Deployment: Atlas on GCP vs Self-Managed on GKE

Bare Metal k3s/Rancher with Longhorn

Longhorn Storage for MongoDB

MetalLB and Headless Services

Backup Strategies

mongodump / mongorestore

Percona Backup for MongoDB (PBM)

Cloud Snapshots

Oplog-Based Point-in-Time Recovery

Connection String Configuration for HA

Index Optimization and Query Performance

WiredTiger Storage Engine Tuning

Authentication and TLS Encryption

Monitoring with Prometheus and Grafana

Change Streams for Real-Time Applications

Rolling Maintenance and Version Upgrades

Disaster Recovery and Failover Testing

Controlled Failover Tests

What to Measure During Failover

Disaster Recovery Runbook

Production Tuning Recommendations

OS-Level Tuning

Resource Sizing Guidelines

Connection Management