MySQL High Availability in Production: InnoDB Cluster, Group Replication, and Kubernetes Operators
A production engineer's guide to InnoDB Cluster, Group Replication, MySQL Operator for Kubernetes, Percona XtraDB Cluster, multi-cloud strategies, and bare metal k3s/Rancher deployments with Longhorn storage
Running MySQL in production is familiar territory for most engineering teams. Running it with genuine high availability — where a node failure, a network partition, or an entire availability zone going dark does not result in downtime or data loss — requires deliberate architecture. This guide walks through every layer of that architecture: from the replication primitives inside MySQL itself, through the Kubernetes operators that automate lifecycle management, across the managed and self-managed options on every major cloud provider, and down to bare metal k3s clusters managed by Rancher.
By the end of this article you will have a complete mental model for choosing and operating MySQL HA in production, along with concrete configuration examples you can adapt to your own environment.
MySQL InnoDB Cluster Architecture
InnoDB Cluster is Oracle's integrated high availability solution for MySQL. It combines three components: MySQL Group Replication for data synchronisation, MySQL Shell for cluster administration, and MySQL Router for transparent connection routing and automatic failover. Together they form a self-healing cluster that can tolerate node failures without manual intervention.
The architecture is elegant in its simplicity. Applications connect to MySQL Router, which maintains awareness of the cluster topology by querying the InnoDB Cluster metadata schema. When the primary fails, Group Replication elects a new primary from the remaining secondaries, and MySQL Router automatically redirects write traffic to the new primary — typically within seconds. Read traffic can be distributed across all secondaries for horizontal read scaling.
Group Replication Fundamentals
MySQL Group Replication is the foundation of InnoDB Cluster. It uses a Paxos-derived group communication engine (XCom) to ensure that every transaction committed on the primary has been delivered to a majority of members before the commit is acknowledged. This provides virtually synchronous replication — a guarantee that committed data has reached at least a majority of cluster members at the time of commit, even though each member applies the changes independently.
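The quorum arithmetic behind that guarantee is worth internalising, because it dictates cluster sizing. A small illustrative sketch (Python used purely for the arithmetic):

```python
def majority(n: int) -> int:
    # Smallest number of members that can form a quorum.
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    # Members that may fail while the remainder still holds a quorum.
    return n - majority(n)

# Why odd sizes are preferred: a 4th node raises the quorum to 3 without
# buying any extra fault tolerance over a 3-node group.
sizes = {n: tolerated_failures(n) for n in (3, 4, 5, 6, 7)}
# sizes == {3: 1, 4: 1, 5: 2, 6: 2, 7: 3}
```

This is why 3-node and 5-node groups are the standard deployments: even member counts add cost without adding fault tolerance.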
Group Replication operates in two modes:
- Single-Primary Mode — One node accepts writes (the primary); all others are read-only secondaries. This is the recommended and default mode. It avoids write conflicts entirely because only one node can generate transactions.
- Multi-Primary Mode — All nodes accept writes simultaneously. This offers higher write throughput for workloads that partition cleanly across different tables or keyspaces, but it introduces the possibility of certification conflicts when concurrent transactions modify the same rows. Conflicting transactions are rolled back on one node. Use multi-primary only when your application is designed to handle certification failures and retry logic.
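If you do run multi-primary, the application must treat a certification failure like a deadlock: roll back and retry. A hedged sketch of such a retry wrapper; the DBError class is a stand-in, and the assumption that certification rollbacks surface as server error 3101 should be verified against your driver and server version:

```python
import random
import time

class DBError(Exception):
    """Stand-in for a driver exception carrying the MySQL error code."""
    def __init__(self, code):
        super().__init__(f"MySQL error {code}")
        self.code = code

# 1213 = ER_LOCK_DEADLOCK. 3101 is how certification rollbacks commonly
# surface (assumption -- verify against your driver/server version).
RETRYABLE = {1213, 3101}

def run_with_retry(txn, attempts=3, base_delay=0.05):
    """Run a transaction callable, retrying on certification/deadlock errors."""
    for attempt in range(attempts):
        try:
            return txn()
        except DBError as exc:
            if exc.code not in RETRYABLE or attempt == attempts - 1:
                raise
            # Exponential backoff with jitter de-synchronises competing writers.
            time.sleep(base_delay * (2 ** attempt) * random.random())
```

The backoff-with-jitter matters: without it, two writers that conflicted once tend to retry in lockstep and conflict again.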
Setting Up InnoDB Cluster with MySQL Shell
MySQL Shell provides the AdminAPI, a set of functions that automate the entire cluster lifecycle. Here is the complete setup sequence for a 3-node cluster.
# Step 1: Prepare each MySQL instance (run on all 3 nodes)
# Ensure my.cnf has required settings
[mysqld]
server-id=1 # unique per node: 1, 2, 3
gtid_mode=ON
enforce_gtid_consistency=ON
binlog_checksum=NONE # required only before MySQL 8.0.21; later versions support checksums with Group Replication
binlog_transaction_dependency_tracking=WRITESET
transaction_write_set_extraction=XXHASH64
loose-group_replication_start_on_boot=OFF
plugin_load_add='group_replication.so'
plugin_load_add='mysql_clone.so'
report_host='mysql-node-1' # unique per node
# Step 2: Use MySQL Shell to configure and create the cluster
mysqlsh -- dba configure-instance root@mysql-node-1:3306 --clusterAdmin=gradmin --clusterAdminPassword='SecurePass123!'
mysqlsh -- dba configure-instance root@mysql-node-2:3306 --clusterAdmin=gradmin --clusterAdminPassword='SecurePass123!'
mysqlsh -- dba configure-instance root@mysql-node-3:3306 --clusterAdmin=gradmin --clusterAdminPassword='SecurePass123!'
# Step 3: Connect to the first node and create the cluster
mysqlsh gradmin@mysql-node-1:3306
# Inside MySQL Shell:
var cluster = dba.createCluster('productionCluster', {
  memberWeight: 90,
  exitStateAction: 'ABORT_SERVER',
  consistency: 'BEFORE_ON_PRIMARY_FAILOVER',
  expelTimeout: 10
})
# Step 4: Add remaining nodes
cluster.addInstance('gradmin@mysql-node-2:3306', {recoveryMethod: 'clone'})
cluster.addInstance('gradmin@mysql-node-3:3306', {recoveryMethod: 'clone'})
# Step 5: Verify cluster status
cluster.status()
The recoveryMethod: 'clone' option uses the MySQL Clone Plugin to perform a full data copy from the primary to the joining member, which is far faster than incremental recovery from binary logs for large datasets.
MySQL Router Configuration
MySQL Router is bootstrapped against the cluster and automatically generates its configuration file.
# Bootstrap MySQL Router against the cluster
mysqlrouter --bootstrap gradmin@mysql-node-1:3306 \
--directory /opt/mysqlrouter \
--conf-use-sockets \
--user=mysqlrouter \
--name='production-router'
# Start MySQL Router
/opt/mysqlrouter/start.sh
# Default ports after bootstrap:
# 6446 — R/W (routes to primary)
# 6447 — R/O (round-robin across secondaries)
# 6448 — R/W (X Protocol)
# 6449 — R/O (X Protocol)
Applications connect to the router's R/W port for writes and the R/O port for read replicas. When the primary fails, the router detects the topology change and re-routes within seconds.
Multi-Region MySQL Deployment
For disaster recovery and low-latency reads across geographies, MySQL can be deployed across multiple regions. InnoDB ClusterSet extends InnoDB Cluster to support asynchronous replication between a primary cluster and one or more replica clusters in different regions.
InnoDB ClusterSet operates with one primary cluster that handles all writes, and one or more replica clusters that receive changes asynchronously. In a disaster scenario, a replica cluster can be promoted to primary via MySQL Shell.
# Create a ClusterSet from an existing InnoDB Cluster
mysqlsh gradmin@mysql-node-1:3306
var cluster = dba.getCluster()
var cs = cluster.createClusterSet('globalSet')
# Create a replica cluster in eu-west region
cs.createReplicaCluster('gradmin@mysql-eu-node-1:3306', 'euWestCluster', {
  recoveryMethod: 'clone'
})
var euCluster = cs.getCluster('euWestCluster')
euCluster.addInstance('gradmin@mysql-eu-node-2:3306', {recoveryMethod: 'clone'})
euCluster.addInstance('gradmin@mysql-eu-node-3:3306', {recoveryMethod: 'clone'})
# Emergency failover (when primary cluster is unreachable)
cs.forcePrimaryCluster('euWestCluster')
MySQL on Kubernetes — Operator Pattern
Running MySQL on Kubernetes requires solving several challenges that don't apply to stateless workloads: stable network identities, persistent storage, ordered startup and shutdown, and cluster-aware health checks. Kubernetes operators encode this domain-specific operational knowledge into a controller that watches Custom Resources and reconciles the actual state of MySQL toward the desired state declared in YAML.
MySQL Operator for Kubernetes (Oracle)
Oracle's MySQL Operator for Kubernetes deploys and manages InnoDB Cluster instances natively on Kubernetes. It creates StatefulSets for MySQL server pods, Deployments for MySQL Router, and handles automated failover, scaling, backup, and configuration changes.
# Install the MySQL Operator via Helm
helm repo add mysql-operator https://mysql.github.io/mysql-operator/
helm repo update
helm install mysql-operator mysql-operator/mysql-operator \
--namespace mysql-operator \
--create-namespace \
--set image.pullPolicy=IfNotPresent
Once the operator is running, deploy an InnoDB Cluster by creating a custom resource.
apiVersion: v1
kind: Secret
metadata:
  name: mysql-root-credentials
  namespace: production
stringData:
  rootUser: root
  rootHost: '%'
  rootPassword: 'ProductionSecurePass123!'
---
apiVersion: mysql.oracle.com/v2
kind: InnoDBCluster
metadata:
  name: production-mysql
  namespace: production
spec:
  secretName: mysql-root-credentials
  instances: 3
  tlsUseSelfSigned: true
  router:
    instances: 2
  datadirVolumeClaimTemplate:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Gi
    storageClassName: gp3-encrypted
  mycnf: |
    [mysqld]
    innodb_buffer_pool_size=4G
    innodb_log_file_size=1G
    innodb_flush_log_at_trx_commit=1
    sync_binlog=1
    max_connections=500
    innodb_io_capacity=2000
    innodb_io_capacity_max=4000
    innodb_read_io_threads=8
    innodb_write_io_threads=8
    performance_schema=ON
    slow_query_log=ON
    long_query_time=1
  podSpec:
    containers:
      - name: mysql
        resources:
          requests:
            cpu: "2"
            memory: 8Gi
          limits:
            cpu: "4"
            memory: 16Gi
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: component
                  operator: In
                  values: [mysqld]
            topologyKey: topology.kubernetes.io/zone
The pod anti-affinity rule ensures MySQL pods are spread across availability zones, providing zone-level fault tolerance. The mycnf block allows you to inject production-tuned MySQL configuration directly through the CR.
Percona Operator for MySQL (PXC)
Percona Operator for MySQL deploys Percona XtraDB Cluster (PXC), a multi-primary synchronous replication solution based on Galera. PXC differs from InnoDB Cluster in that every node can accept writes (true multi-primary), and replication is synchronous at the certification level using the wsrep API.
# Install Percona Operator
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
helm install pxc-operator percona/pxc-operator \
--namespace pxc \
--create-namespace
# Percona XtraDB Cluster Custom Resource
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: production-pxc
  namespace: pxc
spec:
  crVersion: '1.14.0'
  secretsName: pxc-secrets
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0.35
    resources:
      requests:
        memory: 8Gi
        cpu: "2"
      limits:
        memory: 16Gi
        cpu: "4"
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: gp3-encrypted
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 200Gi
    affinity:
      antiAffinityTopologyKey: topology.kubernetes.io/zone
    configuration: |
      [mysqld]
      innodb_buffer_pool_size=4G
      innodb_flush_log_at_trx_commit=1
      wsrep_provider_options="gcache.size=2G; gcs.fc_limit=256"
      wsrep_slave_threads=8
      wsrep_certify_nonPK=1
      wsrep_trx_fragment_size=10M
      max_connections=500
  haproxy:
    enabled: true
    size: 3
    image: percona/haproxy:2.8.5
    resources:
      requests:
        memory: 1Gi
        cpu: 500m
  proxysql:
    enabled: false
  backup:
    image: percona/percona-xtradb-cluster-operator:1.14.0-pxc8.0-backup-pxb8.0.35
    storages:
      s3-backup:
        type: s3
        verifyTLS: true
        s3:
          bucket: production-mysql-backups
          region: us-east-1
          credentialsSecret: aws-s3-credentials
    schedule:
      - name: daily-full
        schedule: "0 3 * * *"
        keep: 7
        storageName: s3-backup
      - name: hourly-incremental
        schedule: "0 * * * *"
        keep: 24
        storageName: s3-backup
The Percona Operator handles automated backups to S3, point-in-time recovery, rolling upgrades, and ProxySQL or HAProxy for connection routing. The Galera-based replication in PXC is virtually synchronous multi-primary: every transaction's writeset is replicated to and certified on all nodes before the commit returns, although each node applies the writeset to its local data asynchronously.
Connection Routing: MySQL Router vs ProxySQL
Both MySQL Router and ProxySQL serve as database proxies, but they have different strengths.
MySQL Router is purpose-built for InnoDB Cluster. It reads the cluster metadata, tracks topology changes, and routes connections to the correct primary or secondary. Its configuration is minimal and it integrates seamlessly with the MySQL ecosystem. The downside is limited query-level routing — it operates at the connection level, not the query level.
ProxySQL is a general-purpose MySQL proxy with advanced features: query-level read/write splitting, query caching, connection multiplexing, query rewriting, and sophisticated routing rules. It excels in environments where you need fine-grained control over how queries are distributed.
# ProxySQL configuration for read/write splitting
# proxysql.cnf
mysql_servers:
(
{ hostgroup_id=10, hostname="mysql-primary", port=3306, max_connections=200, weight=1000 },
{ hostgroup_id=20, hostname="mysql-secondary-1", port=3306, max_connections=200, weight=500 },
{ hostgroup_id=20, hostname="mysql-secondary-2", port=3306, max_connections=200, weight=500 }
)
mysql_query_rules:
(
{ rule_id=1, active=1, match_digest="^SELECT .* FOR UPDATE$", destination_hostgroup=10, apply=1 },
{ rule_id=2, active=1, match_digest="^SELECT", destination_hostgroup=20, apply=1 },
{ rule_id=3, active=1, match_digest=".*", destination_hostgroup=10, apply=1 }
)
mysql_replication_hostgroups:
(
{ writer_hostgroup=10, reader_hostgroup=20, comment="InnoDB Cluster" }
)
In Kubernetes environments, ProxySQL can run as a sidecar container in your application pods or as a dedicated deployment. Running it as a sidecar eliminates network hops but increases pod resource usage; a dedicated deployment is easier to manage at scale.
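The rule table above is easier to reason about once the evaluation order is spelled out: rules are walked in rule_id order, and the first active rule whose match_digest matches (with apply=1) picks the hostgroup. An illustrative Python re-implementation of that walk; real ProxySQL matches against query digests using RE2 with configurable case sensitivity, so treat this as a sketch of the logic, not of the engine:

```python
import re

# (rule_id, match_digest, destination_hostgroup) mirroring the config above.
RULES = [
    (1, r"^SELECT .* FOR UPDATE$", 10),
    (2, r"^SELECT", 20),
    (3, r".*", 10),
]

def route(query: str) -> int:
    """Return the hostgroup for a query: first matching rule wins."""
    for _rule_id, pattern, hostgroup in RULES:
        if re.search(pattern, query):
            return hostgroup
    return 10  # fall back to the writer hostgroup
```

Note the ordering dependency: if rule 2 came first, locking reads (SELECT ... FOR UPDATE) would be sent to read-only secondaries, which is exactly the class of bug the rule_id ordering exists to prevent.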
Cloud Provider Comparison: Managed vs Self-Managed
AWS: RDS Multi-AZ vs Aurora vs Self-Managed on EKS
Amazon RDS Multi-AZ provides automatic failover between a primary and a standby instance in different availability zones. The failover typically takes 60–120 seconds. It uses synchronous physical replication to the standby. Read replicas can be added for read scaling but they use asynchronous replication. RDS handles backups, patching, and monitoring but limits your control over MySQL configuration and version choice.
Amazon Aurora MySQL is a cloud-native rewrite of the MySQL storage engine. It separates compute from storage — the storage layer is a distributed, fault-tolerant system that replicates data six ways across three AZs. Aurora provides sub-10-second failover, up to 15 read replicas with minimal replication lag, and automatic storage scaling up to 128 TiB. Aurora Serverless v2 adds automatic compute scaling for unpredictable workloads. The trade-off is cost (Aurora is 20–40% more expensive than RDS) and reduced compatibility with certain MySQL features.
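Aurora's durability comes from its published quorum model: six copies of each storage segment (two per AZ across three AZs), a write quorum of four, and a read quorum of three. The arithmetic that makes this safe is compact enough to check directly (Python, for illustration):

```python
# Aurora's published storage quorum model.
V, W, R = 6, 4, 3  # copies, write quorum, read quorum

def quorums_overlap(v, w, r):
    # Any read quorum intersects any write quorum, so reads see the latest write.
    return w + r > v

def writes_serialise(v, w):
    # Any two write quorums intersect, so conflicting writes cannot both commit.
    return 2 * w > v

# Losing a whole AZ (2 copies) leaves 4: writes continue.
# Losing an AZ plus one more copy leaves 3: reads (and repair) still possible.
```

These two inequalities are the classic quorum-overlap conditions; Aurora's 4-of-6 write / 3-of-6 read choice is what lets it survive an AZ failure without losing write availability.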
Self-Managed on EKS gives you full control over MySQL version, configuration, and replication topology. Use this when you need specific MySQL features not available in managed services, when you require multi-cloud portability, or when cost optimization at scale justifies the operational overhead.
# EKS StorageClass for MySQL with gp3 volumes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"
  throughput: "250"
  encrypted: "true"
  fsType: ext4
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
Azure: Flexible Server vs Self-Managed on AKS
Azure Database for MySQL Flexible Server offers zone-redundant HA with automatic failover, same-region read replicas, and up to 16 TiB storage. It supports configurable maintenance windows, the ability to stop/start instances (useful for dev/test cost savings), and integration with Azure Private Link for network isolation. The Business Critical tier provides the best performance with local SSD storage.
Self-Managed on AKS uses Azure Managed Disks (Premium SSD v2 recommended for database workloads) with the MySQL Operator or Percona Operator. AKS provides availability zones support and the Azure CNI for pod-level VNET integration.
# AKS StorageClass for MySQL
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd-zrs
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS
  cachingMode: None
  DiskIOPSReadWrite: "5000"
  DiskMBpsReadWrite: "200"
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
Google Cloud: Cloud SQL vs Self-Managed on GKE
Google Cloud SQL for MySQL offers regional instances with automatic failover across zones, read replicas (including cross-region), and automated backups with point-in-time recovery. Cloud SQL provides the highest level of managed convenience with integration into Google's IAM, VPC, and monitoring ecosystem. The Enterprise Plus tier adds near-zero downtime maintenance and data cache for improved read performance.
Self-Managed on GKE uses Persistent Disk SSD or Hyperdisk for storage. GKE Autopilot simplifies node management and can run the MySQL Operator with minimal operational overhead.
# GKE StorageClass for MySQL
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-regional
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
Bare Metal: k3s + Rancher + Longhorn
Not every workload belongs in the public cloud. For data sovereignty, compliance, cost optimisation, or latency requirements, bare metal Kubernetes clusters running k3s with Rancher management and Longhorn storage provide a production-grade platform for MySQL HA.
k3s Installation and Configuration
k3s is a lightweight, certified Kubernetes distribution ideal for edge and bare metal. It bundles all control-plane components into a single binary and uses SQLite for state storage on a single server, or embedded etcd for multi-server high availability (the --cluster-init flag below enables the latter).
# Install k3s server (first control-plane node)
curl -sfL https://get.k3s.io | sh -s - server \
--cluster-init \
--tls-san 10.10.0.10 \
--disable traefik \
--disable servicelb \
--write-kubeconfig-mode 644
# Join additional server nodes
curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat /var/lib/rancher/k3s/server/node-token) sh -s - server \
--server https://10.10.0.10:6443 \
--tls-san 10.10.0.10
# Join worker nodes
curl -sfL https://get.k3s.io | K3S_URL=https://10.10.0.10:6443 \
K3S_TOKEN=$(cat /var/lib/rancher/k3s/server/node-token) sh -s -
Longhorn Storage for MySQL
Longhorn is a cloud-native distributed block storage system built for Kubernetes. It replicates data across nodes, provides snapshots and backups, and integrates natively with the Kubernetes CSI driver. For MySQL, Longhorn provides the persistent, replicated storage layer that cloud providers handle with their managed disk offerings.
# Install Longhorn
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
--create-namespace \
--set defaultSettings.defaultReplicaCount=3 \
--set defaultSettings.storageMinimalAvailablePercentage=15 \
--set defaultSettings.guaranteedInstanceManagerCPU=12
# StorageClass for MySQL on Longhorn
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-mysql
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  dataLocality: best-effort
  diskSelector: ssd
  fsType: ext4
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
MetalLB for Load Balancing
On bare metal, there is no cloud load balancer. MetalLB fills this gap by assigning external IP addresses to Kubernetes Services of type LoadBalancer using either Layer 2 (ARP) or BGP mode.
# MetalLB IP Address Pool and L2 Advertisement
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: mysql-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.10.0.100-10.10.0.110
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: mysql-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - mysql-pool
Backup and Recovery Strategies
A high availability cluster protects against node failures. Backups protect against data corruption, accidental deletes, application bugs, and disaster recovery scenarios where the entire cluster is lost. You need both.
Logical Backups with mysqldump
mysqldump produces SQL-format backups that are portable and human-readable. For databases under 50 GB, it is the simplest option. For larger datasets it becomes impractical for production use during business hours: --single-transaction avoids locking InnoDB tables during the dump, but the export itself can take hours, and restoring means replaying every statement.
# Full logical backup with consistent snapshot
mysqldump --all-databases \
--single-transaction \
--routines \
--triggers \
--events \
--set-gtid-purged=ON \
--result-file=/backups/full-$(date +%Y%m%d-%H%M%S).sql
# Compressed backup piped to S3
mysqldump --all-databases --single-transaction | \
gzip | aws s3 cp - s3://mysql-backups/full-$(date +%Y%m%d).sql.gz
Physical Backups with Percona XtraBackup
Percona XtraBackup performs hot, non-blocking physical backups of InnoDB data. It copies InnoDB data files while tracking the redo log, then applies the redo log during the prepare phase to produce a consistent backup. It is dramatically faster than mysqldump for large datasets and supports incremental backups.
# Full physical backup
xtrabackup --backup \
--target-dir=/backups/full \
--user=backup_user \
--password=SecureBackupPass \
--parallel=4 \
--compress \
--compress-threads=4
# Incremental backup based on previous full
xtrabackup --backup \
--target-dir=/backups/incr-$(date +%H) \
--incremental-basedir=/backups/full \
--user=backup_user \
--password=SecureBackupPass
# Prepare (apply redo log) and restore
xtrabackup --prepare --target-dir=/backups/full
xtrabackup --prepare --target-dir=/backups/full \
--incremental-dir=/backups/incr-01
# Copy back to data directory
xtrabackup --copy-back --target-dir=/backups/full --datadir=/var/lib/mysql
chown -R mysql:mysql /var/lib/mysql
Binary Log (binlog) Point-in-Time Recovery
Binary logs record every data-modifying transaction. Combined with a full backup, they enable point-in-time recovery (PITR) — restoring to any specific moment, not just the time of the last backup.
# Enable binlog retention (my.cnf)
[mysqld]
log_bin=mysql-bin
binlog_expire_logs_seconds=604800 # 7 days
binlog_format=ROW
gtid_mode=ON
enforce_gtid_consistency=ON
# Restore from backup then replay binlogs to a specific point
mysqlbinlog --start-datetime="2026-04-12 08:00:00" \
--stop-datetime="2026-04-12 09:30:00" \
mysql-bin.000042 mysql-bin.000043 | mysql -u root -p
# Or replay to a specific GTID position
mysqlbinlog --include-gtids="3c2d4f9e-d17e-ec59-9029-6aafa8accef8:1-5000" \
mysql-bin.000042 | mysql -u root -p
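Choosing which binlog files to pass to mysqlbinlog can be scripted from each file's first-event timestamp (gathered beforehand, e.g. by inspecting each file's header with mysqlbinlog). An illustrative helper, assuming those timestamps have already been collected:

```python
from datetime import datetime

def binlogs_for_window(files, start, stop):
    """files: ordered [(name, first_event_time), ...], oldest first.
    Return the files that can contain events in [start, stop]."""
    needed = []
    for i, (name, first) in enumerate(files):
        next_first = files[i + 1][1] if i + 1 < len(files) else None
        # A file ends before the window if the next file starts at/before start.
        ends_before_window = next_first is not None and next_first <= start
        starts_after_window = first > stop
        if not ends_before_window and not starts_after_window:
            needed.append(name)
    return needed
```

For the 08:00 to 09:30 window shown above, this selects exactly mysql-bin.000042 and 000043, matching the manual command.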
Kubernetes Backup CronJob
In Kubernetes, backups should run as CronJobs rather than ad-hoc commands. This ensures backups are automated, monitored, and can be alerted on if they fail.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
  namespace: production
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      activeDeadlineSeconds: 7200
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: percona/percona-xtrabackup:8.0.35
              command:
                - /bin/sh
                - -c
                - |
                  BACKUP_DIR=/backups/$(date +%Y%m%d-%H%M%S)
                  mkdir -p $BACKUP_DIR
                  xtrabackup --backup \
                    --host=production-mysql.production.svc \
                    --user=backup_user \
                    --password=$BACKUP_PASSWORD \
                    --target-dir=$BACKUP_DIR \
                    --parallel=4 \
                    --compress
                  # A compressed backup must be decompressed before --prepare
                  xtrabackup --decompress --parallel=4 --target-dir=$BACKUP_DIR
                  xtrabackup --prepare --target-dir=$BACKUP_DIR
                  # Upload to S3
                  aws s3 sync $BACKUP_DIR s3://mysql-backups/$(date +%Y%m%d)/
                  # Cleanup local
                  rm -rf $BACKUP_DIR
              env:
                - name: BACKUP_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: mysql-backup-credentials
                      key: password
              volumeMounts:
                - name: backup-scratch
                  mountPath: /backups
              resources:
                requests:
                  cpu: "1"
                  memory: 2Gi
                limits:
                  cpu: "2"
                  memory: 4Gi
          volumes:
            - name: backup-scratch
              emptyDir:
                sizeLimit: 100Gi
Monitoring with Percona Monitoring and Management (PMM)
Percona Monitoring and Management (PMM) is an open-source monitoring platform purpose-built for database observability. While Prometheus and Grafana provide general Kubernetes monitoring, PMM adds MySQL-specific insights: query analytics (QAN) that identifies slow queries, replication lag dashboards, InnoDB buffer pool hit ratios, table lock contention, and more.
# Deploy PMM Server on Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pmm-server
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pmm-server
  template:
    metadata:
      labels:
        app: pmm-server
    spec:
      containers:
        - name: pmm-server
          image: percona/pmm-server:2
          ports:
            - containerPort: 443
          env:
            - name: DISABLE_TELEMETRY
              value: "1"
          volumeMounts:
            - name: pmm-data
              mountPath: /srv
          resources:
            requests:
              cpu: "1"
              memory: 4Gi
            limits:
              cpu: "2"
              memory: 8Gi
      volumes:
        - name: pmm-data
          persistentVolumeClaim:
            claimName: pmm-data-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: pmm-server
  namespace: monitoring
spec:
  type: ClusterIP
  selector:
    app: pmm-server
  ports:
    - port: 443
      targetPort: 443
# Register MySQL instances with PMM Client (run on each MySQL pod or sidecar)
pmm-admin config --server-url=https://admin:password@pmm-server.monitoring:443 --server-insecure-tls
pmm-admin add mysql \
--username=pmm_monitor \
--password=MonitorPass123 \
--host=127.0.0.1 \
--port=3306 \
--query-source=perfschema \
--service-name=mysql-node-1
PMM's Query Analytics is particularly valuable in production. It captures every query executed on the server (via Performance Schema or slow query log), aggregates them by fingerprint, and shows metrics like average latency, rows examined vs rows sent, and lock time. This is how you identify the queries that are degrading your application's performance.
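The fingerprinting step is the key idea: literals are stripped so that structurally identical queries aggregate together. A simplified illustration of the kind of normalisation PMM and pt-query-digest perform (the real implementations handle far more cases, such as comments, hex literals, and VALUES lists):

```python
import re

def fingerprint(query: str) -> str:
    """Collapse literals so structurally identical queries share a fingerprint."""
    q = query.strip().lower()
    q = re.sub(r"'[^']*'", "?", q)                 # string literals -> ?
    q = re.sub(r"\b\d+\b", "?", q)                 # numeric literals -> ?
    q = re.sub(r"\s+", " ", q)                     # normalise whitespace
    q = re.sub(r"in \(([?, ]+)\)", "in (?+)", q)   # collapse IN (...) lists
    return q
```

With this, a million point lookups with different ids roll up into a single row of the analytics table, which is what makes "which query shape is slow" answerable at all.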
Production Configuration Tuning
The default MySQL configuration is tuned for a small, general-purpose workload. Production environments with dedicated database servers need significantly different settings. Here are the critical parameters and how to size them.
InnoDB Buffer Pool
The InnoDB buffer pool is where MySQL caches table and index data in memory. It is the single most impactful configuration parameter. For a dedicated MySQL server, set it to 70–80% of available RAM.
[mysqld]
# Memory: 16Gi container limit -> allocate ~12Gi to buffer pool
innodb_buffer_pool_size = 12G
innodb_buffer_pool_instances = 8 # 1 per GB (up to 64)
innodb_buffer_pool_chunk_size = 1536M # pool_size / instances
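One sharp edge in the chunk arithmetic: MySQL adjusts the buffer pool size up to a multiple of innodb_buffer_pool_chunk_size times innodb_buffer_pool_instances, so a mismatched request silently consumes more memory than planned. The rounding rule, sketched in Python (round-up behaviour assumed, as documented for 8.0):

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

def effective_pool_size(requested, chunk, instances):
    """Round the requested pool size up to a multiple of chunk * instances,
    mirroring MySQL's documented adjustment."""
    unit = chunk * instances
    return ((requested + unit - 1) // unit) * unit

# 1536M chunks * 8 instances = 12G, so a 12G request is taken literally;
# an off-by-one request with default 128M chunks rounds up a full 1G unit.
```

Checking this before rollout matters most in containers, where an unexpectedly larger pool can push the process over its memory limit and get it OOM-killed.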
Redo Log Configuration
The redo log absorbs writes before they are flushed to data files. Larger redo logs reduce the frequency of checkpoint flushes and improve write throughput. MySQL 8.0.30+ uses innodb_redo_log_capacity instead of the older innodb_log_file_size.
[mysqld]
# MySQL 8.0.30+
innodb_redo_log_capacity = 4G
# MySQL 8.0.29 and earlier
# innodb_log_file_size = 2G
# innodb_log_files_in_group = 2
Durability vs Performance Trade-off
The combination of innodb_flush_log_at_trx_commit and sync_binlog determines your durability guarantee.
- Maximum durability (recommended for production): innodb_flush_log_at_trx_commit=1 and sync_binlog=1. Every transaction is flushed to the redo log and binary log on disk before being acknowledged. Zero data loss on crash.
- Balanced: innodb_flush_log_at_trx_commit=2 and sync_binlog=1. The redo log is written to the OS cache per commit but only flushed to disk once per second. Up to one second of transactions can be lost on an OS crash (a MySQL crash is still safe).
- Maximum performance (not recommended for production): innodb_flush_log_at_trx_commit=0 and sync_binlog=0. Writes are batched and flushed periodically. Risk of losing up to one second of transactions on any crash.
[mysqld]
# Production durability settings
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
innodb_doublewrite = ON
innodb_checksum_algorithm = crc32
I/O Configuration
[mysqld]
# SSD-optimised I/O settings
innodb_io_capacity = 4000
innodb_io_capacity_max = 8000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_flush_method = O_DIRECT
innodb_file_per_table = ON
innodb_open_files = 10000
Connection and Thread Management
[mysqld]
max_connections = 500
max_connect_errors = 1000000
thread_cache_size = 64
table_open_cache = 4096
table_open_cache_instances = 16
back_log = 1500
# Thread pool (MySQL Enterprise or Percona)
thread_pool_size = 16
thread_pool_max_threads = 1000
Temporary Tables and Sort Buffers
[mysqld]
tmp_table_size = 64M
max_heap_table_size = 64M
sort_buffer_size = 4M
join_buffer_size = 4M
read_rnd_buffer_size = 2M
Complete Production Configuration Template
[mysqld]
# Identity
server-id = 1
report_host = mysql-node-1
# GTID Replication
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_format = ROW
log_bin = mysql-bin
binlog_expire_logs_seconds = 604800
log_slave_updates = ON
relay_log_recovery = ON
# InnoDB Engine
innodb_buffer_pool_size = 12G
innodb_buffer_pool_instances = 8
innodb_redo_log_capacity = 4G
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
innodb_doublewrite = ON
innodb_flush_method = O_DIRECT
innodb_io_capacity = 4000
innodb_io_capacity_max = 8000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_file_per_table = ON
innodb_open_files = 10000
innodb_adaptive_hash_index = ON
innodb_change_buffering = all
# Connections
max_connections = 500
max_connect_errors = 1000000
thread_cache_size = 64
table_open_cache = 4096
table_open_cache_instances = 16
back_log = 1500
# Temp / Sort
tmp_table_size = 64M
max_heap_table_size = 64M
sort_buffer_size = 4M
join_buffer_size = 4M
# Monitoring
performance_schema = ON
slow_query_log = ON
long_query_time = 1
log_queries_not_using_indexes = ON
log_throttle_queries_not_using_indexes = 60
# Security
local_infile = OFF
skip_name_resolve = ON
default_authentication_plugin = caching_sha2_password
# Group Replication (InnoDB Cluster)
plugin_load_add = 'group_replication.so'
plugin_load_add = 'mysql_clone.so'
loose-group_replication_start_on_boot = OFF
loose-group_replication_bootstrap_group = OFF
loose-group_replication_recovery_use_ssl = ON
loose-group_replication_ssl_mode = REQUIRED
Failover Testing and Chaos Engineering
A high availability system that has never been tested under failure conditions is not truly highly available — it is a hypothesis. Failover testing must be part of your regular operational cadence, not something you discover works (or does not work) during an actual incident.
Controlled Failover Testing
# Test 1: Graceful primary failover (InnoDB Cluster)
mysqlsh gradmin@mysql-node-1:3306
var cluster = dba.getCluster()
cluster.setPrimaryInstance('gradmin@mysql-node-2:3306')
# Verify: cluster.status() should show node-2 as PRIMARY
# Test 2: Simulate primary crash
# On the primary node:
kill -9 $(pidof mysqld)
# Monitor: watch the cluster elect a new primary
# Verify: applications reconnect via MySQL Router within seconds
# Test 3: Network partition simulation
# On the primary node, block Group Replication port:
iptables -A INPUT -p tcp --dport 33061 -j DROP
iptables -A OUTPUT -p tcp --dport 33061 -j DROP
# The isolated node should be expelled; cluster continues with remaining members
# Cleanup:
iptables -D INPUT -p tcp --dport 33061 -j DROP
iptables -D OUTPUT -p tcp --dport 33061 -j DROP
# Test 4: Kubernetes pod deletion
kubectl delete pod mysql-0 -n production --grace-period=0 --force
# The StatefulSet controller recreates the pod
# The operator rejoins it to the cluster
Chaos Engineering with Litmus or Chaos Mesh
Structured chaos engineering tools inject failures systematically and measure the blast radius. Chaos Mesh, a CNCF project, integrates natively with Kubernetes.
# Chaos Mesh: Kill a MySQL pod randomly
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: mysql-pod-kill
  namespace: production
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - production
    labelSelectors:
      app.kubernetes.io/component: mysqld
  # NOTE: the inline scheduler field is Chaos Mesh 1.x syntax; in 2.x,
  # recurring experiments are wrapped in the separate Schedule CRD instead.
  scheduler:
    cron: "0 */4 * * *"
---
# Chaos Mesh: Network delay between MySQL pods
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: mysql-network-delay
  namespace: production
spec:
  action: delay
  mode: one
  selector:
    namespaces:
      - production
    labelSelectors:
      app.kubernetes.io/component: mysqld
  delay:
    latency: "200ms"
    jitter: "50ms"
  duration: "5m"
  scheduler:
    cron: "30 */6 * * *"
---
# Chaos Mesh: Disk I/O stress
apiVersion: chaos-mesh.org/v1alpha1
kind: IOChaos
metadata:
  name: mysql-io-latency
  namespace: production
spec:
  action: latency
  mode: one
  selector:
    namespaces:
      - production
    labelSelectors:
      app.kubernetes.io/component: mysqld
  volumePath: /var/lib/mysql
  path: '*'
  delay: '100ms'
  percent: 50
  duration: '5m'
What to Measure During Failover Tests
- Recovery Time Objective (RTO) — How long from failure detection to service restoration? InnoDB Cluster typically achieves 5–30 seconds. Percona XtraDB Cluster can be faster because there is no election (all nodes are writable).
- Recovery Point Objective (RPO) — How much data is lost during failover? With synchronous replication (Group Replication in single-primary mode, PXC), RPO is zero for committed transactions. With asynchronous replication (ClusterSet cross-region), RPO equals the replication lag.
- Application error rate — How many application requests fail during the failover window? This tests not just the database failover but your application's connection retry logic and MySQL Router's re-routing speed.
- Connection drain time — How long do existing connections to the old primary take to drain and reconnect to the new primary?
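The RTO figure is easy to measure empirically during a drill. The sketch below is a hypothetical measurement harness, not an official tool: the `probe` callable is whatever "is the service up" check fits your stack, for example opening a connection through MySQL Router and running `SELECT 1`. It times the window between the first failed probe and the first successful one after it:

```python
import time
from typing import Callable

def measure_downtime_ms(probe: Callable[[], bool],
                        poll_interval: float = 0.2) -> float:
    """Run alongside a failover drill; returns the observed outage in ms.

    `probe` should return True while the service answers (e.g. a
    SELECT 1 through MySQL Router) and False otherwise.
    """
    while probe():                # service still up: wait for the failure
        time.sleep(poll_interval)
    start = time.monotonic()      # first failed probe: outage begins
    while not probe():            # wait for the new primary to take over
        time.sleep(poll_interval)
    return (time.monotonic() - start) * 1000.0
```

Start it before running the `kill -9` test; the number it reports is your measured RTO including Router re-routing time, accurate to within one `poll_interval`.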
Runbook: Primary Node Failure Checklist
- Verify the cluster has elected a new primary: cluster.status()
- Confirm MySQL Router is routing to the new primary: check router logs and connection counts
- Monitor replication lag on remaining secondaries: SELECT * FROM performance_schema.replication_group_member_stats
- If the failed node can be recovered, rejoin it: cluster.rejoinInstance('gradmin@failed-node:3306')
- If the failed node cannot be recovered, remove it with cluster.removeInstance('gradmin@failed-node:3306', {force: true}) and add a replacement via cluster.addInstance()
- Verify cluster health: all members ONLINE, no replication errors, backup schedule intact
- Update your capacity plan: a three-node cluster running with two members has zero fault tolerance until the third is restored
Security Hardening
Production MySQL deployments must address several security concerns beyond basic authentication.
Encryption in Transit and at Rest
[mysqld]
# TLS for client connections
ssl_ca = /etc/mysql/certs/ca.pem
ssl_cert = /etc/mysql/certs/server-cert.pem
ssl_key = /etc/mysql/certs/server-key.pem
require_secure_transport = ON
tls_version = TLSv1.3
# Encryption at rest (InnoDB tablespace encryption)
# keyring_file plugin shown here; MySQL 8.0.34+ deprecates it in favour
# of the component_keyring_file component
early-plugin-load = keyring_file.so
keyring_file_data = /var/lib/mysql-keyring/keyring
innodb_undo_log_encrypt = ON
innodb_redo_log_encrypt = ON
default_table_encryption = ON
Kubernetes Secrets and Sealed Secrets
# Never store database credentials in plain YAML
# Use Sealed Secrets or an external secret manager
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: mysql-root-credentials
  namespace: production
spec:
  encryptedData:
    rootUser: AgBy3i4OJSWK+PiTy...
    rootPassword: AgCtr7pJ2XQWK+Pi...
  template:
    metadata:
      name: mysql-root-credentials
      namespace: production
    type: Opaque
Audit Logging
[mysqld]
# MySQL Enterprise Audit (or Percona Audit Log plugin)
plugin-load-add = audit_log.so
audit_log_format = JSON
audit_log_policy = LOGINS
audit_log_rotate_on_size = 100M
audit_log_rotations = 10
Decision Matrix: Choosing Your MySQL HA Architecture
The right architecture depends on your specific requirements. Here is a decision framework.
- Single cloud, managed preference, MySQL-compatible — Use Aurora (AWS), Flexible Server Business Critical (Azure), or Cloud SQL Enterprise Plus (GCP). These provide the lowest operational overhead.
- Single cloud, self-managed, need full MySQL control — Use MySQL Operator or Percona Operator on EKS/AKS/GKE with cloud-native storage classes.
- Multi-cloud or hybrid cloud — Use Percona Operator with PXC or MySQL Operator with InnoDB Cluster. The Kubernetes abstraction layer makes your MySQL deployment portable across clouds.
- Bare metal or edge — Use k3s with Rancher, Longhorn storage, MetalLB, and either MySQL Operator or Percona Operator.
- Maximum write throughput, multi-primary required — Use Percona XtraDB Cluster (Galera) with the Percona Operator. All nodes accept writes with synchronous certification.
- Global distribution with disaster recovery — Use InnoDB ClusterSet with asynchronous replication between regions. Accept the RPO trade-off of asynchronous replication for the latency benefits of local writes.
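Teams that maintain an internal platform catalogue sometimes encode a matrix like this as data so the recommendation is reviewable and testable. The sketch below is purely illustrative; the keys and requirement names are assumptions, not an official taxonomy:

```python
# Illustrative encoding of the decision matrix above; keys are hypothetical.
HA_ARCHITECTURES = {
    "managed-single-cloud": "Aurora / Azure Flexible Server Business Critical / Cloud SQL Enterprise Plus",
    "self-managed-single-cloud": "MySQL Operator or Percona Operator on EKS/AKS/GKE",
    "multi-cloud": "Percona Operator (PXC) or MySQL Operator (InnoDB Cluster) on Kubernetes",
    "bare-metal-edge": "k3s + Rancher + Longhorn + MetalLB + MySQL or Percona Operator",
    "multi-primary-writes": "Percona XtraDB Cluster (Galera) via the Percona Operator",
    "global-dr": "InnoDB ClusterSet with asynchronous cross-region replication",
}

def recommend(requirement: str) -> str:
    """Map a dominant requirement to the recommended stack."""
    return HA_ARCHITECTURES.get(requirement, "no direct match: revisit requirements")
```

The useful property is that the mapping lives in version control next to the runbooks, so a change to the recommendation is a reviewed diff rather than tribal knowledge.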
Operational Checklist for Production MySQL HA
Before declaring your MySQL HA deployment production-ready, validate every item on this checklist.
- Cluster health — All members report ONLINE status. Group Replication shows no errors.
- Automated backups — CronJob or operator-managed backups running on schedule. Backups verified by periodic restore tests.
- Monitoring — PMM or Prometheus/Grafana collecting MySQL metrics. Alerts configured for replication lag, connection saturation, buffer pool hit ratio below 99%, and disk space.
- Failover tested — Primary failover tested within the last 30 days. RTO and RPO measured and within SLA targets.
- Connection routing — MySQL Router or ProxySQL health-checked and load-balanced. Application connection strings point to the router, not individual MySQL instances.
- Security — TLS required for all connections. Encryption at rest enabled. Credentials stored in a secret manager. Audit logging active. Least-privilege database users for each application.
- Resource limits — Kubernetes resource requests and limits set appropriately. PodDisruptionBudgets in place. Pod anti-affinity spreading MySQL pods across zones.
- Capacity plan — Storage utilisation monitored with alerts at 70% and 85%. Volume expansion tested. Vertical and horizontal scaling procedures documented.
- Runbooks — Documented procedures for primary failover, node replacement, backup restore, version upgrade, and emergency read-only mode.
- Chaos testing — Regular chaos experiments scheduled to validate resilience assumptions.
Conclusion
MySQL high availability in production is not a single technology choice — it is a system of interlocking decisions that span the replication layer, the orchestration platform, the storage subsystem, the monitoring stack, and the operational processes around them. InnoDB Cluster with Group Replication provides the foundational HA primitive. Kubernetes operators from Oracle and Percona automate the lifecycle management that would otherwise consume significant engineering time. Cloud managed services trade control for convenience. Bare metal deployments with k3s, Rancher, and Longhorn prove that you don't need a cloud provider to run production-grade MySQL HA.
The most critical insight is that high availability is a property of the entire system, not just the database. It includes how your application handles connection failures and retries, how your proxy layer detects and routes around failed nodes, how your monitoring alerts before users notice, how your backup strategy enables recovery from scenarios that HA alone cannot handle, and how your team practices failover procedures so they execute cleanly under the stress of a real incident.
Build it deliberately, test it regularly, and treat your runbooks as living documents that evolve with every incident and every chaos experiment. That is how MySQL earns its place as a reliable, highly available data platform in production.