PostgreSQL High Availability with Zalando Postgres Operator: Multi-Cloud Kubernetes Deployment Guide
Deploy production-grade PostgreSQL HA with Zalando Operator on any Kubernetes platform
Introduction: Why PostgreSQL High Availability on Kubernetes Matters
Running PostgreSQL in production demands high availability (HA). Downtime measured in minutes can cost enterprises millions of dollars in lost revenue, eroded customer trust, and violated service-level agreements. Kubernetes has become the de facto platform for orchestrating containerized workloads, but running stateful services like PostgreSQL on Kubernetes introduces unique challenges: persistent storage management, leader election, automatic failover, backup orchestration, and connection pooling.
The Zalando Postgres Operator is an open-source, battle-tested solution that Zalando—Europe's largest online fashion retailer—built to manage hundreds of PostgreSQL clusters in production. It leverages Patroni for consensus-based leader election, Spilo as the PostgreSQL container image, WAL-G for continuous archiving and point-in-time recovery, and PgBouncer for connection pooling. Together, these components deliver a fully automated, self-healing PostgreSQL deployment that works across any Kubernetes distribution—from managed cloud services like AWS EKS, Azure AKS, and Google GKE to bare-metal clusters running k3s with Rancher.
In this comprehensive guide, we will explore the architecture of the Zalando Postgres Operator, walk through installation and configuration on multiple Kubernetes platforms, deep-dive into replication, backups, disaster recovery, monitoring, and production tuning. By the end, you will have the knowledge to deploy and operate production-grade PostgreSQL HA clusters on any Kubernetes infrastructure.
Zalando Postgres Operator Architecture
Understanding the architecture is essential before deploying. The Zalando Postgres Operator follows the Kubernetes operator pattern: it watches for Custom Resource Definitions (CRDs) of type postgresql and reconciles the desired state into actual Kubernetes resources. Here is how the components fit together:
Spilo: The PostgreSQL Container Image
Spilo is Zalando's Docker image that bundles PostgreSQL with Patroni, WAL-G, and essential extensions. Each pod in the StatefulSet runs a Spilo container. Spilo handles:
- PostgreSQL server — the database engine itself, supporting versions 13 through 16
- Patroni — the HA agent that manages leader election, replication, and failover
- WAL-G — continuous WAL archiving and base backup tool
- pg_cron, pg_stat_statements, PostGIS — commonly needed extensions pre-installed
Patroni: Leader Election and Automatic Failover
Patroni is the heart of the HA mechanism. It uses a Distributed Configuration Store (DCS) to maintain cluster state and perform leader election. In the Zalando operator context, Patroni uses the Kubernetes API itself as the DCS (via Endpoints or ConfigMaps), eliminating the need for an external etcd or ZooKeeper cluster.
Here is how Patroni's failover process works:
- Health checks — Each Patroni agent continuously monitors its local PostgreSQL instance and reports health to the DCS.
- Leader lock — The primary holds a leader lock in the DCS (a Kubernetes Endpoint object). The lock has a TTL (default 30 seconds).
- Failure detection — If the primary fails to renew its lock within the TTL, replicas detect the absence.
- Election — Eligible replicas compete for the leader lock. The replica with the least replication lag wins.
- Promotion — The winning replica promotes itself to primary, updates the DCS, and the Kubernetes master Service endpoint updates automatically.
- Fencing — The old primary is fenced (stopped or demoted to replica) to prevent split-brain.
This entire failover process typically completes in 15-30 seconds, ensuring minimal downtime for your applications.
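These timing parameters are exposed through the patroni section of the postgresql CRD, covered later in this guide. A sketch of how they interact, using the operator's default values:

```yaml
# Patroni timing knobs from the postgresql CRD (defaults shown).
# Worst-case failure detection is roughly ttl + loop_wait: the leader lock
# expires after ttl seconds, and replicas act on their next loop_wait cycle.
spec:
  patroni:
    ttl: 30            # leader lock time-to-live in the DCS
    loop_wait: 10      # seconds between Patroni reconciliation loops
    retry_timeout: 10  # retry window for DCS and PostgreSQL operations
```

Lowering ttl and loop_wait speeds up failover detection at the cost of more DCS traffic and a higher risk of false-positive failovers under transient API-server latency.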
WAL-G: Continuous Archiving and Backup
WAL-G is a next-generation archival tool for PostgreSQL that supports backup to S3, Google Cloud Storage (GCS), and Azure Blob Storage. It provides:
- Base backups — Full physical backups using pg_basebackup
- WAL archiving — Continuous Write-Ahead Log shipping for point-in-time recovery
- Delta backups — Incremental backups that only store changed pages
- Encryption — AES-256 encryption of backups at rest
- Compression — LZ4 or ZSTD compression for reduced storage costs
Kubernetes Resource Hierarchy
When you create a postgresql Custom Resource, the operator creates a comprehensive set of Kubernetes resources to manage the cluster. Understanding this hierarchy is important for troubleshooting and monitoring:
Resources Created by the Operator
- StatefulSet — Manages the PostgreSQL pods with stable network identities and ordered deployment
- Services — Two ClusterIP services: <cluster-name> for the primary and <cluster-name>-repl for read replicas
- Endpoints — Patroni updates endpoints to point to the current leader for seamless failover
- PodDisruptionBudgets (PDB) — Ensures at least one instance remains available during voluntary disruptions
- Secrets — PostgreSQL superuser, replication, and application credentials stored as Kubernetes Secrets
- PersistentVolumeClaims (PVCs) — One PVC per pod for PostgreSQL data storage
- PgBouncer Deployment — Optional connection pooler deployed as a separate Deployment with its own Service
Installation on Multiple Kubernetes Platforms
Prerequisites
Before installing the Zalando Postgres Operator, ensure you have:
- A running Kubernetes cluster (v1.25+)
- kubectl configured with cluster admin access
- helm v3 installed
- A default StorageClass configured
Operator Installation via Helm
The recommended installation method uses Helm. This works consistently across all Kubernetes platforms:
# Add the Zalando Postgres Operator Helm repository
helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator
helm repo add postgres-operator-ui-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator-ui
helm repo update
# Create a dedicated namespace
kubectl create namespace postgres-operator
# Install the operator with custom values
helm install postgres-operator postgres-operator-charts/postgres-operator \
--namespace postgres-operator \
--set configKubernetes.enable_pod_antiaffinity=true \
--set configKubernetes.pod_environment_configmap=postgres-pod-config \
--set configAwsOrGcp.aws_region=us-east-1 \
--set configLoadBalancer.db_hosted_zone=db.example.com \
--set configConnectionPooler.connection_pooler_default_cpu_request=500m \
--set configConnectionPooler.connection_pooler_default_memory_request=100Mi
# Optionally install the operator UI for visual management
helm install postgres-operator-ui postgres-operator-ui-charts/postgres-operator-ui \
--namespace postgres-operator
Verify the operator is running:
kubectl get pods -n postgres-operator
# Expected output:
# NAME READY STATUS RESTARTS AGE
# postgres-operator-7f8b9c6d4-x2k9j 1/1 Running 0 2m
AWS EKS Deployment Specifics
Amazon EKS requires specific configuration for optimal PostgreSQL performance:
# EKS-specific Helm values (eks-values.yaml)
configAwsOrGcp:
aws_region: us-east-1
enable_ebs_gp3_migration: true
additional_secret_mount: "aws-iam-token"
configKubernetes:
enable_pod_antiaffinity: true
pod_environment_configmap: "postgres-pod-config"
spilo_privileged: false
storage_resize_mode: pvc
# Use EBS gp3 StorageClass for better performance
# Create the StorageClass first:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ebs-gp3-postgres
provisioner: ebs.csi.aws.com
parameters:
type: gp3
iops: "6000"
throughput: "400"
encrypted: "true"
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
For WAL-G backups on EKS, configure IAM roles for Service Accounts (IRSA):
# Create IAM policy for WAL-G S3 access
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket",
"s3:DeleteObject",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::my-pg-backups",
"arn:aws:s3:::my-pg-backups/*"
]
}
]
}
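To bind that policy to the Spilo pods without static access keys, attach it to an IAM role and annotate the pods' ServiceAccount. This is a sketch: the role ARN and account ID are placeholders, and the ServiceAccount name must match the operator's pod_service_account_name setting (postgres-pod is the default).

```yaml
# Hypothetical IRSA binding: the ServiceAccount used by Spilo pods assumes
# an IAM role that carries the S3 backup policy above.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: postgres-pod            # must match the operator's pod_service_account_name
  namespace: databases
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/pg-walg-backup
```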
Azure AKS Deployment
Azure AKS uses Azure Disk for persistent storage and Managed Identity for backup authentication:
# AKS-specific StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: managed-premium-postgres
provisioner: disk.csi.azure.com
parameters:
skuName: Premium_LRS
cachingMode: ReadOnly
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# WAL-G backup to Azure Blob Storage environment variables
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-pod-config
namespace: default
data:
AZURE_STORAGE_ACCOUNT: "pgbackupsstorage"
AZURE_STORAGE_ACCESS_KEY: "" # Use Managed Identity instead
WALG_AZ_PREFIX: "azure://pg-wal-backups/$(SCOPE)"
USE_WALG_BACKUP: "true"
USE_WALG_RESTORE: "true"
BACKUP_SCHEDULE: "0 2 * * *"
BACKUP_NUM_TO_RETAIN: "7"
Google GKE Deployment
Google Kubernetes Engine uses Persistent Disk and Workload Identity for GCS backup access:
# GKE StorageClass for SSD Persistent Disks
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ssd-postgres
provisioner: pd.csi.storage.gke.io
parameters:
type: pd-ssd
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# GCS backup configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-pod-config
namespace: default
data:
WALG_GS_PREFIX: "gs://my-pg-backups/$(SCOPE)"
USE_WALG_BACKUP: "true"
USE_WALG_RESTORE: "true"
GOOGLE_APPLICATION_CREDENTIALS: "/var/secrets/google/key.json"
BACKUP_SCHEDULE: "0 2 * * *"
BACKUP_NUM_TO_RETAIN: "7"
Bare Metal k3s/Rancher with Longhorn Storage
For on-premises deployments, k3s provides a lightweight Kubernetes distribution and Longhorn offers distributed block storage. This combination is ideal when you need full control over your infrastructure without cloud vendor lock-in.
# Install k3s on all nodes
# Master node:
curl -sfL https://get.k3s.io | sh -s - server \
--cluster-init \
--disable traefik \
--write-kubeconfig-mode 644
# Worker nodes:
curl -sfL https://get.k3s.io | sh -s - agent \
--server https://master-ip:6443 \
--token $(cat /var/lib/rancher/k3s/server/node-token)
# Install Longhorn for persistent storage
helm repo add longhorn https://charts.longhorn.io
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
--create-namespace \
--set defaultSettings.defaultReplicaCount=3 \
--set defaultSettings.defaultDataPath=/mnt/longhorn
# Install MetalLB for LoadBalancer services
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.5/config/manifests/metallb-native.yaml
# Configure MetalLB IP pool
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: postgres-pool
namespace: metallb-system
spec:
addresses:
- 192.168.1.200-192.168.1.210
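In MetalLB's default layer-2 mode, the address pool is not announced on the network until an L2Advertisement references it, so a minimal companion resource is needed alongside the pool above:

```yaml
# Announce the postgres-pool addresses via ARP/NDP (MetalLB layer-2 mode)
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: postgres-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - postgres-pool   # the pool defined above
```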
PostgreSQL Cluster CRD Specification
The core of deploying a PostgreSQL cluster with the Zalando operator is the postgresql Custom Resource. This YAML manifest declares the desired state of your cluster, and the operator reconciles it into reality. Below is a comprehensive, production-ready CRD specification:
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
name: pg-production-cluster
namespace: databases
labels:
team: platform
environment: production
spec:
teamId: "platform"
volume:
size: 100Gi
storageClass: ebs-gp3-postgres
numberOfInstances: 3
enableConnectionPooler: true
enableReplicaConnectionPooler: true
connectionPooler:
numberOfInstances: 2
mode: transaction
schema: pooler
user: pooler
resources:
requests:
cpu: 500m
memory: 100Mi
limits:
cpu: "1"
memory: 256Mi
users:
app_user:
- superuser
- createdb
readonly_user: []
databases:
app_database: app_user
postgresql:
version: "16"
parameters:
shared_buffers: "2GB"
max_connections: "200"
work_mem: "64MB"
maintenance_work_mem: "512MB"
effective_cache_size: "6GB"
random_page_cost: "1.1"
effective_io_concurrency: "200"
wal_buffers: "64MB"
max_wal_size: "4GB"
min_wal_size: "1GB"
checkpoint_completion_target: "0.9"
default_statistics_target: "100"
log_statement: "ddl"
log_min_duration_statement: "1000"
idle_in_transaction_session_timeout: "600000"
lock_timeout: "30000"
statement_timeout: "60000"
patroni:
initdb:
encoding: "UTF8"
locale: "en_US.UTF-8"
data-checksums: "true"
pg_hba:
- hostssl all all 0.0.0.0/0 md5
- host all all 0.0.0.0/0 md5
ttl: 30
loop_wait: 10
retry_timeout: 10
synchronous_mode: false
synchronous_mode_strict: false
maximum_lag_on_failover: 33554432
resources:
requests:
cpu: "2"
memory: 8Gi
limits:
cpu: "4"
memory: 16Gi
podAnnotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9187"
tolerations:
- key: "database"
operator: "Equal"
value: "postgres"
effect: "NoSchedule"
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: workload-type
operator: In
values:
- database
enableShmVolume: true
spiloRunAsUser: 101
spiloRunAsGroup: 103
spiloFSGroup: 103
Key CRD Fields Explained
- numberOfInstances — Total number of pods. The operator automatically designates one as primary and the rest as streaming replicas.
- enableConnectionPooler — Deploys a PgBouncer Deployment in front of the primary service, reducing connection overhead.
- enableReplicaConnectionPooler — Deploys a separate PgBouncer for the replica service, essential for read-heavy workloads.
- postgresql.parameters — Direct PostgreSQL configuration parameters passed to postgresql.conf.
- patroni — Configures Patroni behavior including TTL, loop wait, retry timeout, and synchronous replication mode.
- volume.storageClass — Maps to the platform-specific StorageClass (EBS gp3 on AWS, Premium SSD on Azure, SSD PD on GCP, Longhorn on k3s).
- enableShmVolume — Mounts a tmpfs at /dev/shm for PostgreSQL shared memory, critical for performance.
Connection Pooling with PgBouncer
PostgreSQL's process-per-connection model makes it expensive to handle large numbers of client connections. Each connection consumes approximately 10MB of RAM. PgBouncer solves this by multiplexing thousands of client connections over a small pool of actual PostgreSQL connections.
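The savings are easy to quantify with the ~10MB-per-backend figure above; the client count here is an arbitrary example, not a recommendation:

```shell
clients=2000      # client connections multiplexed by PgBouncer (example value)
pool_size=25      # actual PostgreSQL backends in the pool
per_conn_mb=10    # approximate memory per PostgreSQL backend
direct_mb=$((clients * per_conn_mb))   # every client gets its own backend
pooled_mb=$((pool_size * per_conn_mb)) # clients share the pool
echo "direct: ${direct_mb}MB, pooled: ${pooled_mb}MB"
# prints: direct: 20000MB, pooled: 250MB
```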
The Zalando operator natively supports PgBouncer deployment. When you set enableConnectionPooler: true in the CRD, the operator creates:
- A PgBouncer Deployment with configurable replica count
- A dedicated Service (<cluster-name>-pooler) for pooled connections
- Automatic credential synchronization with PostgreSQL
PgBouncer Configuration Modes
# PgBouncer connection pooler modes:
#
# transaction (recommended for most workloads)
# - Connection returned to pool after each transaction
# - Best balance of efficiency and compatibility
# - Cannot use session-level features (prepared statements, temp tables)
#
# session
# - Connection held for entire client session
# - Full PostgreSQL compatibility
# - Lower pooling efficiency
#
# statement
# - Connection returned after each statement
# - Most efficient but most restrictive
# - Only works with autocommit queries
# Custom PgBouncer configuration via operator
spec:
connectionPooler:
numberOfInstances: 3
mode: transaction
schema: pooler
user: pooler
defaultPoolSize: 25
maxDBConnections: 100
resources:
requests:
cpu: 250m
memory: 128Mi
limits:
cpu: "1"
memory: 256Mi
For applications that need prepared statements or session-level features, connect directly to the PostgreSQL services, bypassing PgBouncer, or use session pooling mode and accept the lower connection efficiency.
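In practice that choice is just a different Service DNS name. A sketch with hypothetical names, assuming the cluster pg-production-cluster in namespace databases:

```shell
# Pooled entry point (transaction mode: no prepared statements or temp tables)
pooled_host="pg-production-cluster-pooler.databases.svc.cluster.local"
# Direct entry point (full session-level features, one backend per client)
direct_host="pg-production-cluster.databases.svc.cluster.local"
# Example connections (not run here):
#   psql "host=$pooled_host port=5432 dbname=app_database user=app_user"
#   psql "host=$direct_host port=5432 dbname=app_database user=app_user"
```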
Multi-Region PostgreSQL Replication
For global applications that require low-latency reads from multiple geographic locations or disaster recovery across regions, multi-region replication is essential. The Zalando operator supports this through standby clusters that replicate from a primary cluster via streaming replication or WAL-G archives.
Streaming Replication Configuration
PostgreSQL streaming replication is the foundation of HA in the Zalando operator. It works by shipping Write-Ahead Log (WAL) records from the primary to replicas in near-real-time. The operator configures this automatically, but understanding the details helps with tuning and troubleshooting.
- Synchronous replication — The primary waits for at least one replica to confirm WAL receipt before committing a transaction. This guarantees zero data loss (RPO=0) but adds latency. Enable with patroni.synchronous_mode: true.
- Asynchronous replication — The primary commits immediately and ships WAL asynchronously. Slightly lower latency but potential data loss during failover. This is the default.
- Cascading replication — Replicas can replicate from other replicas instead of the primary, reducing load on the primary in large clusters.
# Enable synchronous replication for zero data loss
spec:
patroni:
synchronous_mode: true
synchronous_mode_strict: false # Allow async if no sync replica available
synchronous_node_count: 1 # Number of sync replicas required
postgresql:
parameters:
synchronous_commit: "on" # Matches Patroni synchronous_mode
max_wal_senders: "10" # Maximum WAL sender processes
wal_keep_size: "1GB" # WAL retention for replica catch-up
hot_standby: "on" # Allow queries on replicas
hot_standby_feedback: "on" # Reduce query conflicts on replicas
Backup and Recovery with WAL-G
WAL-G Backup Configuration
Proper backup configuration is critical for disaster recovery. The Zalando operator integrates WAL-G for continuous backup to object storage. Here is a complete configuration for S3-compatible storage:
# ConfigMap for WAL-G backup configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-pod-config
namespace: databases
data:
# S3 backup configuration
AWS_ENDPOINT: "https://s3.us-east-1.amazonaws.com"
AWS_S3_FORCE_PATH_STYLE: "false"
AWS_REGION: "us-east-1"
WALG_S3_PREFIX: "s3://my-pg-backups/$(SCOPE)"
WALG_DISABLE_S3_SSE: "false"
WALG_S3_SSE: "aws:kms"
WALG_S3_SSE_KMS_ID: "arn:aws:kms:us-east-1:123456789:key/mrk-abcdef"
# Backup scheduling and retention
USE_WALG_BACKUP: "true"
USE_WALG_RESTORE: "true"
BACKUP_SCHEDULE: "0 1 * * *" # Daily at 1 AM UTC
BACKUP_NUM_TO_RETAIN: "14" # Keep 14 daily backups
# WAL archiving
WALG_COMPRESSION_METHOD: "zstd" # Better compression than lz4
WALG_DELTA_MAX_STEPS: "6" # Delta backups between full backups
WALG_UPLOAD_CONCURRENCY: "4" # Parallel upload streams
WALG_DOWNLOAD_CONCURRENCY: "4" # Parallel download for restore
WALG_UPLOAD_DISK_CONCURRENCY: "4" # Disk read concurrency
# Clone configuration
CLONE_AWS_ENDPOINT: "https://s3.us-east-1.amazonaws.com"
CLONE_AWS_REGION: "us-east-1"
CLONE_WALG_S3_PREFIX: "s3://my-pg-backups/$(CLONE_SCOPE)"
CLONE_METHOD: "CLONE_WITH_WALG"
CLONE_USE_WALG_RESTORE: "true"
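One subtlety worth internalizing: WALG_DELTA_MAX_STEPS caps the delta chain length, so a restore replays at most one full backup, a bounded run of deltas, and then WAL. A quick sketch of the resulting cadence under the daily schedule above:

```shell
delta_max_steps=6                    # from WALG_DELTA_MAX_STEPS above
cycle_days=$((delta_max_steps + 1))  # 1 full backup, then 6 deltas, then the next full
echo "full backup every ${cycle_days} days on the daily schedule"
# prints: full backup every 7 days on the daily schedule
```

Longer chains save storage and upload time but lengthen restores, since every delta in the chain must be fetched and applied.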
Point-in-Time Recovery (PITR)
PITR allows you to restore your database to any specific moment in time—critical for recovering from accidental data deletion or corruption. The Zalando operator supports PITR through the clone mechanism:
# Clone a cluster with PITR
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
name: pg-restored-cluster
namespace: databases
spec:
teamId: "platform"
volume:
size: 100Gi
storageClass: ebs-gp3-postgres
numberOfInstances: 3
postgresql:
version: "16"
clone:
cluster: "pg-production-cluster"
timestamp: "2026-04-11T14:30:00+00:00" # Restore to this exact moment
s3_wal_path: "s3://my-pg-backups/wal/16/pg-production-cluster"
s3_endpoint: "https://s3.us-east-1.amazonaws.com"
s3_access_key_id: "" # Use IAM role instead
s3_secret_access_key: "" # Use IAM role instead
When this CRD is applied, the operator performs the following steps:
- Finds the latest base backup before the target timestamp
- Restores the base backup to the new StatefulSet's primary pod
- Replays WAL segments up to the specified timestamp
- Opens the database for read-write operations
- Sets up streaming replication to the replica pods
Standby Cluster for Disaster Recovery
A standby cluster continuously replicates from a primary cluster, providing a warm standby that can be promoted during a disaster. This is different from replicas within a cluster—a standby cluster is a completely independent Kubernetes resource that can run in a different namespace, cluster, or even region.
# Standby cluster in a different region
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
name: pg-standby-eu
namespace: databases
spec:
teamId: "platform"
volume:
size: 100Gi
storageClass: managed-premium-postgres
numberOfInstances: 2
postgresql:
version: "16"
standby:
standby_host: "pg-production-cluster.databases.svc.cluster.local"
standby_port: "5432"
# Alternative: replicate from S3 WAL archive
# s3_wal_path: "s3://my-pg-backups/wal/16/pg-production-cluster"
enableConnectionPooler: true
enableReplicaConnectionPooler: true
To promote a standby cluster to an independent primary (during disaster recovery), simply remove the standby section from the CRD and apply:
# Edit the standby cluster CRD to remove standby section
kubectl patch postgresql pg-standby-eu -n databases --type json \
-p '[{"op": "remove", "path": "/spec/standby"}]'
# The operator will promote the standby to primary
# Update your application DNS/service mesh to point to the new primary
Monitoring with Prometheus and Grafana
Comprehensive monitoring is non-negotiable for production PostgreSQL deployments. The Zalando operator supports Prometheus metrics export via the postgres_exporter sidecar. Here is how to set up a complete monitoring stack:
ServiceMonitor for Prometheus
# ServiceMonitor for PostgreSQL metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: postgres-monitor
namespace: databases
labels:
team: platform
release: prometheus
spec:
selector:
matchLabels:
team: platform
namespaceSelector:
matchNames:
- databases
endpoints:
- port: exporter
interval: 15s
scrapeTimeout: 10s
path: /metrics
relabelings:
- sourceLabels: [__meta_kubernetes_pod_label_spilo_role]
targetLabel: role
- sourceLabels: [__meta_kubernetes_pod_label_cluster_name]
targetLabel: cluster
---
# PodMonitor alternative (scrapes pods directly)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: postgres-pod-monitor
namespace: databases
spec:
selector:
matchLabels:
application: spilo
podMetricsEndpoints:
- port: exporter
interval: 15s
path: /metrics
Key Metrics to Monitor
These are the most critical PostgreSQL metrics to track in your Grafana dashboards:
- pg_stat_replication_lag — Replication lag in bytes and seconds. Alert if lag exceeds your RPO threshold.
- pg_stat_activity_count — Active connections by state. Alert on connection pool exhaustion.
- pg_stat_database_tup_fetched/returned/inserted/updated/deleted — Query throughput metrics.
- pg_stat_bgwriter_buffers_checkpoint/clean/backend — Buffer management efficiency.
- pg_database_size_bytes — Database size growth over time for capacity planning.
- pg_locks_count — Lock contention. Alert on excessive waiting locks.
- pg_stat_statements_calls/mean_time — Query performance statistics for optimization.
- patroni_postgres_running — Patroni health status (1 = running, 0 = down).
- patroni_master — Which pod is the current primary (1 = master, 0 = replica).
- pg_up — Basic PostgreSQL availability probe.
Alerting Rules
# PrometheusRule for PostgreSQL alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: postgres-alerts
namespace: databases
spec:
groups:
- name: postgresql.rules
rules:
- alert: PostgreSQLDown
expr: pg_up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "PostgreSQL instance {{ $labels.instance }} is down"
- alert: PostgreSQLReplicationLag
expr: pg_stat_replication_pg_wal_lsn_diff > 100000000
for: 5m
labels:
severity: warning
annotations:
summary: "Replication lag is {{ $value }} bytes on {{ $labels.instance }}"
- alert: PostgreSQLHighConnections
expr: sum(pg_stat_activity_count) by (instance) > 180
for: 5m
labels:
severity: warning
annotations:
summary: "{{ $value }} active connections on {{ $labels.instance }}"
- alert: PostgreSQLDeadlocks
expr: rate(pg_stat_database_deadlocks[5m]) > 0
for: 1m
labels:
severity: warning
annotations:
summary: "Deadlocks detected on {{ $labels.datname }}"
- alert: PatroniFailover
expr: changes(patroni_master[5m]) > 0
labels:
severity: critical
annotations:
summary: "Patroni failover occurred in cluster {{ $labels.cluster }}"
Production Tuning Guide
Proper tuning is essential to extract maximum performance from PostgreSQL on Kubernetes. The following parameters should be adjusted based on your pod resource limits and workload characteristics.
Memory Configuration
# For a pod with 16GB memory limit:
postgresql:
parameters:
# shared_buffers: 25% of total memory
shared_buffers: "4GB"
# effective_cache_size: 75% of total memory
# (tells planner how much OS cache to expect)
effective_cache_size: "12GB"
# work_mem: shared_buffers / (max_connections * 2)
# Conservative to prevent OOM
work_mem: "10MB"
# maintenance_work_mem: 5-10% of total memory
# Used for VACUUM, CREATE INDEX, ALTER TABLE
maintenance_work_mem: "1GB"
# temp_buffers: memory for temp tables per session
temp_buffers: "32MB"
# huge_pages: try to use huge pages (requires OS config)
huge_pages: "try"
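The rules of thumb in the comments can be checked mechanically. A sketch that derives the values above from the 16GB limit and max_connections of 200:

```shell
mem_gb=16            # pod memory limit in GB
max_connections=200
shared_buffers_gb=$((mem_gb / 4))        # 25% of total memory
effective_cache_gb=$((mem_gb * 3 / 4))   # 75% of total memory
# work_mem = shared_buffers / (max_connections * 2), converted to MB
work_mem_mb=$((shared_buffers_gb * 1024 / (max_connections * 2)))
echo "shared_buffers=${shared_buffers_gb}GB effective_cache_size=${effective_cache_gb}GB work_mem=${work_mem_mb}MB"
# prints: shared_buffers=4GB effective_cache_size=12GB work_mem=10MB
```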
WAL and Checkpoint Configuration
postgresql:
parameters:
# WAL settings
wal_buffers: "64MB" # 1/32 of shared_buffers, max 64MB
wal_compression: "zstd" # Compress WAL (PG 15+)
max_wal_size: "8GB" # Before forced checkpoint
min_wal_size: "2GB" # WAL disk reservation
wal_level: "replica" # Required for replication
# Checkpoint settings
checkpoint_completion_target: "0.9" # Spread I/O over 90% of interval
checkpoint_timeout: "15min" # Max time between checkpoints
Query Planner and I/O
postgresql:
parameters:
# Cost parameters for SSD storage
random_page_cost: "1.1" # SSD: close to seq_page_cost
seq_page_cost: "1.0" # Sequential I/O baseline
effective_io_concurrency: "200" # Concurrent I/O for SSD
# Planner behavior
default_statistics_target: "200" # More accurate statistics
from_collapse_limit: "12"            # JOIN planning threshold
join_collapse_limit: "12"            # JOIN planning threshold
# Parallel queries
max_parallel_workers_per_gather: "4"
max_parallel_workers: "8"
max_parallel_maintenance_workers: "4"
parallel_tuple_cost: "0.01"
parallel_setup_cost: "1000"
Connection and Logging
postgresql:
parameters:
# Connection limits
max_connections: "200" # Keep low, use PgBouncer
superuser_reserved_connections: "5" # Reserve for admin access
# Logging for troubleshooting
log_statement: "ddl" # Log DDL statements
log_min_duration_statement: "500" # Log queries > 500ms
log_checkpoints: "on" # Log checkpoint activity
log_connections: "off" # Too noisy in production
log_disconnections: "off" # Too noisy in production
log_lock_waits: "on" # Log lock waits
log_temp_files: "0" # Log all temp file usage
log_autovacuum_min_duration: "1000" # Log slow autovacuum
# Statement timeouts
statement_timeout: "60000" # 60 second query timeout
lock_timeout: "30000" # 30 second lock timeout
idle_in_transaction_session_timeout: "600000" # 10 min idle txn timeout
Rolling Updates and Version Upgrades
Minor Version Upgrades
Minor version upgrades (e.g., 16.2 to 16.3) are handled automatically by the operator when you update the Spilo image tag. The operator performs a rolling restart:
# Update the operator configuration to use a new Spilo image
helm upgrade postgres-operator postgres-operator-charts/postgres-operator \
--namespace postgres-operator \
--set configGeneral.docker_image=ghcr.io/zalando/spilo-16:3.1-p1 \
--reuse-values
# The operator will perform rolling updates:
# 1. Restart replicas one at a time
# 2. Failover the primary to a freshly updated replica
# 3. Restart the old primary (now a replica)
Major Version Upgrades
Major version upgrades (e.g., PostgreSQL 15 to 16) require more careful planning. The Zalando operator supports in-place major upgrades using pg_upgrade:
# Step 1: Update the CRD to the new major version
spec:
postgresql:
version: "16" # Changed from "15"
# Step 2: The operator detects the version change and:
# a) Scales down the StatefulSet to 1 replica
# b) Runs pg_upgrade on the primary pod
# c) Scales back up to the desired numberOfInstances
# d) Replicas are rebuilt from the upgraded primary
# Step 3: Verify the upgrade
kubectl exec -it pg-production-cluster-0 -n databases -- \
su postgres -c "psql -c 'SELECT version()'"
# Step 4: Run ANALYZE to update statistics after upgrade
kubectl exec -it pg-production-cluster-0 -n databases -- \
su postgres -c "psql -c 'ANALYZE VERBOSE'"
Important considerations for major upgrades:
- Always take a fresh backup before upgrading
- Test the upgrade on a cloned cluster first
- Major upgrades require downtime (typically 5-30 minutes depending on database size)
- Review the PostgreSQL release notes for breaking changes
- Monitor replication lag closely after replica rebuild
- Run ANALYZE on all databases to regenerate query planner statistics
Advanced Operational Patterns
Logical Backups
In addition to physical WAL-G backups, the operator supports logical backups using pg_dump. Logical backups are useful for cross-version migrations and selective table restores:
# Enable logical backups in the operator configuration
helm upgrade postgres-operator postgres-operator-charts/postgres-operator \
--namespace postgres-operator \
--set configLogicalBackup.logical_backup_schedule="0 3 * * *" \
--set configLogicalBackup.logical_backup_s3_bucket="my-pg-logical-backups" \
--set configLogicalBackup.logical_backup_s3_region="us-east-1" \
--set configLogicalBackup.logical_backup_s3_sse="AES256" \
--reuse-values
# The operator creates a CronJob for each cluster that:
# 1. Connects to the primary PostgreSQL instance
# 2. Runs pg_dumpall (or pg_dump per database)
# 3. Compresses and uploads to S3
Custom Pod Environment Variables
You can inject environment variables into Spilo pods using a ConfigMap. This is useful for configuring WAL-G, custom scripts, or tuning OS-level parameters:
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-pod-config
namespace: databases
data:
# Custom Spilo configurations
SPILO_CONFIGURATION: |
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 33554432
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
archive_mode: "on"
archive_timeout: 1800s
# Enable pg_stat_statements
POSTGRESQL_SHARED_PRELOAD_LIBRARIES: "bg_mon,pg_stat_statements,pgextwlist,pg_auth_mon,set_user,timescaledb,pg_cron,pg_stat_kcache"
# Cron jobs inside PostgreSQL
ENABLE_PG_CRON: "true"
Network Policies for Security
In production, restrict network access to PostgreSQL pods using Kubernetes NetworkPolicies:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: postgres-network-policy
namespace: databases
spec:
podSelector:
matchLabels:
application: spilo
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: app-namespace
- podSelector:
matchLabels:
app: backend
ports:
- protocol: TCP
port: 5432
- protocol: TCP
port: 8008 # Patroni REST API
- from:
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 9187 # Prometheus exporter
egress:
- to:
- ipBlock:
cidr: 0.0.0.0/0
ports:
- protocol: TCP
port: 443 # S3/GCS/Azure for backups
Troubleshooting Common Issues
Even with automation, issues arise. Here are the most common problems and their solutions when running PostgreSQL with the Zalando operator:
1. Pod Stuck in Pending State
# Check events for the pod
kubectl describe pod pg-production-cluster-0 -n databases
# Common causes:
# - No nodes with matching tolerations/affinity
# - Insufficient CPU or memory on nodes
# - PVC cannot be provisioned (check StorageClass)
# Fix: Check node resources and storage availability
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory
kubectl get pvc -n databases
2. Replication Lag Growing
# Check replication status on the primary
kubectl exec -it pg-production-cluster-0 -n databases -- \
su postgres -c "psql -c 'SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag FROM pg_stat_replication'"
# Common causes:
# - Replica CPU/IO saturation
# - Network bandwidth limits
# - Long-running queries on replica blocking WAL replay
# - Insufficient wal_keep_size
# Fix: Check replica resources and cancel blocking queries
kubectl exec -it pg-production-cluster-1 -n databases -- \
su postgres -c "psql -c \"SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE state = 'active' AND query_start < now() - interval '5 minutes'\""
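The LSN columns reported by pg_stat_replication can be turned into a byte figure with pg_wal_lsn_diff. As an illustration of the arithmetic, here is a small Python sketch (a hypothetical helper, not part of any tooling mentioned in this guide):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '16/B374D848' to an absolute byte position.

    The hex part before the slash is the high 32 bits of the WAL position,
    the part after the slash is the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Byte lag between the primary's current LSN and a replica's replayed LSN,
    mirroring SELECT pg_wal_lsn_diff(primary_lsn, replica_lsn)."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn)

# 33554432 bytes (32 MiB) is the maximum_lag_on_failover value used earlier:
# a replica further behind than this is not eligible for promotion.
print(replication_lag_bytes("0/3000000", "0/1000000"))  # 33554432
```

Comparing this figure against your maximum_lag_on_failover setting tells you whether a lagging replica would still be promotable.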
3. Failover Not Triggering
# Check Patroni cluster status
kubectl exec -it pg-production-cluster-0 -n databases -- \
su postgres -c "patronictl list"
# Check Patroni logs
kubectl logs pg-production-cluster-0 -n databases -c postgres | grep -i patroni
# Manual failover (if auto-failover is stuck)
kubectl exec -it pg-production-cluster-0 -n databases -- \
su postgres -c "patronictl failover --candidate pg-production-cluster-1 --force"
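Patroni only promotes a replica once the leader key in the DCS has expired and the candidate's lag is within maximum_lag_on_failover (33554432 bytes in the configuration shown earlier). A conceptual Python sketch of that promotion gate (a simplification, not Patroni's actual implementation):

```python
def failover_candidate_eligible(leader_ttl_expired: bool,
                                replica_lag_bytes: int,
                                max_lag_bytes: int = 33554432) -> bool:
    """Conceptual model of Patroni's promotion gate (not the real code):
    a replica is promoted only after the leader lease has expired in the DCS
    AND its replication lag is within maximum_lag_on_failover."""
    if not leader_ttl_expired:
        return False  # a healthy leader still holds the lease
    return replica_lag_bytes <= max_lag_bytes

# A replica 64 MiB behind will NOT be promoted even with the leader gone,
# which is a common reason "failover is not triggering".
print(failover_candidate_eligible(True, 64 * 1024 * 1024))  # False
print(failover_candidate_eligible(True, 16 * 1024 * 1024))  # True
```

If patronictl list shows every replica beyond the lag threshold, fix the replication lag first; forcing a failover to a badly lagging candidate risks data loss.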
4. Backup Failures
# Check WAL-G backup status
kubectl exec -it pg-production-cluster-0 -n databases -- \
su postgres -c "envdir /run/etc/wal-e.d/env wal-g backup-list"
# Check backup CronJob logs
kubectl logs -l application=spilo,cluster-name=pg-production-cluster -n databases | grep -i wal-g
# Verify S3/GCS credentials
kubectl exec -it pg-production-cluster-0 -n databases -- \
su postgres -c "envdir /run/etc/wal-e.d/env aws s3 ls s3://my-pg-backups/"
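Beyond inspecting backup-list output by hand, it is worth alerting when the newest base backup is too old. A hypothetical Python helper for that check (the timestamps would come from parsing your wal-g output, whose exact column layout varies by version):

```python
from datetime import datetime, timedelta, timezone

def latest_backup_age(backup_times: list, now: datetime) -> timedelta:
    """Age of the most recent base backup."""
    return now - max(backup_times)

def backups_stale(backup_times: list, now: datetime,
                  max_age: timedelta = timedelta(hours=26)) -> bool:
    """True if no base backup completed within max_age.
    26h gives a daily '0 1 * * *' schedule a two-hour grace window."""
    if not backup_times:
        return True
    return latest_backup_age(backup_times, now) > max_age

now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
times = [datetime(2024, 5, 1, 1, 10, tzinfo=timezone.utc),
         datetime(2024, 5, 2, 1, 10, tzinfo=timezone.utc)]
print(backups_stale(times, now))  # False: latest backup is ~11h old
```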
Security Best Practices
Securing PostgreSQL on Kubernetes requires a defense-in-depth approach:
- TLS encryption — Enable SSL for all client connections. The operator can automatically provision certificates using cert-manager.
- Secret management — Use Kubernetes Secrets (or external secret managers like Vault) for database credentials. Never store passwords in ConfigMaps.
- RBAC — Limit operator ServiceAccount permissions. Use least-privilege access for application database users.
- NetworkPolicies — Restrict pod-to-pod communication as shown in the previous section.
- pg_hba.conf — Configure host-based authentication to restrict which IPs and users can connect.
- Audit logging — Enable the pgaudit extension for SQL audit logging in regulated environments.
- Encrypted storage — Use encrypted StorageClasses (EBS encryption, Azure Disk encryption, etc.).
- Pod Security Standards — Run Spilo pods as non-root with restricted security contexts.
# Security-hardened CRD settings
spec:
  spiloRunAsUser: 101
  spiloRunAsGroup: 103
  spiloFSGroup: 103
  enableShmVolume: true
  patroni:
    pg_hba:
      - hostssl all all 0.0.0.0/0 scram-sha-256
      - host replication standby all scram-sha-256
      - hostssl replication standby all scram-sha-256
  postgresql:
    parameters:
      ssl: "on"
      ssl_min_protocol_version: "TLSv1.3"
      password_encryption: "scram-sha-256"
  additionalVolumes:
    - name: postgres-tls
      mountPath: /tls
      volumeSource:
        secret:
          secretName: pg-tls-cert
          defaultMode: 0640
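Note that pg_hba rules are evaluated top-down and the first matching line wins, so rule order matters: a broad rule placed above a stricter one effectively hides it. A toy Python matcher illustrating first-match-wins (real pg_hba matching also considers connection type and client address, which this sketch ignores):

```python
def first_matching_rule(rules, database, user):
    """Return the auth method of the first rule matching database and user.
    Each rule is (type, database, user, address, method); 'all' matches anything.
    Real pg_hba.conf matching also checks connection type and client CIDR."""
    for _conn_type, rule_db, rule_user, _addr, method in rules:
        if rule_db in ("all", database) and rule_user in ("all", user):
            return method
    return None  # no rule matched: PostgreSQL rejects the connection

rules = [
    ("hostssl", "all", "all", "0.0.0.0/0", "scram-sha-256"),
    ("host", "replication", "standby", "all", "scram-sha-256"),
]
print(first_matching_rule(rules, "app_db", "app_user"))  # scram-sha-256
```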
Capacity Planning and Sizing
Proper sizing ensures stable performance and cost efficiency. Use these guidelines as a starting point and adjust based on your workload monitoring data:
| Workload | CPU | Memory | Storage | Instances |
|---|---|---|---|---|
| Development | 500m | 1Gi | 10Gi | 1 |
| Small Production | 2 cores | 8Gi | 50Gi SSD | 3 |
| Medium Production | 4 cores | 16Gi | 200Gi SSD | 3 |
| Large Production | 8 cores | 32Gi | 500Gi SSD | 5 |
| Enterprise / Analytics | 16+ cores | 64Gi+ | 1Ti+ SSD | 5+ |
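The memory column maps onto PostgreSQL parameters via common rules of thumb: roughly 25% of pod memory for shared_buffers and about 75% for effective_cache_size. A small Python sketch of that heuristic (starting values only; tune from your monitoring data):

```python
def suggest_memory_params(pod_memory_gib: int) -> dict:
    """Rule-of-thumb starting values for a dedicated PostgreSQL pod:
    shared_buffers ~25% of pod memory, effective_cache_size ~75%."""
    return {
        "shared_buffers": f"{pod_memory_gib // 4}GB",
        "effective_cache_size": f"{pod_memory_gib * 3 // 4}GB",
    }

# The "Small Production" row (8Gi memory) yields shared_buffers 2GB and
# effective_cache_size 6GB, matching the CRD examples in this guide.
print(suggest_memory_params(8))  # {'shared_buffers': '2GB', 'effective_cache_size': '6GB'}
```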
Complete End-to-End Deployment Example
Let us put everything together with a complete deployment from scratch on a fresh Kubernetes cluster:
# Step 1: Create namespace and configure storage
kubectl create namespace databases
kubectl create namespace postgres-operator
# Step 2: Install the Zalando Postgres Operator
helm repo add postgres-operator-charts \
https://opensource.zalando.com/postgres-operator/charts/postgres-operator
helm repo update
helm install postgres-operator postgres-operator-charts/postgres-operator \
--namespace postgres-operator \
--set configKubernetes.enable_pod_antiaffinity=true \
--set configKubernetes.pod_environment_configmap=databases/postgres-pod-config
# Step 3: Create backup configuration
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config
  namespace: databases
data:
  USE_WALG_BACKUP: "true"
  USE_WALG_RESTORE: "true"
  WALG_S3_PREFIX: "s3://my-pg-backups/\$(SCOPE)"
  BACKUP_SCHEDULE: "0 1 * * *"
  BACKUP_NUM_TO_RETAIN: "14"
EOF
# Step 4: Deploy the PostgreSQL cluster
kubectl apply -f - <<EOF
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  # The operator requires the cluster name to start with the teamId prefix
  name: pg-app-cluster
  namespace: databases
  labels:
    team: pg
spec:
  teamId: "pg"
  volume:
    size: 50Gi
  numberOfInstances: 3
  enableConnectionPooler: true
  enableReplicaConnectionPooler: true
  users:
    app_user:
      - superuser
      - createdb
  databases:
    app_db: app_user
  postgresql:
    version: "16"
    parameters:
      shared_buffers: "2GB"
      work_mem: "64MB"
      effective_cache_size: "6GB"
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "4"
      memory: 16Gi
EOF
# Step 5: Wait for cluster readiness
# The postgresql CRD reports its state in .status.PostgresClusterStatus
kubectl wait --for=jsonpath='{.status.PostgresClusterStatus}'=Running \
  postgresql/pg-app-cluster -n databases --timeout=300s
# Step 6: Verify cluster status
kubectl get postgresql -n databases
kubectl get pods -n databases -l cluster-name=pg-app-cluster
kubectl get svc -n databases -l cluster-name=pg-app-cluster
# Step 7: Get connection credentials
export PGPASSWORD=$(kubectl get secret app-user.pg-app-cluster.credentials.postgresql.acid.zalan.do \
-n databases -o jsonpath='{.data.password}' | base64 -d)
# Step 8: Connect and verify
kubectl run pg-client --rm -it --image=postgres:16 -n databases \
  --env="PGPASSWORD=$PGPASSWORD" -- \
  psql -h pg-app-cluster-pooler -U app_user -d app_db -c "SELECT version();"
Conclusion
The Zalando Postgres Operator transforms PostgreSQL on Kubernetes from a complex operational challenge into a manageable, automated deployment. By leveraging Patroni for consensus-based failover, Spilo for a batteries-included container image, WAL-G for continuous backup and recovery, and PgBouncer for efficient connection pooling, you get a production-grade database platform that works consistently across AWS EKS, Azure AKS, Google GKE, and bare-metal k3s clusters.
The key takeaways from this guide are:
- Automate everything — Let the operator handle StatefulSet management, failover, and backup scheduling. Manual intervention should be the exception.
- Monitor aggressively — Deploy Prometheus and Grafana from day one. Replication lag, connection counts, and lock contention are your early warning signals.
- Plan for disaster — Configure WAL-G backups, test PITR regularly, and maintain a standby cluster for critical workloads.
- Tune for your workload — Default PostgreSQL parameters are conservative. Adjust shared_buffers, work_mem, and checkpoint settings based on your resource allocation and query patterns.
- Secure by default — Enable TLS, use SCRAM-SHA-256 authentication, restrict network access, and encrypt storage at rest.
- Test upgrades — Always clone your cluster and test major version upgrades before applying them to production.
With this comprehensive foundation, you are well-equipped to deploy and operate highly available PostgreSQL clusters on any Kubernetes platform, serving applications that demand reliability, performance, and data integrity at scale.