Longhorn Storage for Production k3s Clusters: Dynamic PVC Expansion, Replication, and Disaster Recovery
Production-Grade Distributed Storage with Longhorn on k3s and Rancher
Storage is the hardest problem in Kubernetes. Compute is stateless and fungible — kill a pod, schedule another. Networking has mature CNI plugins and service meshes. But storage? Storage is where state lives, where data persists across restarts, where a single misconfiguration can mean permanent data loss. For k3s clusters running production workloads — databases, message queues, application state — you need a storage solution that is distributed, resilient, expandable, and operable. Longhorn is that solution.
Longhorn is a lightweight, reliable, and easy-to-use distributed block storage system for Kubernetes. Originally developed by Rancher Labs (now part of SUSE), it is a CNCF incubating project purpose-built for clusters where simplicity matters but production reliability is non-negotiable. Unlike Ceph, which demands dedicated storage nodes and deep expertise, or local-path provisioner, which offers zero redundancy, Longhorn strikes the exact balance k3s clusters need: distributed replication across nodes, dynamic volume expansion, integrated backup to cloud object storage, snapshot and disaster recovery — all managed through a clean UI and Kubernetes-native CRDs.
This guide covers everything you need to deploy Longhorn on k3s for production: architecture internals, installation methods, StorageClass configuration, dynamic PVC expansion, replication strategies, backup and disaster recovery, volume encryption, performance tuning for databases, monitoring, and troubleshooting. Every recommendation comes with production-tested configuration you can adapt to your environment.
Longhorn Architecture
Understanding Longhorn's architecture is essential for making informed decisions about replication, performance, and failure handling. Longhorn is composed of three core components that work together to provide distributed block storage on top of the local disks attached to your Kubernetes nodes.
The Longhorn Manager runs as a DaemonSet on every node in the cluster. It is the control plane of Longhorn — it handles API calls, orchestrates volume creation, manages replication, coordinates snapshots and backups, and communicates with the Kubernetes API server to manage PersistentVolume and PersistentVolumeClaim lifecycle. When you create a PVC that references a Longhorn StorageClass, the Longhorn Manager receives the request through the CSI driver, provisions the volume, and schedules its replicas across available nodes.
The Longhorn Engine is a per-volume storage controller implemented as a lightweight Linux userspace process. Each volume gets its own dedicated engine process running on the node where the volume is attached. The engine handles all read and write I/O for that volume, replicating writes synchronously to all configured replicas before acknowledging the write to the application. This per-volume architecture means that a crash or hang in one volume's engine does not affect any other volume — a critical isolation property for production.
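The per-volume engine model is directly observable through Longhorn's CRDs. A quick way to confirm which node each volume's engine controller is running on (the resource kind is Longhorn's real `engines.longhorn.io` CRD; the sample output is illustrative and columns may vary by version):

```shell
# One engine CR exists per attached volume; NODE shows where its
# controller process (inside an instance-manager pod) runs.
kubectl -n longhorn-system get engines.longhorn.io
# NAME             STATE     NODE        AGE
# pvc-abc123-e-0   running   worker-01   30d
```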
Replicas are the actual data storage processes. Each replica stores a complete copy of the volume data on the local disk of the node where it runs. By default, Longhorn creates three replicas for each volume, distributed across different nodes (and optionally different zones). Replicas use a copy-on-write mechanism for snapshots, making snapshot creation instantaneous regardless of volume size.
This architecture delivers several key properties for production use. Fault tolerance: with three replicas across three nodes, the volume survives two simultaneous node failures. Isolation: each volume has its own engine process, so a bug or hang in one volume cannot cascade. Simplicity: no dedicated storage nodes, no separate Ceph or GlusterFS clusters — Longhorn runs on the same worker nodes as your application pods, using their local disks. Kubernetes-native: everything is managed through CRDs, kubectl, and the Kubernetes CSI interface.
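Because everything is a CRD, routine inspection needs no special tooling beyond kubectl. A sketch of day-to-day commands (the resource kinds shown are real Longhorn CRDs; output is omitted):

```shell
# Discover all Longhorn resource kinds registered in the cluster
kubectl api-resources --api-group=longhorn.io

# Inspect volumes and the replicas backing them
kubectl -n longhorn-system get volumes.longhorn.io
kubectl -n longhorn-system get replicas.longhorn.io -o wide
```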
Installing Longhorn on k3s
Longhorn can be installed on k3s through three methods: Helm chart (recommended for production), Rancher App Marketplace (if you have Rancher managing your cluster), or direct kubectl apply. Before installing, ensure your nodes meet the prerequisites.
Prerequisites
# Verify prerequisites on each node
# Required: open-iscsi (for iSCSI support)
sudo apt-get install -y open-iscsi
sudo systemctl enable iscsid
sudo systemctl start iscsid
# Required: NFSv4 client (for backup/restore to NFS targets)
sudo apt-get install -y nfs-common
# Recommended: check all prerequisites with Longhorn's environment check script
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.7.2/scripts/environment_check.sh | bash
# Verify kernel modules
lsmod | grep -E "iscsi_tcp|dm_crypt|nfs"
# On k3s specifically, local-path-provisioner is the default.
# We will make Longhorn the default StorageClass after installation.
Installation via Helm (Recommended)
# Add the Longhorn Helm repository
helm repo add longhorn https://charts.longhorn.io
helm repo update
# Create the namespace
kubectl create namespace longhorn-system
# Install with production values
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
--version 1.7.2 \
--values longhorn-values.yaml
Here is a production-grade values.yaml with tuned defaults:
# longhorn-values.yaml — Production configuration
persistence:
defaultClass: true
defaultFsType: ext4
defaultClassReplicaCount: 3
defaultDataLocality: best-effort
reclaimPolicy: Retain
defaultSettings:
backupTarget: s3://longhorn-backups@eu-west-1/
backupTargetCredentialSecret: longhorn-backup-s3-secret
createDefaultDiskLabeledNodes: true
defaultDataPath: /var/lib/longhorn/
defaultReplicaCount: 3
defaultDataLocality: best-effort
replicaSoftAntiAffinity: false
replicaAutoBalance: best-effort
storageOverProvisioningPercentage: 150
storageMinimalAvailablePercentage: 15
guaranteedInstanceManagerCPU: 12
upgradeChecker: false
autoSalvage: true
autoDeletePodWhenVolumeDetachedUnexpectedly: true
disableSchedulingOnCordonedNode: true
replicaZoneSoftAntiAffinity: true
volumeAttachmentRecoveryPolicy: wait
snapshotDataIntegrity: fast-check
snapshotDataIntegrityCronjob: "0 7 * * *"
concurrentAutomaticEngineUpgradePerNodeLimit: 1
longhornManager:
priorityClass: system-cluster-critical
tolerations:
- key: "node-role.kubernetes.io/storage"
operator: "Exists"
effect: "NoSchedule"
longhornDriver:
priorityClass: system-cluster-critical
tolerations:
- key: "node-role.kubernetes.io/storage"
operator: "Exists"
effect: "NoSchedule"
longhornUI:
replicas: 2
ingress:
enabled: true
ingressClassName: nginx
host: longhorn.internal.example.com
tls: true
tlsSecret: longhorn-tls
annotations:
nginx.ingress.kubernetes.io/auth-type: basic
nginx.ingress.kubernetes.io/auth-secret: longhorn-basic-auth
nginx.ingress.kubernetes.io/auth-realm: "Longhorn UI"
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
Installation via Rancher App Marketplace
If Rancher manages your k3s cluster, navigate to Apps & Marketplace → Charts → Longhorn in the Rancher UI. Select your target namespace (longhorn-system), configure the values through the form interface, and click Install. Rancher handles Helm lifecycle management and upgrade tracking automatically.
Installation via kubectl
# Direct manifest installation
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.7.2/deploy/longhorn.yaml
# Verify all pods are running
kubectl -n longhorn-system get pods -w
Post-Installation: Make Longhorn the Default StorageClass
# Remove default annotation from k3s local-path
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
# Verify Longhorn is now default
kubectl get storageclass
# NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
# longhorn (default) driver.longhorn.io Retain Immediate true 5m
# local-path rancher.io/local-path Delete WaitForFirstConsumer false 30d
StorageClass Configuration for Production
The default Longhorn StorageClass works for development, but production workloads need specific configurations for different use cases — databases need high replication and specific data locality, temporary processing needs fast single-replica volumes, and shared volumes need RWX support.
# StorageClass for database workloads — maximum durability
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: longhorn-db
annotations:
storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
numberOfReplicas: "3"
staleReplicaTimeout: "2880"
fromBackup: ""
fsType: ext4
dataLocality: best-effort
recurringJobSelector: '[{"name":"db-snapshot","isGroup":true},{"name":"db-backup","isGroup":true}]'
---
# StorageClass for general workloads — balanced
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: longhorn-standard
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
numberOfReplicas: "2"
staleReplicaTimeout: "2880"
fsType: ext4
dataLocality: best-effort
---
# StorageClass for temporary/cache workloads — performance
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
numberOfReplicas: "1"
staleReplicaTimeout: "2880"
fsType: ext4
dataLocality: strict-local
---
# StorageClass for RWX (ReadWriteMany) shared volumes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: longhorn-rwx
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
numberOfReplicas: "3"
nfsOptions: "vers=4.1,hard,timeo=50,retrans=3"
staleReplicaTimeout: "2880"
fsType: ext4
Dynamic PVC Expansion
One of Longhorn's most important production features is dynamic PVC expansion — the ability to grow a persistent volume's size without downtime, without data loss, and without manual intervention beyond a single kubectl command or manifest change. This is critical for databases where data growth is unpredictable and running out of disk space means an outage.
Longhorn supports both online expansion (volume stays attached and mounted while it grows) and offline expansion (volume is detached first). Online expansion is the default and recommended approach for production because it avoids application downtime.
Enabling Volume Expansion in StorageClass
The key requirement for dynamic PVC expansion is that the StorageClass must have allowVolumeExpansion: true. The Longhorn default StorageClass already includes this, but if you have custom StorageClasses, verify this field is set.
# Verify your StorageClass supports expansion
kubectl get storageclass longhorn -o yaml | grep allowVolumeExpansion
# allowVolumeExpansion: true
# If not set, patch it
kubectl patch storageclass longhorn -p '{"allowVolumeExpansion": true}'
Step-by-Step: Expanding a PVC Dynamically
# 1. Check current PVC size
kubectl get pvc pg-data-postgresql-0 -n database
# NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
# pg-data-postgresql-0 Bound pvc-abc123 50Gi RWO longhorn-db 30d
# 2. Check current usage inside the pod
kubectl exec -n database postgresql-0 -- df -h /var/lib/postgresql/data
# Filesystem Size Used Avail Use% Mounted on
# /dev/longhorn/pvc-abc123 49G 42G 7.0G 86% /var/lib/postgresql/data
# 3. Expand the PVC (online — no downtime)
kubectl patch pvc pg-data-postgresql-0 -n database --type merge -p '{
"spec": {
"resources": {
"requests": {
"storage": "100Gi"
}
}
}
}'
# 4. Watch the expansion progress
kubectl get pvc pg-data-postgresql-0 -n database -w
# Wait for CAPACITY to update from 50Gi to 100Gi
# 5. Verify the filesystem expanded inside the pod
kubectl exec -n database postgresql-0 -- df -h /var/lib/postgresql/data
# Filesystem Size Used Avail Use% Mounted on
# /dev/longhorn/pvc-abc123 99G 42G 57G 43% /var/lib/postgresql/data
# 6. Verify via Longhorn volume status
kubectl -n longhorn-system get volumes.longhorn.io -o wide
Automating PVC Expansion with Alerts
In production, you should not wait until a disk is 86% full to expand it manually. Use Prometheus alerts to trigger expansion automatically or alert the on-call engineer before capacity becomes critical.
# PrometheusRule for PVC capacity alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: pvc-capacity-alerts
namespace: monitoring
spec:
groups:
- name: pvc-capacity
rules:
- alert: PVCCapacityWarning
expr: |
(kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.80
for: 10m
labels:
severity: warning
annotations:
summary: "PVC {{ $labels.persistentvolumeclaim }} at {{ $value | humanizePercentage }} capacity"
runbook: "Expand the PVC using: kubectl patch pvc {{ $labels.persistentvolumeclaim }} -n {{ $labels.namespace }} --type merge -p '{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"NEW_SIZE\"}}}}'"
- alert: PVCCapacityCritical
expr: |
(kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.90
for: 5m
labels:
severity: critical
annotations:
summary: "CRITICAL: PVC {{ $labels.persistentvolumeclaim }} at {{ $value | humanizePercentage }}"
description: "Immediate expansion required to prevent application failure"
Volume Replication and Data Locality
Longhorn replicates volume data across multiple nodes to protect against hardware failures. The replication factor and data locality settings control the trade-off between durability, performance, and storage efficiency.
Replication factor determines how many copies of the data exist. The default is 3, meaning each write is stored on three different nodes. For production databases, 3 is the minimum recommended value. You can set it to 2 for less critical workloads to save storage, or use 1 for temporary/cache volumes where data loss is acceptable.
Data locality controls whether Longhorn tries to keep a replica on the same node as the pod consuming the volume. There are three modes:
- disabled — Replicas are scheduled purely based on available space and anti-affinity. The pod might read from a replica on a remote node, adding network latency to every I/O operation.
- best-effort — Longhorn tries to place one replica on the same node as the consuming pod. If the local node runs out of space or the pod migrates, the volume still works but may have slightly higher latency. This is the recommended setting for most workloads.
- strict-local — The volume can only be used on a node that has a local replica. If the pod is scheduled to a node without a local replica, the volume attachment fails. Use this only for latency-sensitive single-replica workloads where you accept the durability trade-off.
# Change replication factor on an existing volume
kubectl -n longhorn-system patch volumes.longhorn.io pvc-abc123 \
--type merge -p '{"spec":{"numberOfReplicas":3}}'
# Set data locality on existing volume
kubectl -n longhorn-system patch volumes.longhorn.io pvc-abc123 \
--type merge -p '{"spec":{"dataLocality":"best-effort"}}'
# Replica auto-balancing (spreads replicas evenly across nodes)
# Set via Longhorn settings
kubectl -n longhorn-system edit settings.longhorn.io replica-auto-balance
# Set value to: best-effort
Snapshots and Backups
Longhorn provides two distinct data protection mechanisms: snapshots (local, instant, for quick rollback) and backups (remote, to object storage, for disaster recovery). Understanding when to use each is critical.
Snapshots are stored locally on the same disks as the volume replicas. They are created instantaneously using copy-on-write — no data is copied at snapshot time, only new writes after the snapshot allocate additional space. Snapshots are excellent for quick rollback before a risky migration or deployment, but they do not protect against node or disk failure because they live on the same storage as the volume.
Backups copy the volume data to an external backup target — S3, GCS, Azure Blob, or any S3-compatible store (MinIO, Wasabi). Backups are incremental at the block level: only blocks that changed since the last backup are transferred. This makes recurring backups fast and storage-efficient. Backups protect against total cluster loss because they exist independently.
Configuring the Backup Target
# Create the S3 credentials secret
kubectl create secret generic longhorn-backup-s3-secret \
-n longhorn-system \
--from-literal=AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
--from-literal=AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
--from-literal=AWS_ENDPOINTS=https://s3.eu-west-1.amazonaws.com
# Set the backup target in Longhorn settings
kubectl -n longhorn-system edit settings.longhorn.io backup-target
# Set value to: s3://longhorn-backups@eu-west-1/
kubectl -n longhorn-system edit settings.longhorn.io backup-target-credential-secret
# Set value to: longhorn-backup-s3-secret
# For a GCS backup target, use GCS's S3 interoperability mode:
# value: s3://longhorn-backups@us/
# Generate HMAC keys for a GCS service account and supply them as
# S3-style credentials with the GCS endpoint:
kubectl create secret generic longhorn-backup-gcs-secret \
-n longhorn-system \
--from-literal=AWS_ACCESS_KEY_ID=GOOG1EXAMPLEHMACKEYID \
--from-literal=AWS_SECRET_ACCESS_KEY=examplehmacsecret \
--from-literal=AWS_ENDPOINTS=https://storage.googleapis.com
# For Azure Blob backup target
kubectl create secret generic longhorn-backup-azure-secret \
-n longhorn-system \
--from-literal=AZBLOB_ACCOUNT_NAME=prodbackupstorage \
--from-literal=AZBLOB_ACCOUNT_KEY=base64encodedkeyhere
VolumeSnapshot Class and Snapshot YAML
# VolumeSnapshotClass for Longhorn
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: longhorn-snapshot-vsc
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
type: snap
---
# Create a VolumeSnapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: pg-data-snap-before-migration
namespace: database
spec:
volumeSnapshotClassName: longhorn-snapshot-vsc
source:
persistentVolumeClaimName: pg-data-postgresql-0
---
# Restore a PVC from a snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pg-data-restored
namespace: database
spec:
storageClassName: longhorn-db
dataSource:
name: pg-data-snap-before-migration
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
Recurring Backup Schedules
# Recurring snapshot job — every 4 hours, retain 6
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
name: db-snapshot-4h
namespace: longhorn-system
spec:
cron: "0 */4 * * *"
task: snapshot
retain: 6
concurrency: 2
groups:
- db-volumes
labels:
tier: database
---
# Recurring backup job — daily at 02:00, retain 14 days
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
name: db-backup-daily
namespace: longhorn-system
spec:
cron: "0 2 * * *"
task: backup
retain: 14
concurrency: 2
groups:
- db-volumes
labels:
tier: database
---
# Recurring backup job — weekly full, retain 8 weeks
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
name: db-backup-weekly
namespace: longhorn-system
spec:
cron: "0 3 * * 0"
task: backup
retain: 8
concurrency: 1
groups:
- db-volumes
labels:
tier: database
---
# Assign recurring jobs to a volume via labels
# Label the PVC or volume with: recurring-job-group.longhorn.io/db-volumes: enabled
kubectl label pvc pg-data-postgresql-0 -n database \
recurring-job-group.longhorn.io/db-volumes=enabled
Disaster Recovery
Longhorn provides a built-in disaster recovery mechanism through DR volumes — standby volumes in a secondary cluster that continuously pull incremental backups from the primary cluster's backup target. When disaster strikes, you activate the DR volume and it becomes a regular read-write volume, letting the secondary cluster take over.
Setting Up DR Volumes
# On the DR cluster: create a DR volume from the primary's backup
# This is done via the Longhorn UI or API
# 1. Configure the DR cluster's backup target to point to the same S3 bucket
kubectl -n longhorn-system edit settings.longhorn.io backup-target
# value: s3://longhorn-backups@eu-west-1/
# 2. Create a DR volume from the latest backup
# Via Longhorn UI: Backup → Select volume → Create DR Volume
# Via kubectl:
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
name: pg-data-dr
namespace: longhorn-system
spec:
size: "107374182400" # 100Gi in bytes
numberOfReplicas: 3
fromBackup: "s3://longhorn-backups@eu-west-1/?backup=backup-abc123&volume=pg-data"
standby: true
frontend: ""
# 3. The DR volume auto-syncs incremental backups from S3
# Monitor sync status:
kubectl -n longhorn-system get volumes.longhorn.io pg-data-dr -o jsonpath='{.status.lastBackup}'
# 4. During failover — activate the DR volume:
kubectl -n longhorn-system patch volumes.longhorn.io pg-data-dr \
--type merge -p '{"spec":{"standby":false,"frontend":"blockdev"}}'
# 5. Create a PVC bound to the activated DR volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pg-data-postgresql-0
namespace: database
spec:
storageClassName: longhorn-db
volumeName: pg-data-dr
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
Volume Encryption
Longhorn supports volume-level encryption using Linux LUKS2. Encrypted volumes protect data at rest on the underlying disk — even if someone gains physical access to the server's storage, they cannot read the volume data without the encryption key.
# Create a Kubernetes secret with the encryption passphrase
kubectl create secret generic longhorn-crypto \
-n longhorn-system \
--from-literal=CRYPTO_KEY_VALUE="a-very-long-and-random-passphrase-here" \
--from-literal=CRYPTO_KEY_PROVIDER=secret \
--from-literal=CRYPTO_KEY_CIPHER=aes-xts-plain64 \
--from-literal=CRYPTO_KEY_HASH=sha256 \
--from-literal=CRYPTO_KEY_SIZE=256 \
--from-literal=CRYPTO_PBKDF=argon2i
# StorageClass with encryption enabled
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: longhorn-encrypted
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
parameters:
numberOfReplicas: "3"
staleReplicaTimeout: "2880"
fromBackup: ""
encrypted: "true"
csi.storage.k8s.io/provisioner-secret-name: longhorn-crypto
csi.storage.k8s.io/provisioner-secret-namespace: longhorn-system
csi.storage.k8s.io/node-publish-secret-name: longhorn-crypto
csi.storage.k8s.io/node-publish-secret-namespace: longhorn-system
csi.storage.k8s.io/node-stage-secret-name: longhorn-crypto
csi.storage.k8s.io/node-stage-secret-namespace: longhorn-system
ReadWriteMany (RWX) Support
By default, Longhorn volumes are ReadWriteOnce (RWO) — they can be mounted by a single pod on a single node. For workloads that need shared storage across multiple pods (e.g., shared media uploads, configuration files, ML model artifacts), Longhorn supports ReadWriteMany (RWX) through an integrated NFS server.
When a PVC requests RWX access mode, Longhorn automatically deploys a share-manager pod that runs an NFS server backed by the Longhorn volume. Multiple pods can then mount the volume simultaneously over NFS. This is simpler than deploying a separate NFS server but adds a layer of network overhead compared to direct block access.
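The share-manager mechanism is easy to observe once an RWX volume is attached. Share-manager pods are named after the volume they export (the name prefix is Longhorn's convention; the sample output is illustrative):

```shell
# Find the NFS share-manager pod backing an RWX volume
kubectl -n longhorn-system get pods | grep share-manager
# share-manager-pvc-def456   1/1   Running   5m

# The matching Service is what client pods mount over NFS
kubectl -n longhorn-system get svc | grep pvc-def456
```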
# PVC with ReadWriteMany access mode
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: shared-media
namespace: application
spec:
storageClassName: longhorn-rwx
accessModes:
- ReadWriteMany
resources:
requests:
storage: 50Gi
Database Workloads on Longhorn
Running databases on Longhorn requires careful attention to StorageClass parameters, pod affinity, and backup integration. Longhorn's per-volume engine and synchronous replication make it well-suited for database workloads, but you need to configure it correctly to get production-grade performance and reliability.
PostgreSQL StatefulSet with Longhorn
# PostgreSQL StatefulSet using Longhorn PVC
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgresql
namespace: database
spec:
serviceName: postgresql
replicas: 3
selector:
matchLabels:
app: postgresql
template:
metadata:
labels:
app: postgresql
spec:
terminationGracePeriodSeconds: 120
securityContext:
fsGroup: 999
runAsUser: 999
containers:
- name: postgresql
image: postgres:16-alpine
ports:
- containerPort: 5432
env:
- name: POSTGRES_DB
value: production
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: pg-credentials
key: username
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: pg-credentials
key: password
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
resources:
requests:
cpu: "1"
memory: 2Gi
limits:
cpu: "4"
memory: 8Gi
volumeMounts:
- name: pg-data
mountPath: /var/lib/postgresql/data
livenessProbe:
exec:
command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
initialDelaySeconds: 5
periodSeconds: 5
volumeClaimTemplates:
- metadata:
name: pg-data
labels:
recurring-job-group.longhorn.io/db-volumes: enabled
spec:
storageClassName: longhorn-db
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
MySQL StatefulSet with Longhorn
# MySQL StatefulSet using Longhorn PVC
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
namespace: database
spec:
serviceName: mysql
replicas: 3
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
terminationGracePeriodSeconds: 60
containers:
- name: mysql
image: mysql:8.0
ports:
- containerPort: 3306
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-credentials
key: root-password
- name: MYSQL_DATABASE
value: production
args:
- "--default-authentication-plugin=mysql_native_password"
- "--innodb-buffer-pool-size=4G"
- "--innodb-log-file-size=1G"
- "--innodb-flush-log-at-trx-commit=1"
- "--sync-binlog=1"
resources:
requests:
cpu: "1"
memory: 4Gi
limits:
cpu: "4"
memory: 8Gi
volumeMounts:
- name: mysql-data
mountPath: /var/lib/mysql
volumeClaimTemplates:
- metadata:
name: mysql-data
labels:
recurring-job-group.longhorn.io/db-volumes: enabled
spec:
storageClassName: longhorn-db
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 200Gi
Performance Tuning for Database Workloads
Longhorn adds a storage replication layer between the application and the physical disk, which introduces some latency compared to direct local disk access. For most workloads this overhead is negligible, but database workloads with heavy write patterns need tuning to achieve optimal performance.
Use data locality: best-effort. This ensures one replica is on the same node as the database pod, meaning reads hit local disk at NVMe/SSD speed. Writes still replicate to remote nodes, but the local replica eliminates network round-trips for reads.
Dedicate disks to Longhorn. Do not share the OS disk with Longhorn data. Add dedicated NVMe or SSD drives and configure them as Longhorn disks. This prevents I/O contention between the operating system and database volumes.
# Configure dedicated disks on a node
# Via kubectl (Longhorn node CRD)
kubectl -n longhorn-system edit nodes.longhorn.io worker-01
# Add the dedicated disk under spec.disks:
spec:
disks:
default-disk:
allowScheduling: false # disable OS disk for Longhorn
path: /var/lib/longhorn/
storageReserved: 0
nvme-data:
allowScheduling: true
path: /mnt/nvme-longhorn/
storageReserved: 10737418240 # 10Gi reserved
tags:
- nvme
- database
Set guaranteed engine manager CPU. Longhorn's engine processes consume CPU for I/O processing and replication. The guaranteedInstanceManagerCPU setting reserves a percentage of node CPU for Longhorn instance managers, preventing CPU starvation under load.
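The reservation is adjusted like any other Longhorn setting. The setting name below matches recent Longhorn releases; verify it against your version with `kubectl -n longhorn-system get settings.longhorn.io`:

```shell
# Reserve 12% of each node's allocatable CPU for instance managers
kubectl -n longhorn-system edit settings.longhorn.io guaranteed-instance-manager-cpu
# Set value to: 12
```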
Tune the replica count. For databases that already have application-level replication (PostgreSQL streaming replication, MySQL group replication, MongoDB replica sets), you can reduce the Longhorn replica count to 2 instead of 3. The database's own replication provides an additional layer of data protection, and fewer Longhorn replicas means less write amplification and better write throughput.
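A reduced-replica StorageClass for such databases might look like the following sketch (the class name is hypothetical; the parameters mirror the `longhorn-db` class defined earlier, with the replica count lowered to 2):

```yaml
# StorageClass for databases that replicate at the application level —
# two Longhorn replicas, trading one copy for better write throughput
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-db-app-replicated
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
  fsType: ext4
  dataLocality: best-effort
```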
Use ext4 over xfs for small random I/O. While xfs excels at large sequential writes, ext4 generally performs better for the small random I/O pattern typical of database workloads. Set fsType: ext4 in your StorageClass.
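To validate the filesystem choice against your own hardware rather than taking the general rule on faith, run a small-random-I/O benchmark on a test PVC formatted with each filesystem. A sketch using the standard fio tool (the mount point is hypothetical — point `--directory` at the mounted test volume):

```shell
# 70/30 random read/write mix at 4k blocks, roughly approximating
# an OLTP database I/O pattern
fio --name=randrw --directory=/mnt/test-volume \
    --rw=randrw --rwmixread=70 --bs=4k --size=1G \
    --ioengine=libaio --iodepth=16 --direct=1 \
    --numjobs=4 --runtime=60 --time_based --group_reporting
```

Compare the reported IOPS and completion latencies between the ext4-backed and xfs-backed runs before committing to a StorageClass `fsType`.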
Node Scheduling and Disk Management
Longhorn provides fine-grained control over which nodes and disks are used for volume scheduling. This is critical in heterogeneous clusters where some nodes have fast NVMe storage and others have slower SATA drives.
# Tag nodes for storage scheduling
kubectl label node worker-01 node.longhorn.io/storage=nvme
kubectl label node worker-02 node.longhorn.io/storage=nvme
kubectl label node worker-03 node.longhorn.io/storage=nvme
kubectl label node worker-04 node.longhorn.io/storage=sata
# Use node selector in StorageClass to target NVMe nodes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: longhorn-nvme
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
numberOfReplicas: "3"
nodeSelector: "node.longhorn.io/storage=nvme"
diskSelector: "nvme,database"
dataLocality: best-effort
Monitoring with Prometheus and Grafana
Longhorn exposes Prometheus metrics through a built-in metrics endpoint. Monitoring these metrics is essential for capacity planning, performance analysis, and proactive alerting.
# ServiceMonitor for Longhorn metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: longhorn-prometheus
namespace: monitoring
labels:
release: prometheus
spec:
selector:
matchLabels:
app: longhorn-manager
namespaceSelector:
matchNames:
- longhorn-system
endpoints:
- port: manager
path: /metrics
interval: 30s
---
# PrometheusRule for Longhorn alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: longhorn-alerts
namespace: monitoring
spec:
groups:
- name: longhorn-storage
rules:
- alert: LonghornVolumeStatusCritical
expr: longhorn_volume_robustness == 3
for: 5m
labels:
severity: critical
annotations:
summary: "Longhorn volume {{ $labels.volume }} is Faulted"
- alert: LonghornVolumeStatusDegraded
expr: longhorn_volume_robustness == 2
for: 10m
labels:
severity: warning
annotations:
summary: "Longhorn volume {{ $labels.volume }} is Degraded"
- alert: LonghornNodeStorageWarning
expr: |
(longhorn_node_storage_usage_bytes / longhorn_node_storage_capacity_bytes) > 0.80
for: 15m
labels:
severity: warning
annotations:
summary: "Longhorn node {{ $labels.node }} storage at {{ $value | humanizePercentage }}"
- alert: LonghornBackupFailed
expr: longhorn_backup_state == 4  # 4 = Error
for: 5m
labels:
severity: critical
annotations:
summary: "Longhorn backup failed for volume {{ $labels.volume }}"
Key Grafana dashboard panels to create for Longhorn monitoring include: volume IOPS (read/write), volume throughput (MB/s), volume latency (p50/p95/p99), replica rebuild progress, node storage capacity and usage, backup status and age, and snapshot count per volume.
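Starter PromQL for those panels (metric names match Longhorn's documented Prometheus endpoint in recent versions; verify label names against your deployment):

```promql
# Per-volume IOPS and latency (gauges exposed by longhorn-manager)
longhorn_volume_read_iops{volume="pvc-abc123"}
longhorn_volume_write_iops{volume="pvc-abc123"}
longhorn_volume_read_latency{volume="pvc-abc123"}

# Node storage utilization ratio
longhorn_node_storage_usage_bytes / longhorn_node_storage_capacity_bytes
```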
k3s Cluster with Complete Longhorn Storage Stack
In a complete k3s production stack, Longhorn sits between the bare metal servers' local disks and the application pods, with Rancher providing cluster management and Prometheus/Grafana providing observability on top.
Comparison with Other Kubernetes Storage Solutions
Longhorn is not the only storage option for Kubernetes. Understanding how it compares to alternatives helps you make the right choice for your specific requirements.
| Feature | Longhorn | Rook-Ceph | OpenEBS (Mayastor) | local-path |
|---|---|---|---|---|
| Complexity | Low | High | Medium | Minimal |
| Replication | Built-in (2–3 replicas) | CRUSH algorithm | NVMe-oF replicas | None |
| Dynamic PVC Expansion | Yes (online) | Yes | Yes | No |
| Snapshots | COW snapshots | RBD snapshots | Yes | No |
| Backup to Cloud | Built-in (S3/GCS/Azure) | Via rbd export | Via Velero | No |
| DR Volumes | Native standby volumes | RBD mirroring | No | No |
| RWX Support | Yes (NFS-based) | Yes (CephFS) | No | No |
| Volume Encryption | LUKS2 | dmcrypt | No | No |
| UI | Built-in web UI | Ceph dashboard | Minimal | None |
| Min Nodes | 1 (3 for HA) | 3 (dedicated OSD nodes) | 3 | 1 |
| Resource Overhead | Low-Medium | High | Medium | None |
| Best For | k3s/RKE2 production | Large-scale enterprise | High-performance NVMe | Dev/single-node |
Longhorn vs Rook-Ceph: Ceph is the more powerful system — it supports object storage, file storage, and block storage with a sophisticated CRUSH placement algorithm. However, Ceph demands at least 3 dedicated OSD nodes, significant RAM (minimum 4GB per OSD daemon), and deep operational expertise. For k3s clusters with 3–10 nodes, Longhorn provides 90% of the value at 10% of the operational cost.
Longhorn vs OpenEBS Mayastor: Mayastor uses NVMe-over-Fabrics for high-performance replication, achieving lower latency than Longhorn's TCP-based replication. If raw IOPS and sub-millisecond latency are your primary concern and you have NVMe infrastructure with RDMA networking, Mayastor may be the better choice. For most k3s deployments, Longhorn's integrated backup, DR, and operational simplicity outweigh Mayastor's performance edge.
Longhorn vs local-path: local-path provisioner is k3s's default storage — it simply creates directories on the node's local filesystem. Zero replication, zero snapshots, zero backup integration. It is fine for development but unacceptable for production data.
Production Best Practices
These recommendations are distilled from operating Longhorn across dozens of production k3s clusters running database workloads.
1. Set resource reservations for instance managers. Longhorn instance managers (engine and replica managers) need guaranteed CPU to avoid I/O stalls during node pressure. Set guaranteedInstanceManagerCPU to at least 12% in Longhorn settings.
2. Use Retain reclaim policy for database volumes. Never use Delete for database PVCs. A Retain policy keeps the PV and its data even after the PVC is deleted, giving you a safety net against accidental deletion.
3. Disable replica soft anti-affinity for production. Set replicaSoftAntiAffinity: false to ensure replicas are always spread across different nodes. With soft anti-affinity, Longhorn may schedule multiple replicas on the same node when space is tight — defeating the purpose of replication.
4. Reserve storage space on every node. Set storageMinimalAvailablePercentage to at least 15%. This prevents Longhorn from consuming all disk space, which would cause node-level issues affecting all pods.
5. Enable auto-salvage. The autoSalvage setting automatically recovers volumes that enter a faulted state when at least one replica is still healthy. This reduces manual intervention during node failures.
6. Set priority class to system-cluster-critical. Longhorn's manager and driver components should never be evicted during node pressure. Set their priorityClass to system-cluster-critical to ensure they survive pod eviction.
7. Configure recurring jobs for all production volumes. Every production volume should have both snapshot and backup recurring jobs. Snapshots every 4 hours for quick rollback, backups daily for disaster recovery.
8. Test DR volume activation regularly. Create a monthly schedule to activate DR volumes in your standby cluster, verify data integrity, and practice the failover procedure. A DR plan that has never been tested is just documentation.
9. Monitor volume health proactively. Set up Prometheus alerts for degraded and faulted volumes, node storage capacity, backup age, and replica rebuild status. By the time a user reports a slow database, the storage problem has been building for hours.
10. Use dedicated storage disks. Separate Longhorn data from the OS disk. This prevents I/O contention, gives you cleaner capacity management, and avoids the risk of filling the OS disk with Longhorn data.
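Several of these practices can be pinned declaratively at install time rather than clicked through the UI. A minimal sketch, assuming the Longhorn Helm chart's defaultSettings keys (verify the exact key names against your chart version):

```shell
# Hypothetical values fragment encoding practices 1, 3, 4, 5, and 6 above
cat > longhorn-values.yaml <<'EOF'
defaultSettings:
  guaranteedInstanceManagerCPU: 12          # percent of node CPU reserved (practice 1)
  replicaSoftAntiAffinity: "false"          # hard-spread replicas across nodes (practice 3)
  storageMinimalAvailablePercentage: 15     # keep 15% disk headroom (practice 4)
  autoSalvage: "true"                       # auto-recover faulted volumes (practice 5)
  priorityClass: system-cluster-critical    # protect Longhorn pods from eviction (practice 6)
EOF
helm upgrade --install longhorn longhorn/longhorn \
  --namespace longhorn-system --values longhorn-values.yaml
```

Settings applied this way survive reinstalls and are reviewable in version control, unlike changes made in the UI.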
Cloud-Specific Deployment Guidance
AWS — EC2 with NVMe Instance Storage
For AWS deployments, use i3.xlarge or i3en.xlarge instances that come with NVMe instance storage. These provide raw NVMe performance at a fraction of the cost of provisioned EBS IOPS. Format the instance storage and configure it as a Longhorn disk.
# On each EC2 i3.xlarge instance
sudo mkfs.ext4 /dev/nvme0n1
sudo mkdir -p /mnt/longhorn-nvme
sudo mount /dev/nvme0n1 /mnt/longhorn-nvme
echo '/dev/nvme0n1 /mnt/longhorn-nvme ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
# Configure as Longhorn disk (via node CRD or UI)
# Path: /mnt/longhorn-nvme/
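The last step can be done without the UI by patching the Longhorn Node CR. A sketch, assuming the node is named k3s-worker-01; the disk key nvme-disk is arbitrary and the 10 GiB storageReserved value is illustrative:

```shell
# Register the NVMe mount as an additional Longhorn disk on this node
kubectl -n longhorn-system patch nodes.longhorn.io k3s-worker-01 --type merge -p '{
  "spec": {
    "disks": {
      "nvme-disk": {
        "path": "/mnt/longhorn-nvme",
        "allowScheduling": true,
        "storageReserved": 10737418240,
        "tags": ["nvme"]
      }
    }
  }
}'
```

The tags field lets a StorageClass target these disks specifically via diskSelector.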
Azure — VMs with Ultra Disk
On Azure, use Standard_L8s_v3 or Standard_L16s_v3 VMs with local NVMe storage. Alternatively, attach Ultra Disks for consistent sub-millisecond latency. Ultra Disks allow you to independently configure IOPS and throughput.
# Attach Ultra Disk to Azure VM (Azure CLI)
az vm disk attach \
--vm-name k3s-worker-01 \
--resource-group k3s-cluster \
--name longhorn-ultra-01 \
--size-gb 512 \
--sku UltraSSD_LRS \
--disk-iops-read-write 10000 \
--disk-mbps-read-write 300 \
--new
GCP — VMs with Local SSD
GCP offers Local SSD attached to n2-standard or c3-standard VMs. Local SSD provides 375 GB per disk, with read IOPS scaling with the number of disks attached (up to roughly 680,000 read IOPS per VM with four or more disks). Attach multiple Local SSDs and RAID them for larger volumes.
# Create VM with local SSDs (gcloud)
gcloud compute instances create k3s-worker-01 \
--machine-type=n2-standard-8 \
--local-ssd=interface=NVME \
--local-ssd=interface=NVME \
--zone=europe-west1-b
# RAID two local SSDs for 750GB Longhorn disk
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme0n2
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/longhorn-nvme
sudo mount /dev/md0 /mnt/longhorn-nvme
On-Premises Datacenter
For on-premises deployments, use servers with dedicated NVMe or SAS SSD drives for Longhorn. Separate the OS disk from storage disks. Use a 10GbE or 25GbE network between nodes to ensure replication traffic does not bottleneck — Longhorn synchronous replication generates network traffic proportional to write throughput multiplied by replica count.
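That proportionality is worth quantifying when sizing the storage network. A back-of-envelope sketch — the 200 MB/s figure is an assumed sustained write rate, not a Longhorn benchmark:

```shell
write_mbps=200   # assumed sustained write throughput per node, MB/s
replicas=3       # Longhorn replica count
# Every write is sent to all replicas; (replicas - 1) copies traverse the network.
net_mbps=$(( write_mbps * (replicas - 1) ))
net_gbit=$(( net_mbps * 8 / 1000 ))
echo "Replication traffic: ${net_mbps} MB/s (~${net_gbit} Gbit/s)"
# prints: Replication traffic: 400 MB/s (~3 Gbit/s)
```

At these rates a shared 10GbE link leaves little headroom once application traffic is added, which is why a 25GbE storage network is recommended below.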
# Typical on-premises server spec for Longhorn k3s node
# CPU: 8+ cores (Xeon or EPYC)
# RAM: 32+ GB
# OS Disk: 256GB SATA SSD (mirrored)
# Longhorn Disk: 2x 1TB NVMe (RAID-1 or individual Longhorn disks)
# Network: 2x 25GbE (bonded, one for application, one for storage)
# Configure dedicated storage network (optional but recommended)
# In Longhorn settings:
# storage-network: kube-system/storage-network-attachment
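The storage-network setting expects a Multus NetworkAttachmentDefinition referenced as <namespace>/<name>. A minimal sketch, assuming Multus with the whereabouts IPAM plugin is installed and eth1 is the dedicated storage NIC (the subnet is illustrative):

```shell
kubectl apply -f - <<'EOF'
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: storage-network-attachment
  namespace: kube-system
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "eth1",
    "ipam": { "type": "whereabouts", "range": "192.168.100.0/24" }
  }'
EOF
```

With this in place, Longhorn's replication traffic moves onto the dedicated NIC while pod traffic stays on the primary CNI.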
Longhorn UI Walkthrough
Longhorn ships with a built-in web UI that provides a visual interface for managing volumes, snapshots, backups, nodes, and settings. Access it through the Ingress configured during installation or via kubectl port-forward.
# Quick access via port-forward
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
# Open http://localhost:8080 in your browser
The UI provides several key views:
- Dashboard: cluster-wide storage health, including total capacity, used space, and volume status.
- Volume: all volumes with their state (Healthy/Degraded/Faulted), size, replica count, and attached node. You can expand, snapshot, back up, and restore volumes directly from this view.
- Node: each node's storage configuration, disk allocation, and scheduling status.
- Backup: all backups stored in the backup target with their volume, size, and creation time.
- Setting: all Longhorn configuration parameters with descriptions.
Troubleshooting Common Issues
Degraded Volumes
A volume enters Degraded state when one or more replicas are unhealthy but the volume is still functional. Common causes include node failure, disk full, or network partition.
# Check volume status
kubectl -n longhorn-system get volumes.longhorn.io -o wide
# Describe the volume for detailed status
kubectl -n longhorn-system describe volumes.longhorn.io pvc-abc123
# Check replica status
kubectl -n longhorn-system get replicas.longhorn.io -l longhornvolume=pvc-abc123
# If a replica is stuck in error state, delete it to trigger rebuild
kubectl -n longhorn-system delete replicas.longhorn.io pvc-abc123-r-12345
# Monitor rebuild progress
kubectl -n longhorn-system get volumes.longhorn.io pvc-abc123 \
-o jsonpath='{.status.conditions}' | jq
Volume Stuck in Attaching/Detaching
# Check engine status
kubectl -n longhorn-system get engines.longhorn.io -l longhornvolume=pvc-abc123
# Check for stuck VolumeAttachment objects
kubectl get volumeattachments | grep pvc-abc123
# Force detach a stuck volume (use cautiously)
kubectl -n longhorn-system patch volumes.longhorn.io pvc-abc123 \
--type merge -p '{"spec":{"nodeID":""}}'
# If attachment is truly stuck, delete the VolumeAttachment
kubectl delete volumeattachment csi-abc123def456
Space Pressure
# Check node storage usage
kubectl -n longhorn-system get nodes.longhorn.io -o wide
# Identify large snapshots consuming space
kubectl -n longhorn-system get snapshots.longhorn.io -l longhornvolume=pvc-abc123
# Purge old snapshots to reclaim space
# Via Longhorn UI: Volume → Snapshots → Delete unneeded snapshots
# Old snapshots consume space because COW data cannot be freed until the snapshot is deleted
# Check for snapshot data integrity issues
kubectl -n longhorn-system get settings.longhorn.io snapshot-data-integrity
Upgrade Procedures
Longhorn upgrades should be performed carefully, as the storage system underpins all stateful workloads. Always take backups of all critical volumes before upgrading.
# 1. Verify every volume has a recent backup before upgrading.
#    Trigger backups via the Longhorn UI (Volume → Create Backup) or your
#    recurring backup jobs, then check the last backup timestamp per volume:
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns=NAME:.metadata.name,LASTBACKUP:.status.lastBackupAt
# 2. Check current version
kubectl -n longhorn-system get daemonset longhorn-manager -o jsonpath='{.spec.template.spec.containers[0].image}'
# 3. Upgrade via Helm
helm repo update
helm upgrade longhorn longhorn/longhorn \
--namespace longhorn-system \
--version 1.7.2 \
--values longhorn-values.yaml
# 4. Monitor the rolling upgrade
kubectl -n longhorn-system rollout status daemonset/longhorn-manager
kubectl -n longhorn-system rollout status deployment/longhorn-driver-deployer
# 5. Verify all volumes are healthy after upgrade
kubectl -n longhorn-system get volumes.longhorn.io -o custom-columns=NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness
# 6. Longhorn automatically upgrades engines — monitor progress
kubectl -n longhorn-system get engines.longhorn.io -o wide
Integration with Database Operators
Modern Kubernetes database operators (CloudNativePG, Percona Operator, Zalando Postgres Operator) work seamlessly with Longhorn. The operator manages the database lifecycle while Longhorn provides the underlying storage with replication, snapshots, and backup.
# CloudNativePG cluster using Longhorn storage
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-pg
  namespace: database
spec:
  instances: 3
  storage:
    size: 100Gi
    storageClass: longhorn-db
    pvcTemplate:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
  walStorage:
    size: 20Gi
    storageClass: longhorn-fast
  postgresql:
    parameters:
      shared_buffers: "2GB"
      effective_cache_size: "6GB"
      maintenance_work_mem: "512MB"
      wal_buffers: "64MB"
      max_connections: "200"
  backup:
    barmanObjectStore:
      destinationPath: s3://pg-backups/cnpg/
      endpointURL: https://s3.eu-west-1.amazonaws.com
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
    retentionPolicy: "30d"
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 8Gi
# Percona XtraDB Cluster with Longhorn storage
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: production-mysql
  namespace: database
spec:
  crVersion: "1.14.0"
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: longhorn-db
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi
  backup:
    image: percona/percona-xtradb-cluster-operator:1.14.0-pxc8.0-backup-pxb8.0.35
    storages:
      s3-backup:
        type: s3
        s3:
          bucket: mysql-backups
          region: eu-west-1
          credentialsSecret: aws-creds
    schedule:
    - name: daily-full
      schedule: "0 2 * * *"
      keep: 14
      storageName: s3-backup
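Once an operator has created its PVCs, each one is backed by a Longhorn volume named after the PV (pvc-<uid>). A quick way to verify the mapping and confirm replication is healthy:

```shell
# Map operator-created PVCs to their bound volumes and StorageClasses
kubectl -n database get pvc \
  -o custom-columns=NAME:.metadata.name,VOLUME:.spec.volumeName,CLASS:.spec.storageClassName
# Check robustness and replica count on the Longhorn side
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns=NAME:.metadata.name,ROBUSTNESS:.status.robustness,REPLICAS:.spec.numberOfReplicas
```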
Conclusion
Longhorn transforms k3s from a lightweight Kubernetes distribution suitable only for edge and development into a production-ready platform capable of running mission-critical stateful workloads. Its architecture — per-volume engines, synchronous multi-node replication, integrated snapshot and backup to cloud object storage, DR volumes, volume encryption, and online PVC expansion — delivers enterprise-grade storage without enterprise-grade complexity.
The key to success with Longhorn in production is threefold. First, configure it properly from the start: use dedicated storage disks, set appropriate replication factors and data locality, and create purpose-built StorageClasses for different workload types. Second, integrate backup into your architecture from day one: recurring snapshots for fast rollback, daily backups to S3/GCS/Azure for disaster recovery, and DR volumes in a standby cluster for the worst-case scenario. Third, monitor everything: volume health, node storage capacity, backup age, and replica rebuild status — problems with storage are invisible until they become catastrophic.
Dynamic PVC expansion eliminates one of the most common sources of production incidents — running out of disk space. With Longhorn, expanding a database volume from 50Gi to 500Gi is a single kubectl command with zero downtime. Combined with Prometheus alerts on capacity thresholds, you can proactively expand volumes before they become critical, or automate the expansion entirely.
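That single command, sketched against a hypothetical PVC named pg-data in namespace database:

```shell
# Request the larger size; Longhorn expands the volume online
kubectl -n database patch pvc pg-data --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"500Gi"}}}}'
# Watch until status.capacity reflects the new size
kubectl -n database get pvc pg-data -w
```

Expansion only works if the StorageClass sets allowVolumeExpansion: true, and only upward — shrinking is not supported.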
Whether you are running PostgreSQL, MySQL, MongoDB, or Redis on k3s — on bare metal servers, AWS EC2, Azure VMs, GCP instances, or on-premises datacenter hardware — Longhorn provides the storage foundation that lets you focus on your applications instead of worrying about data durability. Install it, configure it, monitor it, and trust it with your production data.