Longhorn K3s Storage Production | Dynamic PVC Expansion & DR

Storage is the hardest problem in Kubernetes. Compute is stateless and fungible — kill a pod, schedule another. Networking has mature CNI plugins and service meshes. But storage? Storage is where state lives, where data persists across restarts, where a single misconfiguration can mean permanent data loss. For k3s clusters running production workloads — databases, message queues, application state — you need a storage solution that is distributed, resilient, expandable, and operable. Longhorn is that solution.

Longhorn is a lightweight, reliable, and easy-to-use distributed block storage system for Kubernetes. Originally developed by Rancher Labs (now part of SUSE), it is a CNCF incubating project purpose-built for clusters where simplicity matters but production reliability is non-negotiable. Unlike Ceph, which demands dedicated storage nodes and deep expertise, or local-path provisioner, which offers zero redundancy, Longhorn strikes the exact balance k3s clusters need: distributed replication across nodes, dynamic volume expansion, integrated backup to cloud object storage, snapshot and disaster recovery — all managed through a clean UI and Kubernetes-native CRDs.

This guide covers everything you need to deploy Longhorn on k3s for production: architecture internals, installation methods, StorageClass configuration, dynamic PVC expansion, replication strategies, backup and disaster recovery, volume encryption, performance tuning for databases, monitoring, and troubleshooting. Every recommendation comes with production-tested configuration you can adapt to your environment.

Longhorn Architecture

Understanding Longhorn's architecture is essential for making informed decisions about replication, performance, and failure handling. Longhorn is composed of three core components that work together to provide distributed block storage on top of the local disks attached to your Kubernetes nodes.

The Longhorn Manager runs as a DaemonSet on every node in the cluster. It is the control plane of Longhorn — it handles API calls, orchestrates volume creation, manages replication, coordinates snapshots and backups, and communicates with the Kubernetes API server to manage PersistentVolume and PersistentVolumeClaim lifecycle. When you create a PVC that references a Longhorn StorageClass, the Longhorn Manager receives the request through the CSI driver, provisions the volume, and schedules its replicas across available nodes.

The Longhorn Engine is a per-volume storage controller implemented as a Linux userspace process (based on a fork of the Rancher Longhorn Engine). Each volume gets its own dedicated engine process running on the node where the volume is attached. The engine handles all read and write I/O for that volume, replicating writes synchronously to all configured replicas before acknowledging the write to the application. This per-volume architecture means that a crash or hang in one volume's engine does not affect any other volume — a critical isolation property for production.

Replicas are the actual data storage processes. Each replica stores a complete copy of the volume data on the local disk of the node where it runs. By default, Longhorn creates three replicas for each volume, distributed across different nodes (and optionally different zones). Replicas use a copy-on-write mechanism for snapshots, making snapshot creation instantaneous regardless of volume size.

This architecture delivers several key properties for production use. Fault tolerance: with three replicas across three nodes, the volume survives two simultaneous node failures. Isolation: each volume has its own engine process, so a bug or hang in one volume cannot cascade. Simplicity: no dedicated storage nodes, no separate Ceph or GlusterFS clusters — Longhorn runs on the same worker nodes as your application pods, using their local disks. Kubernetes-native: everything is managed through CRDs, kubectl, and the Kubernetes CSI interface.

Installing Longhorn on k3s

Longhorn can be installed on k3s through three methods: Helm chart (recommended for production), Rancher App Marketplace (if you have Rancher managing your cluster), or direct kubectl apply. Before installing, ensure your nodes meet the prerequisites.

Prerequisites

# Verify prerequisites on each node
# Required: open-iscsi (for iSCSI support)
sudo apt-get install -y open-iscsi
sudo systemctl enable iscsid
sudo systemctl start iscsid

# Required: NFSv4 client (for backup/restore to NFS targets)
sudo apt-get install -y nfs-common

# Recommended: check all prerequisites with Longhorn's environment check script
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.7.2/scripts/environment_check.sh | bash

# Verify kernel modules
lsmod | grep -E "iscsi_tcp|dm_crypt|nfs"

# On k3s specifically, local-path-provisioner is the default.
# We will make Longhorn the default StorageClass after installation.

Installation via Helm (Recommended)

# Add the Longhorn Helm repository
helm repo add longhorn https://charts.longhorn.io
helm repo update

# Create the namespace
kubectl create namespace longhorn-system

# Install with production values
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --version 1.7.2 \
  --values longhorn-values.yaml

Here is a production-grade values.yaml with tuned defaults:

# longhorn-values.yaml — Production configuration
persistence:
  defaultClass: true
  defaultFsType: ext4
  defaultClassReplicaCount: 3
  defaultDataLocality: best-effort
  reclaimPolicy: Retain

defaultSettings:
  backupTarget: s3://longhorn-backups@eu-west-1/
  backupTargetCredentialSecret: longhorn-backup-s3-secret
  createDefaultDiskLabeledNodes: true
  defaultDataPath: /var/lib/longhorn/
  defaultReplicaCount: 3
  defaultDataLocality: best-effort
  replicaSoftAntiAffinity: false
  replicaAutoBalance: best-effort
  storageOverProvisioningPercentage: 150
  storageMinimalAvailablePercentage: 15
  guaranteedInstanceManagerCPU: 12
  upgradeChecker: false
  autoSalvage: true
  autoDeletePodWhenVolumeDetachedUnexpectedly: true
  disableSchedulingOnCordonedNode: true
  replicaZoneSoftAntiAffinity: true
  volumeAttachmentRecoveryPolicy: wait
  snapshotDataIntegrity: fast-check
  snapshotDataIntegrityCronjob: "0 7 * * *"
  concurrentAutomaticEngineUpgradePerNodeLimit: 1

longhornManager:
  priorityClass: system-cluster-critical
  tolerations:
    - key: "node-role.kubernetes.io/storage"
      operator: "Exists"
      effect: "NoSchedule"

longhornDriver:
  priorityClass: system-cluster-critical
  tolerations:
    - key: "node-role.kubernetes.io/storage"
      operator: "Exists"
      effect: "NoSchedule"

longhornUI:
  replicas: 2

ingress:
  enabled: true
  ingressClassName: nginx
  host: longhorn.internal.example.com
  tls: true
  tlsSecret: longhorn-tls
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: longhorn-basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Longhorn UI"

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

Installation via Rancher App Marketplace

If Rancher manages your k3s cluster, navigate to Apps & Marketplace → Charts → Longhorn in the Rancher UI. Select your target namespace (longhorn-system), configure the values through the form interface, and click Install. Rancher handles Helm lifecycle management and upgrade tracking automatically.

Installation via kubectl

# Direct manifest installation
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.7.2/deploy/longhorn.yaml

# Verify all pods are running
kubectl -n longhorn-system get pods -w

Post-Installation: Make Longhorn the Default StorageClass

# Remove default annotation from k3s local-path
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

# Verify Longhorn is now default
kubectl get storageclass
# NAME                 PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
# longhorn (default)   driver.longhorn.io   Retain          Immediate           true                   5m
# local-path           rancher.io/local-path Delete         WaitForFirstConsumer false                 30d

StorageClass Configuration for Production

The default Longhorn StorageClass works for development, but production workloads need specific configurations for different use cases — databases need high replication and specific data locality, temporary processing needs fast single-replica volumes, and shared volumes need RWX support.

# StorageClass for database workloads — maximum durability
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-db
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: ext4
  dataLocality: best-effort
  recurringJobSelector: '[{"name":"db-snapshot","isGroup":true},{"name":"db-backup","isGroup":true}]'
---
# StorageClass for general workloads — balanced
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-standard
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
  fsType: ext4
  dataLocality: best-effort
---
# StorageClass for temporary/cache workloads — performance
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "2880"
  fsType: ext4
  dataLocality: strict-local
---
# StorageClass for RWX (ReadWriteMany) shared volumes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-rwx
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  nfsOptions: "vers=4.1,hard,timeo=50,retrans=3"
  staleReplicaTimeout: "2880"
  fsType: ext4

Dynamic PVC Expansion

One of Longhorn's most important production features is dynamic PVC expansion — the ability to grow a persistent volume's size without downtime, without data loss, and without manual intervention beyond a single kubectl command or manifest change. This is critical for databases where data growth is unpredictable and running out of disk space means an outage.

Longhorn supports both online expansion (volume stays attached and mounted while it grows) and offline expansion (volume is detached first). Online expansion is the default and recommended approach for production because it avoids application downtime.

Enabling Volume Expansion in StorageClass

The key requirement for dynamic PVC expansion is that the StorageClass must have allowVolumeExpansion: true. The Longhorn default StorageClass already includes this, but if you have custom StorageClasses, verify this field is set.

# Verify your StorageClass supports expansion
kubectl get storageclass longhorn -o yaml | grep allowVolumeExpansion
# allowVolumeExpansion: true

# If not set, patch it
kubectl patch storageclass longhorn -p '{"allowVolumeExpansion": true}'

Step-by-Step: Expanding a PVC Dynamically

# 1. Check current PVC size
kubectl get pvc pg-data-postgresql-0 -n database
# NAME                    STATUS   VOLUME       CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# pg-data-postgresql-0    Bound    pvc-abc123   50Gi       RWO            longhorn-db    30d

# 2. Check current usage inside the pod
kubectl exec -n database postgresql-0 -- df -h /var/lib/postgresql/data
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/longhorn   49G   42G  7.0G  86% /var/lib/postgresql/data

# 3. Expand the PVC (online — no downtime)
kubectl patch pvc pg-data-postgresql-0 -n database --type merge -p '{
  "spec": {
    "resources": {
      "requests": {
        "storage": "100Gi"
      }
    }
  }
}'

# 4. Watch the expansion progress
kubectl get pvc pg-data-postgresql-0 -n database -w
# Wait for CAPACITY to update from 50Gi to 100Gi

# 5. Verify the filesystem expanded inside the pod
kubectl exec -n database postgresql-0 -- df -h /var/lib/postgresql/data
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/longhorn   99G   42G   57G  43% /var/lib/postgresql/data

# 6. Verify via Longhorn volume status
kubectl -n longhorn-system get volumes.longhorn.io -o wide

Automating PVC Expansion with Alerts

In production, you should not wait until a disk is 86% full to expand it manually. Use Prometheus alerts to trigger expansion automatically or alert the on-call engineer before capacity becomes critical.

# PrometheusRule for PVC capacity alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-capacity-alerts
  namespace: monitoring
spec:
  groups:
    - name: pvc-capacity
      rules:
        - alert: PVCCapacityWarning
          expr: |
            (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.80
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} at {{ $value | humanizePercentage }} capacity"
            runbook: "Expand the PVC using: kubectl patch pvc {{ $labels.persistentvolumeclaim }} -n {{ $labels.namespace }} --type merge -p '{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"NEW_SIZE\"}}}}'"
        - alert: PVCCapacityCritical
          expr: |
            (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.90
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "CRITICAL: PVC {{ $labels.persistentvolumeclaim }} at {{ $value | humanizePercentage }}"
            description: "Immediate expansion required to prevent application failure"

Volume Replication and Data Locality

Longhorn replicates volume data across multiple nodes to protect against hardware failures. The replication factor and data locality settings control the trade-off between durability, performance, and storage efficiency.

Replication factor determines how many copies of the data exist. The default is 3, meaning each write is stored on three different nodes. For production databases, 3 is the minimum recommended value. You can set it to 2 for less critical workloads to save storage, or leave it at 1 for temporary/cache volumes where data loss is acceptable.

Data locality controls whether Longhorn tries to keep a replica on the same node as the pod consuming the volume. There are three modes:

disabled — Replicas are scheduled purely based on available space and anti-affinity. The pod might read from a replica on a remote node, adding network latency to every I/O operation.
best-effort — Longhorn tries to place one replica on the same node as the consuming pod. If the local node runs out of space or the pod migrates, the volume still works but may have slightly higher latency. This is the recommended setting for most workloads.
strict-local — The volume can only be used on a node that has a local replica. If the pod is scheduled to a node without a local replica, the volume attachment fails. Use this only for latency-sensitive single-replica workloads where you accept the durability trade-off.

# Change replication factor on an existing volume
kubectl -n longhorn-system patch volumes.longhorn.io pvc-abc123 \
  --type merge -p '{"spec":{"numberOfReplicas":3}}'

# Set data locality on existing volume
kubectl -n longhorn-system patch volumes.longhorn.io pvc-abc123 \
  --type merge -p '{"spec":{"dataLocality":"best-effort"}}'

# Replica auto-balancing (spreads replicas evenly across nodes)
# Set via Longhorn settings
kubectl -n longhorn-system edit settings.longhorn.io replica-auto-balance
# Set value to: best-effort

Snapshots and Backups

Longhorn provides two distinct data protection mechanisms: snapshots (local, instant, for quick rollback) and backups (remote, to object storage, for disaster recovery). Understanding when to use each is critical.

Snapshots are stored locally on the same disks as the volume replicas. They are created instantaneously using copy-on-write — no data is copied at snapshot time, only new writes after the snapshot allocate additional space. Snapshots are excellent for quick rollback before a risky migration or deployment, but they do not protect against node or disk failure because they live on the same storage as the volume.

Backups copy the volume data to an external backup target — S3, GCS, Azure Blob, or any S3-compatible store (MinIO, Wasabi). Backups are incremental at the block level: only blocks that changed since the last backup are transferred. This makes recurring backups fast and storage-efficient. Backups protect against total cluster loss because they exist independently.

Configuring the Backup Target

# Create the S3 credentials secret
kubectl create secret generic longhorn-backup-s3-secret \
  -n longhorn-system \
  --from-literal=AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
  --from-literal=AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
  --from-literal=AWS_ENDPOINTS=https://s3.eu-west-1.amazonaws.com

# Set the backup target in Longhorn settings
kubectl -n longhorn-system edit settings.longhorn.io backup-target
# Set value to: s3://longhorn-backups@eu-west-1/

kubectl -n longhorn-system edit settings.longhorn.io backup-target-credential-secret
# Set value to: longhorn-backup-s3-secret

# For GCS backup target
# value: s3://longhorn-backups@us/  (GCS is S3-compatible via interop)
# Or use the GCS JSON key:
kubectl create secret generic longhorn-backup-gcs-secret \
  -n longhorn-system \
  --from-literal=GOOGLE_APPLICATION_CREDENTIALS=/var/longhorn-backup/gcs-key.json \
  --from-file=gcs-key.json=./service-account-key.json

# For Azure Blob backup target
kubectl create secret generic longhorn-backup-azure-secret \
  -n longhorn-system \
  --from-literal=AZBLOB_ACCOUNT_NAME=prodbackupstorage \
  --from-literal=AZBLOB_ACCOUNT_KEY=base64encodedkeyhere

VolumeSnapshot Class and Snapshot YAML

# VolumeSnapshotClass for Longhorn
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot-vsc
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: snap
---
# Create a VolumeSnapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-snap-before-migration
  namespace: database
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    persistentVolumeClaimName: pg-data-postgresql-0
---
# Restore a PVC from a snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data-restored
  namespace: database
spec:
  storageClassName: longhorn-db
  dataSource:
    name: pg-data-snap-before-migration
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

Recurring Backup Schedules

# Recurring snapshot job — every 4 hours, retain 6
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: db-snapshot-4h
  namespace: longhorn-system
spec:
  cron: "0 */4 * * *"
  task: snapshot
  retain: 6
  concurrency: 2
  groups:
    - db-volumes
  labels:
    tier: database
---
# Recurring backup job — daily at 02:00, retain 14 days
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: db-backup-daily
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"
  task: backup
  retain: 14
  concurrency: 2
  groups:
    - db-volumes
  labels:
    tier: database
---
# Recurring backup job — weekly full, retain 8 weeks
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: db-backup-weekly
  namespace: longhorn-system
spec:
  cron: "0 3 * * 0"
  task: backup
  retain: 8
  concurrency: 1
  groups:
    - db-volumes
  labels:
    tier: database
---
# Assign recurring jobs to a volume via labels
# Label the PVC or volume with: recurring-job-group.longhorn.io/db-volumes: enabled
kubectl label pvc pg-data-postgresql-0 -n database \
  recurring-job-group.longhorn.io/db-volumes=enabled

Disaster Recovery

Longhorn provides a built-in disaster recovery mechanism through DR volumes — standby volumes in a secondary cluster that continuously pull incremental backups from the primary cluster's backup target. When disaster strikes, you activate the DR volume and it becomes a regular read-write volume, letting the secondary cluster take over.

Setting Up DR Volumes

# On the DR cluster: create a DR volume from the primary's backup
# This is done via the Longhorn UI or API

# 1. Configure the DR cluster's backup target to point to the same S3 bucket
kubectl -n longhorn-system edit settings.longhorn.io backup-target
# value: s3://longhorn-backups@eu-west-1/

# 2. Create a DR volume from the latest backup
# Via Longhorn UI: Backup → Select volume → Create DR Volume
# Via kubectl:
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: pg-data-dr
  namespace: longhorn-system
spec:
  size: "107374182400"  # 100Gi in bytes
  numberOfReplicas: 3
  fromBackup: "s3://longhorn-backups@eu-west-1/?backup=backup-abc123&volume=pg-data"
  standby: true
  frontend: ""

# 3. The DR volume auto-syncs incremental backups from S3
# Monitor sync status:
kubectl -n longhorn-system get volumes.longhorn.io pg-data-dr -o jsonpath='{.status.lastBackup}'

# 4. During failover — activate the DR volume:
kubectl -n longhorn-system patch volumes.longhorn.io pg-data-dr \
  --type merge -p '{"spec":{"standby":false,"frontend":"blockdev"}}'

# 5. Create a PVC bound to the activated DR volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data-postgresql-0
  namespace: database
spec:
  storageClassName: longhorn-db
  volumeName: pg-data-dr
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

Volume Encryption

Longhorn supports volume-level encryption using Linux LUKS2. Encrypted volumes protect data at rest on the underlying disk — even if someone gains physical access to the server's storage, they cannot read the volume data without the encryption key.

# Create a Kubernetes secret with the encryption passphrase
kubectl create secret generic longhorn-crypto \
  -n longhorn-system \
  --from-literal=CRYPTO_KEY_VALUE="a-very-long-and-random-passphrase-here" \
  --from-literal=CRYPTO_KEY_PROVIDER=secret \
  --from-literal=CRYPTO_KEY_CIPHER=aes-xts-plain64 \
  --from-literal=CRYPTO_KEY_HASH=sha256 \
  --from-literal=CRYPTO_KEY_SIZE=256 \
  --from-literal=CRYPTO_PBKDF=argon2i

# StorageClass with encryption enabled
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-encrypted
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  encrypted: "true"
  csi.storage.k8s.io/provisioner-secret-name: longhorn-crypto
  csi.storage.k8s.io/provisioner-secret-namespace: longhorn-system
  csi.storage.k8s.io/node-publish-secret-name: longhorn-crypto
  csi.storage.k8s.io/node-publish-secret-namespace: longhorn-system
  csi.storage.k8s.io/node-stage-secret-name: longhorn-crypto
  csi.storage.k8s.io/node-stage-secret-namespace: longhorn-system

ReadWriteMany (RWX) Support

By default, Longhorn volumes are ReadWriteOnce (RWO) — they can be mounted by a single pod on a single node. For workloads that need shared storage across multiple pods (e.g., shared media uploads, configuration files, ML model artifacts), Longhorn supports ReadWriteMany (RWX) through an integrated NFS server.

When a PVC requests RWX access mode, Longhorn automatically deploys a share-manager pod that runs an NFS server backed by the Longhorn volume. Multiple pods can then mount the volume simultaneously over NFS. This is simpler than deploying a separate NFS server but adds a layer of network overhead compared to direct block access.

# PVC with ReadWriteMany access mode
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-media
  namespace: application
spec:
  storageClassName: longhorn-rwx
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi

Database Workloads on Longhorn

Running databases on Longhorn requires careful attention to StorageClass parameters, pod affinity, and backup integration. Longhorn's per-volume engine and synchronous replication make it well-suited for database workloads, but you need to configure it correctly to get production-grade performance and reliability.

PostgreSQL StatefulSet with Longhorn

# PostgreSQL StatefulSet using Longhorn PVC
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql
  namespace: database
spec:
  serviceName: postgresql
  replicas: 3
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      terminationGracePeriodSeconds: 120
      securityContext:
        fsGroup: 999
        runAsUser: 999
      containers:
        - name: postgresql
          image: postgres:16-alpine
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: production
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: pg-credentials
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: pg-credentials
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              cpu: "4"
              memory: 8Gi
          volumeMounts:
            - name: pg-data
              mountPath: /var/lib/postgresql/data
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
            initialDelaySeconds: 5
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: pg-data
        labels:
          recurring-job-group.longhorn.io/db-volumes: enabled
      spec:
        storageClassName: longhorn-db
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi

MySQL StatefulSet with Longhorn

# MySQL StatefulSet using Longhorn PVC
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: database
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: mysql
          image: mysql:8.0
          ports:
            - containerPort: 3306
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-credentials
                  key: root-password
            - name: MYSQL_DATABASE
              value: production
          args:
            - "--default-authentication-plugin=mysql_native_password"
            - "--innodb-buffer-pool-size=4G"
            - "--innodb-log-file-size=1G"
            - "--innodb-flush-log-at-trx-commit=1"
            - "--sync-binlog=1"
          resources:
            requests:
              cpu: "1"
              memory: 4Gi
            limits:
              cpu: "4"
              memory: 8Gi
          volumeMounts:
            - name: mysql-data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: mysql-data
        labels:
          recurring-job-group.longhorn.io/db-volumes: enabled
      spec:
        storageClassName: longhorn-db
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi

Performance Tuning for Database Workloads

Longhorn adds a storage replication layer between the application and the physical disk, which introduces some latency compared to direct local disk access. For most workloads this overhead is negligible, but database workloads with heavy write patterns need tuning to achieve optimal performance.

Use data locality: best-effort. This ensures one replica is on the same node as the database pod, meaning reads hit local disk at NVMe/SSD speed. Writes still replicate to remote nodes, but the local replica eliminates network round-trips for reads.

Dedicate disks to Longhorn. Do not share the OS disk with Longhorn data. Add dedicated NVMe or SSD drives and configure them as Longhorn disks. This prevents I/O contention between the operating system and database volumes.

# Configure dedicated disks on a node
# Via kubectl (Longhorn node CRD)
kubectl -n longhorn-system edit nodes.longhorn.io worker-01

# Add the dedicated disk under spec.disks:
spec:
  disks:
    default-disk:
      allowScheduling: false  # disable OS disk for Longhorn
      path: /var/lib/longhorn/
      storageReserved: 0
    nvme-data:
      allowScheduling: true
      path: /mnt/nvme-longhorn/
      storageReserved: 10737418240  # 10Gi reserved
      tags:
        - nvme
        - database

Set guaranteed engine manager CPU. Longhorn's engine processes consume CPU for I/O processing and replication. The guaranteedInstanceManagerCPU setting reserves a percentage of node CPU for Longhorn instance managers, preventing CPU starvation under load.

Tune the replica count. For databases that already have application-level replication (PostgreSQL streaming replication, MySQL group replication, MongoDB replica sets), you can reduce the Longhorn replica count to 2 instead of 3. The database's own replication provides an additional layer of data protection, and fewer Longhorn replicas means less write amplification and better write throughput.

Use ext4 over xfs for small random I/O. While xfs excels at large sequential writes, ext4 generally performs better for the small random I/O pattern typical of database workloads. Set fsType: ext4 in your StorageClass.

Node Scheduling and Disk Management

Longhorn provides fine-grained control over which nodes and disks are used for volume scheduling. This is critical in heterogeneous clusters where some nodes have fast NVMe storage and others have slower SATA drives.

# Tag nodes for storage scheduling
kubectl label node worker-01 node.longhorn.io/storage=nvme
kubectl label node worker-02 node.longhorn.io/storage=nvme
kubectl label node worker-03 node.longhorn.io/storage=nvme
kubectl label node worker-04 node.longhorn.io/storage=sata

# Use node selector in StorageClass to target NVMe nodes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-nvme
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  nodeSelector: "node.longhorn.io/storage=nvme"
  diskSelector: "nvme,database"
  dataLocality: best-effort

Monitoring with Prometheus and Grafana

Longhorn exposes Prometheus metrics through a built-in metrics endpoint. Monitoring these metrics is essential for capacity planning, performance analysis, and proactive alerting.

# ServiceMonitor for Longhorn metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
      - longhorn-system
  endpoints:
    - port: manager
      path: /metrics
      interval: 30s
---
# PrometheusRule for Longhorn alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: longhorn-alerts
  namespace: monitoring
spec:
  groups:
    - name: longhorn-storage
      rules:
        - alert: LonghornVolumeStatusCritical
          expr: longhorn_volume_robustness == 3
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Longhorn volume {{ $labels.volume }} is Faulted"
        - alert: LonghornVolumeStatusDegraded
          expr: longhorn_volume_robustness == 2
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Longhorn volume {{ $labels.volume }} is Degraded"
        - alert: LonghornNodeStorageWarning
          expr: |
            (longhorn_node_storage_usage_bytes / longhorn_node_storage_capacity_bytes) > 0.80
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Longhorn node {{ $labels.node }} storage at {{ $value | humanizePercentage }}"
        - alert: LonghornBackupFailed
          expr: longhorn_backup_state == 3
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Longhorn backup failed for volume {{ $labels.volume }}"

Key Grafana dashboard panels to create for Longhorn monitoring include: volume IOPS (read/write), volume throughput (MB/s), volume latency (p50/p95/p99), replica rebuild progress, node storage capacity and usage, backup status and age, and snapshot count per volume.

k3s Cluster with Complete Longhorn Storage Stack

The following diagram shows how Longhorn fits into a complete k3s production stack, from bare metal servers through to application pods, with Rancher management and Prometheus/Grafana observability.

Comparison with Other Kubernetes Storage Solutions

Longhorn is not the only storage option for Kubernetes. Understanding how it compares to alternatives helps you make the right choice for your specific requirements.

Feature	Longhorn	Rook-Ceph	OpenEBS (Mayastor)	local-path
Complexity	Low	High	Medium	Minimal
Replication	Built-in (2–3 replicas)	CRUSH algorithm	NVMe-oF replicas	None
Dynamic PVC Expansion	Yes (online)	Yes	Yes	No
Snapshots	COW snapshots	RBD snapshots	Yes	No
Backup to Cloud	Built-in (S3/GCS/Azure)	Via rbd export	Via Velero	No
DR Volumes	Native standby volumes	RBD mirroring	No	No
RWX Support	Yes (NFS-based)	Yes (CephFS)	No	No
Volume Encryption	LUKS2	dmcrypt	No	No
UI	Built-in web UI	Ceph dashboard	Minimal	None
Min Nodes	1 (3 for HA)	3 (dedicated OSD nodes)	3	1
Resource Overhead	Low-Medium	High	Medium	None
Best For	k3s/RKE2 production	Large-scale enterprise	High-performance NVMe	Dev/single-node

Longhorn vs Rook-Ceph: Ceph is the more powerful system — it supports object storage, file storage, and block storage with a sophisticated CRUSH placement algorithm. However, Ceph demands at least 3 dedicated OSD nodes, significant RAM (minimum 4GB per OSD daemon), and deep operational expertise. For k3s clusters with 3–10 nodes, Longhorn provides 90% of the value at 10% of the operational cost.

Longhorn vs OpenEBS Mayastor: Mayastor uses NVMe-over-Fabrics for high-performance replication, achieving lower latency than Longhorn's TCP-based replication. If raw IOPS and sub-millisecond latency are your primary concern and you have NVMe infrastructure with RDMA networking, Mayastor may be the better choice. For most k3s deployments, Longhorn's integrated backup, DR, and operational simplicity outweigh Mayastor's performance edge.

Longhorn vs local-path: local-path provisioner is k3s's default storage — it simply creates directories on the node's local filesystem. Zero replication, zero snapshots, zero backup integration. It is fine for development but unacceptable for production data.

Production Best Practices

These recommendations are distilled from operating Longhorn across dozens of production k3s clusters running database workloads.

1. Set resource reservations for instance managers. Longhorn instance managers (engine and replica managers) need guaranteed CPU to avoid I/O stalls during node pressure. Set guaranteedInstanceManagerCPU to at least 12% in Longhorn settings.

2. Use Retain reclaim policy for database volumes. Never use Delete for database PVCs. A Retain policy keeps the PV and its data even after the PVC is deleted, giving you a safety net against accidental deletion.

3. Disable replica soft anti-affinity for production. Set replicaSoftAntiAffinity: false to ensure replicas are always spread across different nodes. With soft anti-affinity, Longhorn may schedule multiple replicas on the same node when space is tight — defeating the purpose of replication.

4. Reserve storage space on every node. Set storageMinimalAvailablePercentage to at least 15%. This prevents Longhorn from consuming all disk space, which would cause node-level issues affecting all pods.

5. Enable auto-salvage. The autoSalvage setting automatically recovers volumes that enter a faulted state when at least one replica is still healthy. This reduces manual intervention during node failures.

6. Set priority class to system-cluster-critical. Longhorn's manager and driver components should never be evicted during node pressure. Set their priorityClass to system-cluster-critical to ensure they survive pod eviction.

7. Configure recurring jobs for all production volumes. Every production volume should have both snapshot and backup recurring jobs. Snapshots every 4 hours for quick rollback, backups daily for disaster recovery.

8. Test DR volume activation regularly. Create a monthly schedule to activate DR volumes in your standby cluster, verify data integrity, and practice the failover procedure. A DR plan that has never been tested is just documentation.

9. Monitor volume health proactively. Set up Prometheus alerts for degraded and faulted volumes, node storage capacity, backup age, and replica rebuild status. By the time a user reports a slow database, the storage problem has been building for hours.

10. Use dedicated storage disks. Separate Longhorn data from the OS disk. This prevents I/O contention, gives you cleaner capacity management, and avoids the risk of filling the OS disk with Longhorn data.

Cloud-Specific Deployment Guidance

AWS — EC2 with NVMe Instance Storage

For AWS deployments, use i3.xlarge or i3en.xlarge instances that come with NVMe instance storage. These provide raw NVMe performance at a fraction of the cost of provisioned EBS IOPS. Format the instance storage and configure it as a Longhorn disk.

# On each EC2 i3.xlarge instance
sudo mkfs.ext4 /dev/nvme0n1
sudo mkdir -p /mnt/longhorn-nvme
sudo mount /dev/nvme0n1 /mnt/longhorn-nvme
echo '/dev/nvme0n1 /mnt/longhorn-nvme ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab

# Configure as Longhorn disk (via node CRD or UI)
# Path: /mnt/longhorn-nvme/

Azure — VMs with Ultra Disk

On Azure, use Standard_L8s_v3 or Standard_L16s_v3 VMs with local NVMe storage. Alternatively, attach Ultra Disks for consistent sub-millisecond latency. Ultra Disks allow you to independently configure IOPS and throughput.

# Attach Ultra Disk to Azure VM (Azure CLI)
az vm disk attach \
  --vm-name k3s-worker-01 \
  --resource-group k3s-cluster \
  --name longhorn-ultra-01 \
  --size-gb 512 \
  --sku UltraSSD_LRS \
  --disk-iops-read-write 10000 \
  --disk-mbps-read-write 300 \
  --new

GCP — VMs with Local SSD

GCP offers Local SSD attached to n2-standard or c3-standard VMs. Local SSDs provide 375 GB per disk with up to 680,000 read IOPS. Attach multiple Local SSDs and RAID them for larger volumes.

# Create VM with local SSDs (gcloud)
gcloud compute instances create k3s-worker-01 \
  --machine-type=n2-standard-8 \
  --local-ssd=interface=NVME \
  --local-ssd=interface=NVME \
  --zone=europe-west1-b

# RAID two local SSDs for 750GB Longhorn disk
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme0n2
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/longhorn-nvme
sudo mount /dev/md0 /mnt/longhorn-nvme

On-Premises Datacenter

For on-premises deployments, use servers with dedicated NVMe or SAS SSD drives for Longhorn. Separate the OS disk from storage disks. Use a 10GbE or 25GbE network between nodes to ensure replication traffic does not bottleneck — Longhorn synchronous replication generates network traffic proportional to write throughput multiplied by replica count.

# Typical on-premises server spec for Longhorn k3s node
# CPU: 8+ cores (Xeon or EPYC)
# RAM: 32+ GB
# OS Disk: 256GB SATA SSD (mirrored)
# Longhorn Disk: 2x 1TB NVMe (RAID-1 or individual Longhorn disks)
# Network: 2x 25GbE (bonded, one for application, one for storage)

# Configure dedicated storage network (optional but recommended)
# In Longhorn settings:
# storage-network: kube-system/storage-network-attachment

Longhorn UI Walkthrough

Longhorn ships with a built-in web UI that provides a visual interface for managing volumes, snapshots, backups, nodes, and settings. Access it through the Ingress configured during installation or via kubectl port-forward.

# Quick access via port-forward
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
# Open http://localhost:8080 in your browser

The UI provides several key views: Dashboard shows cluster-wide storage health including total capacity, used space, and volume status. Volume lists all volumes with their state (Healthy/Degraded/Faulted), size, replica count, and attached node. You can expand, snapshot, back up, and restore volumes directly from this view. Node shows each node's storage configuration, disk allocation, and scheduling status. Backup lists all backups stored in the backup target with their volume, size, and creation time. Setting exposes all Longhorn configuration parameters with descriptions.

Troubleshooting Common Issues

Degraded Volumes

A volume enters Degraded state when one or more replicas are unhealthy but the volume is still functional. Common causes include node failure, disk full, or network partition.

# Check volume status
kubectl -n longhorn-system get volumes.longhorn.io -o wide

# Describe the volume for detailed status
kubectl -n longhorn-system describe volumes.longhorn.io pvc-abc123

# Check replica status
kubectl -n longhorn-system get replicas.longhorn.io -l longhornvolume=pvc-abc123

# If a replica is stuck in error state, delete it to trigger rebuild
kubectl -n longhorn-system delete replicas.longhorn.io pvc-abc123-r-12345

# Monitor rebuild progress
kubectl -n longhorn-system get volumes.longhorn.io pvc-abc123 \
  -o jsonpath='{.status.conditions}' | jq

Volume Stuck in Attaching/Detaching

# Check engine status
kubectl -n longhorn-system get engines.longhorn.io -l longhornvolume=pvc-abc123

# Check for stuck VolumeAttachment objects
kubectl get volumeattachments | grep pvc-abc123

# Force detach a stuck volume (use cautiously)
kubectl -n longhorn-system patch volumes.longhorn.io pvc-abc123 \
  --type merge -p '{"spec":{"nodeID":""}}'

# If attachment is truly stuck, delete the VolumeAttachment
kubectl delete volumeattachment csi-abc123def456

Space Pressure

# Check node storage usage
kubectl -n longhorn-system get nodes.longhorn.io -o wide

# Identify large snapshots consuming space
kubectl -n longhorn-system get snapshots.longhorn.io -l longhornvolume=pvc-abc123

# Purge old snapshots to reclaim space
# Via Longhorn UI: Volume → Snapshots → Delete unneeded snapshots
# Old snapshots consume space because COW data cannot be freed until the snapshot is deleted

# Check for snapshot data integrity issues
kubectl -n longhorn-system get settings.longhorn.io snapshot-data-integrity

Upgrade Procedures

Longhorn upgrades should be performed carefully, as the storage system underpins all stateful workloads. Always take backups of all critical volumes before upgrading.

# 1. Back up all volumes before upgrade
for vol in $(kubectl -n longhorn-system get volumes.longhorn.io -o jsonpath='{.items[*].metadata.name}'); do
  echo "Backing up $vol"
  kubectl -n longhorn-system patch volumes.longhorn.io $vol \
    --type merge -p '{"spec":{"lastBackupAt":"'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'"}}'
done

# 2. Check current version
kubectl -n longhorn-system get daemonset longhorn-manager -o jsonpath='{.spec.template.spec.containers[0].image}'

# 3. Upgrade via Helm
helm repo update
helm upgrade longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --version 1.7.2 \
  --values longhorn-values.yaml

# 4. Monitor the rolling upgrade
kubectl -n longhorn-system rollout status daemonset/longhorn-manager
kubectl -n longhorn-system rollout status deployment/longhorn-driver-deployer

# 5. Verify all volumes are healthy after upgrade
kubectl -n longhorn-system get volumes.longhorn.io -o custom-columns=NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness

# 6. Longhorn automatically upgrades engines — monitor progress
kubectl -n longhorn-system get engines.longhorn.io -o wide

Integration with Database Operators

Modern Kubernetes database operators (CloudNativePG, Percona Operator, Zalando Postgres Operator) work seamlessly with Longhorn. The operator manages the database lifecycle while Longhorn provides the underlying storage with replication, snapshots, and backup.

# CloudNativePG cluster using Longhorn storage
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-pg
  namespace: database
spec:
  instances: 3
  storage:
    size: 100Gi
    storageClass: longhorn-db
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
  walStorage:
    size: 20Gi
    storageClass: longhorn-fast

  postgresql:
    parameters:
      shared_buffers: "2GB"
      effective_cache_size: "6GB"
      maintenance_work_mem: "512MB"
      wal_buffers: "64MB"
      max_connections: "200"

  backup:
    barmanObjectStore:
      destinationPath: s3://pg-backups/cnpg/
      endpointURL: https://s3.eu-west-1.amazonaws.com
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
    retentionPolicy: "30d"

  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 8Gi

# Percona XtraDB Cluster with Longhorn storage
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: production-mysql
  namespace: database
spec:
  crVersion: "1.14.0"
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: longhorn-db
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi
  backup:
    image: percona/percona-xtradb-cluster-operator:1.14.0-pxc8.0-backup-pxb8.0.35
    storages:
      s3-backup:
        type: s3
        s3:
          bucket: mysql-backups
          region: eu-west-1
          credentialsSecret: aws-creds
    schedule:
      - name: daily-full
        schedule: "0 2 * * *"
        keep: 14
        storageName: s3-backup

Conclusion

Longhorn transforms k3s from a lightweight Kubernetes distribution suitable only for edge and development into a production-ready platform capable of running mission-critical stateful workloads. Its architecture — per-volume engines, synchronous multi-node replication, integrated snapshot and backup to cloud object storage, DR volumes, volume encryption, and online PVC expansion — delivers enterprise-grade storage without enterprise-grade complexity.

The key to success with Longhorn in production is threefold. First, configure it properly from the start: use dedicated storage disks, set appropriate replication factors and data locality, and create purpose-built StorageClasses for different workload types. Second, integrate backup into your architecture from day one: recurring snapshots for fast rollback, daily backups to S3/GCS/Azure for disaster recovery, and DR volumes in a standby cluster for the worst-case scenario. Third, monitor everything: volume health, node storage capacity, backup age, and replica rebuild status — problems with storage are invisible until they become catastrophic.

Dynamic PVC expansion eliminates one of the most common sources of production incidents — running out of disk space. With Longhorn, expanding a database volume from 50Gi to 500Gi is a single kubectl command with zero downtime. Combined with Prometheus alerts on capacity thresholds, you can proactively expand volumes before they become critical, or automate the expansion entirely.

Whether you are running PostgreSQL, MySQL, MongoDB, or Redis on k3s — on bare metal servers, AWS EC2, Azure VMs, GCP instances, or on-premises datacenter hardware — Longhorn provides the storage foundation that lets you focus on your applications instead of worrying about data durability. Install it, configure it, monitor it, and trust it with your production data.

Longhorn Architecture

Installing Longhorn on k3s

Prerequisites

# Verify prerequisites on each node
# Required: open-iscsi (for iSCSI support)
sudo apt-get install -y open-iscsi
sudo systemctl enable iscsid
sudo systemctl start iscsid

# Required: NFSv4 client (for backup/restore to NFS targets)
sudo apt-get install -y nfs-common

# Recommended: check all prerequisites with Longhorn's environment check script
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.7.2/scripts/environment_check.sh | bash

# Verify kernel modules
lsmod | grep -E "iscsi_tcp|dm_crypt|nfs"

# On k3s specifically, local-path-provisioner is the default.
# We will make Longhorn the default StorageClass after installation.

Installation via Helm (Recommended)

# Add the Longhorn Helm repository
helm repo add longhorn https://charts.longhorn.io
helm repo update

# Create the namespace
kubectl create namespace longhorn-system

# Install with production values
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --version 1.7.2 \
  --values longhorn-values.yaml

Here is a production-grade values.yaml with tuned defaults:

# longhorn-values.yaml — Production configuration
persistence:
  defaultClass: true
  defaultFsType: ext4
  defaultClassReplicaCount: 3
  defaultDataLocality: best-effort
  reclaimPolicy: Retain

defaultSettings:
  backupTarget: s3://longhorn-backups@eu-west-1/
  backupTargetCredentialSecret: longhorn-backup-s3-secret
  createDefaultDiskLabeledNodes: true
  defaultDataPath: /var/lib/longhorn/
  defaultReplicaCount: 3
  defaultDataLocality: best-effort
  replicaSoftAntiAffinity: false
  replicaAutoBalance: best-effort
  storageOverProvisioningPercentage: 150
  storageMinimalAvailablePercentage: 15
  guaranteedInstanceManagerCPU: 12
  upgradeChecker: false
  autoSalvage: true
  autoDeletePodWhenVolumeDetachedUnexpectedly: true
  disableSchedulingOnCordonedNode: true
  replicaZoneSoftAntiAffinity: true
  volumeAttachmentRecoveryPolicy: wait
  snapshotDataIntegrity: fast-check
  snapshotDataIntegrityCronjob: "0 7 * * *"
  concurrentAutomaticEngineUpgradePerNodeLimit: 1

longhornManager:
  priorityClass: system-cluster-critical
  tolerations:
    - key: "node-role.kubernetes.io/storage"
      operator: "Exists"
      effect: "NoSchedule"

longhornDriver:
  priorityClass: system-cluster-critical
  tolerations:
    - key: "node-role.kubernetes.io/storage"
      operator: "Exists"
      effect: "NoSchedule"

longhornUI:
  replicas: 2

ingress:
  enabled: true
  ingressClassName: nginx
  host: longhorn.internal.example.com
  tls: true
  tlsSecret: longhorn-tls
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: longhorn-basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Longhorn UI"

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

Installation via Rancher App Marketplace

Installation via kubectl

# Direct manifest installation
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.7.2/deploy/longhorn.yaml

# Verify all pods are running
kubectl -n longhorn-system get pods -w

Post-Installation: Make Longhorn the Default StorageClass

# Remove default annotation from k3s local-path
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

# Verify Longhorn is now default
kubectl get storageclass
# NAME                 PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
# longhorn (default)   driver.longhorn.io   Retain          Immediate           true                   5m
# local-path           rancher.io/local-path Delete         WaitForFirstConsumer false                 30d

StorageClass Configuration for Production

# StorageClass for database workloads — maximum durability
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-db
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: ext4
  dataLocality: best-effort
  recurringJobSelector: '[{"name":"db-snapshot","isGroup":true},{"name":"db-backup","isGroup":true}]'
---
# StorageClass for general workloads — balanced
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-standard
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
  fsType: ext4
  dataLocality: best-effort
---
# StorageClass for temporary/cache workloads — performance
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "2880"
  fsType: ext4
  dataLocality: strict-local
---
# StorageClass for RWX (ReadWriteMany) shared volumes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-rwx
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  nfsOptions: "vers=4.1,hard,timeo=50,retrans=3"
  staleReplicaTimeout: "2880"
  fsType: ext4

Dynamic PVC Expansion

Enabling Volume Expansion in StorageClass

# Verify your StorageClass supports expansion
kubectl get storageclass longhorn -o yaml | grep allowVolumeExpansion
# allowVolumeExpansion: true

# If not set, patch it
kubectl patch storageclass longhorn -p '{"allowVolumeExpansion": true}'

Step-by-Step: Expanding a PVC Dynamically

# 1. Check current PVC size
kubectl get pvc pg-data-postgresql-0 -n database
# NAME                    STATUS   VOLUME       CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# pg-data-postgresql-0    Bound    pvc-abc123   50Gi       RWO            longhorn-db    30d

# 2. Check current usage inside the pod
kubectl exec -n database postgresql-0 -- df -h /var/lib/postgresql/data
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/longhorn   49G   42G  7.0G  86% /var/lib/postgresql/data

# 3. Expand the PVC (online — no downtime)
kubectl patch pvc pg-data-postgresql-0 -n database --type merge -p '{
  "spec": {
    "resources": {
      "requests": {
        "storage": "100Gi"
      }
    }
  }
}'

# 4. Watch the expansion progress
kubectl get pvc pg-data-postgresql-0 -n database -w
# Wait for CAPACITY to update from 50Gi to 100Gi

# 5. Verify the filesystem expanded inside the pod
kubectl exec -n database postgresql-0 -- df -h /var/lib/postgresql/data
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/longhorn   99G   42G   57G  43% /var/lib/postgresql/data

# 6. Verify via Longhorn volume status
kubectl -n longhorn-system get volumes.longhorn.io -o wide

Automating PVC Expansion with Alerts

# PrometheusRule for PVC capacity alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-capacity-alerts
  namespace: monitoring
spec:
  groups:
    - name: pvc-capacity
      rules:
        - alert: PVCCapacityWarning
          expr: |
            (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.80
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} at {{ $value | humanizePercentage }} capacity"
            runbook: "Expand the PVC using: kubectl patch pvc {{ $labels.persistentvolumeclaim }} -n {{ $labels.namespace }} --type merge -p '{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"NEW_SIZE\"}}}}'"
        - alert: PVCCapacityCritical
          expr: |
            (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.90
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "CRITICAL: PVC {{ $labels.persistentvolumeclaim }} at {{ $value | humanizePercentage }}"
            description: "Immediate expansion required to prevent application failure"

Volume Replication and Data Locality

Data locality controls whether Longhorn tries to keep a replica on the same node as the pod consuming the volume. There are three modes:

disabled — Replicas are scheduled purely based on available space and anti-affinity. The pod might read from a replica on a remote node, adding network latency to every I/O operation.
best-effort — Longhorn tries to place one replica on the same node as the consuming pod. If the local node runs out of space or the pod migrates, the volume still works but may have slightly higher latency. This is the recommended setting for most workloads.
strict-local — The volume can only be used on a node that has a local replica. If the pod is scheduled to a node without a local replica, the volume attachment fails. Use this only for latency-sensitive single-replica workloads where you accept the durability trade-off.

# Change replication factor on an existing volume
kubectl -n longhorn-system patch volumes.longhorn.io pvc-abc123 \
  --type merge -p '{"spec":{"numberOfReplicas":3}}'

# Set data locality on existing volume
kubectl -n longhorn-system patch volumes.longhorn.io pvc-abc123 \
  --type merge -p '{"spec":{"dataLocality":"best-effort"}}'

# Replica auto-balancing (spreads replicas evenly across nodes)
# Set via Longhorn settings
kubectl -n longhorn-system edit settings.longhorn.io replica-auto-balance
# Set value to: best-effort

Snapshots and Backups

Configuring the Backup Target

# Create the S3 credentials secret
kubectl create secret generic longhorn-backup-s3-secret \
  -n longhorn-system \
  --from-literal=AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
  --from-literal=AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
  --from-literal=AWS_ENDPOINTS=https://s3.eu-west-1.amazonaws.com

# Set the backup target in Longhorn settings
kubectl -n longhorn-system edit settings.longhorn.io backup-target
# Set value to: s3://longhorn-backups@eu-west-1/

kubectl -n longhorn-system edit settings.longhorn.io backup-target-credential-secret
# Set value to: longhorn-backup-s3-secret

# For GCS backup target
# value: s3://longhorn-backups@us/  (GCS is S3-compatible via interop)
# Or use the GCS JSON key:
kubectl create secret generic longhorn-backup-gcs-secret \
  -n longhorn-system \
  --from-literal=GOOGLE_APPLICATION_CREDENTIALS=/var/longhorn-backup/gcs-key.json \
  --from-file=gcs-key.json=./service-account-key.json

# For Azure Blob backup target
kubectl create secret generic longhorn-backup-azure-secret \
  -n longhorn-system \
  --from-literal=AZBLOB_ACCOUNT_NAME=prodbackupstorage \
  --from-literal=AZBLOB_ACCOUNT_KEY=base64encodedkeyhere

VolumeSnapshot Class and Snapshot YAML

# VolumeSnapshotClass for Longhorn
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot-vsc
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: snap
---
# Create a VolumeSnapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-snap-before-migration
  namespace: database
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    persistentVolumeClaimName: pg-data-postgresql-0
---
# Restore a PVC from a snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data-restored
  namespace: database
spec:
  storageClassName: longhorn-db
  dataSource:
    name: pg-data-snap-before-migration
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

Recurring Backup Schedules

# Recurring snapshot job — every 4 hours, retain 6
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: db-snapshot-4h
  namespace: longhorn-system
spec:
  cron: "0 */4 * * *"
  task: snapshot
  retain: 6
  concurrency: 2
  groups:
    - db-volumes
  labels:
    tier: database
---
# Recurring backup job — daily at 02:00, retain 14 days
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: db-backup-daily
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"
  task: backup
  retain: 14
  concurrency: 2
  groups:
    - db-volumes
  labels:
    tier: database
---
# Recurring backup job — weekly full, retain 8 weeks
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: db-backup-weekly
  namespace: longhorn-system
spec:
  cron: "0 3 * * 0"
  task: backup
  retain: 8
  concurrency: 1
  groups:
    - db-volumes
  labels:
    tier: database
---
# Assign recurring jobs to a volume via labels
# Label the PVC or volume with: recurring-job-group.longhorn.io/db-volumes: enabled
kubectl label pvc pg-data-postgresql-0 -n database \
  recurring-job-group.longhorn.io/db-volumes=enabled

Disaster Recovery

Setting Up DR Volumes

# On the DR cluster: create a DR volume from the primary's backup
# This is done via the Longhorn UI or API

# 1. Configure the DR cluster's backup target to point to the same S3 bucket
kubectl -n longhorn-system edit settings.longhorn.io backup-target
# value: s3://longhorn-backups@eu-west-1/

# 2. Create a DR volume from the latest backup
# Via Longhorn UI: Backup → Select volume → Create DR Volume
# Via kubectl:
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: pg-data-dr
  namespace: longhorn-system
spec:
  size: "107374182400"  # 100Gi in bytes
  numberOfReplicas: 3
  fromBackup: "s3://longhorn-backups@eu-west-1/?backup=backup-abc123&volume=pg-data"
  standby: true
  frontend: ""

# 3. The DR volume auto-syncs incremental backups from S3
# Monitor sync status:
kubectl -n longhorn-system get volumes.longhorn.io pg-data-dr -o jsonpath='{.status.lastBackup}'

# 4. During failover — activate the DR volume:
kubectl -n longhorn-system patch volumes.longhorn.io pg-data-dr \
  --type merge -p '{"spec":{"standby":false,"frontend":"blockdev"}}'

# 5. Create a PVC bound to the activated DR volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data-postgresql-0
  namespace: database
spec:
  storageClassName: longhorn-db
  volumeName: pg-data-dr
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

Volume Encryption

# Create a Kubernetes secret with the encryption passphrase
kubectl create secret generic longhorn-crypto \
  -n longhorn-system \
  --from-literal=CRYPTO_KEY_VALUE="a-very-long-and-random-passphrase-here" \
  --from-literal=CRYPTO_KEY_PROVIDER=secret \
  --from-literal=CRYPTO_KEY_CIPHER=aes-xts-plain64 \
  --from-literal=CRYPTO_KEY_HASH=sha256 \
  --from-literal=CRYPTO_KEY_SIZE=256 \
  --from-literal=CRYPTO_PBKDF=argon2i

# StorageClass with encryption enabled
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-encrypted
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  encrypted: "true"
  csi.storage.k8s.io/provisioner-secret-name: longhorn-crypto
  csi.storage.k8s.io/provisioner-secret-namespace: longhorn-system
  csi.storage.k8s.io/node-publish-secret-name: longhorn-crypto
  csi.storage.k8s.io/node-publish-secret-namespace: longhorn-system
  csi.storage.k8s.io/node-stage-secret-name: longhorn-crypto
  csi.storage.k8s.io/node-stage-secret-namespace: longhorn-system

ReadWriteMany (RWX) Support

# PVC with ReadWriteMany access mode
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-media
  namespace: application
spec:
  storageClassName: longhorn-rwx
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi

Database Workloads on Longhorn

PostgreSQL StatefulSet with Longhorn

# PostgreSQL StatefulSet using Longhorn PVC
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql
  namespace: database
spec:
  serviceName: postgresql
  replicas: 3
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      terminationGracePeriodSeconds: 120
      securityContext:
        fsGroup: 999
        runAsUser: 999
      containers:
        - name: postgresql
          image: postgres:16-alpine
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: production
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: pg-credentials
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: pg-credentials
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              cpu: "4"
              memory: 8Gi
          volumeMounts:
            - name: pg-data
              mountPath: /var/lib/postgresql/data
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
            initialDelaySeconds: 5
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: pg-data
        labels:
          recurring-job-group.longhorn.io/db-volumes: enabled
      spec:
        storageClassName: longhorn-db
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi

MySQL StatefulSet with Longhorn

# MySQL StatefulSet using Longhorn PVC
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: database
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: mysql
          image: mysql:8.0
          ports:
            - containerPort: 3306
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-credentials
                  key: root-password
            - name: MYSQL_DATABASE
              value: production
          args:
            - "--default-authentication-plugin=mysql_native_password"
            - "--innodb-buffer-pool-size=4G"
            - "--innodb-log-file-size=1G"
            - "--innodb-flush-log-at-trx-commit=1"
            - "--sync-binlog=1"
          resources:
            requests:
              cpu: "1"
              memory: 4Gi
            limits:
              cpu: "4"
              memory: 8Gi
          volumeMounts:
            - name: mysql-data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: mysql-data
        labels:
          recurring-job-group.longhorn.io/db-volumes: enabled
      spec:
        storageClassName: longhorn-db
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi

Performance Tuning for Database Workloads

# Configure dedicated disks on a node
# Via kubectl (Longhorn node CRD)
kubectl -n longhorn-system edit nodes.longhorn.io worker-01

# Add the dedicated disk under spec.disks:
spec:
  disks:
    default-disk:
      allowScheduling: false  # disable OS disk for Longhorn
      path: /var/lib/longhorn/
      storageReserved: 0
    nvme-data:
      allowScheduling: true
      path: /mnt/nvme-longhorn/
      storageReserved: 10737418240  # 10Gi reserved
      tags:
        - nvme
        - database

Node Scheduling and Disk Management

# Tag nodes for storage scheduling
kubectl label node worker-01 node.longhorn.io/storage=nvme
kubectl label node worker-02 node.longhorn.io/storage=nvme
kubectl label node worker-03 node.longhorn.io/storage=nvme
kubectl label node worker-04 node.longhorn.io/storage=sata

# Use node selector in StorageClass to target NVMe nodes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-nvme
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  nodeSelector: "node.longhorn.io/storage=nvme"
  diskSelector: "nvme,database"
  dataLocality: best-effort

Monitoring with Prometheus and Grafana

Longhorn exposes Prometheus metrics through a built-in metrics endpoint. Monitoring these metrics is essential for capacity planning, performance analysis, and proactive alerting.

# ServiceMonitor for Longhorn metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
      - longhorn-system
  endpoints:
    - port: manager
      path: /metrics
      interval: 30s
---
# PrometheusRule for Longhorn alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: longhorn-alerts
  namespace: monitoring
spec:
  groups:
    - name: longhorn-storage
      rules:
        - alert: LonghornVolumeStatusCritical
          expr: longhorn_volume_robustness == 3
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Longhorn volume {{ $labels.volume }} is Faulted"
        - alert: LonghornVolumeStatusDegraded
          expr: longhorn_volume_robustness == 2
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Longhorn volume {{ $labels.volume }} is Degraded"
        - alert: LonghornNodeStorageWarning
          expr: |
            (longhorn_node_storage_usage_bytes / longhorn_node_storage_capacity_bytes) > 0.80
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Longhorn node {{ $labels.node }} storage at {{ $value | humanizePercentage }}"
        - alert: LonghornBackupFailed
          expr: longhorn_backup_state == 3
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Longhorn backup failed for volume {{ $labels.volume }}"

k3s Cluster with Complete Longhorn Storage Stack

The following diagram shows how Longhorn fits into a complete k3s production stack, from bare metal servers through to application pods, with Rancher management and Prometheus/Grafana observability.

Comparison with Other Kubernetes Storage Solutions

Longhorn is not the only storage option for Kubernetes. Understanding how it compares to alternatives helps you make the right choice for your specific requirements.

Feature	Longhorn	Rook-Ceph	OpenEBS (Mayastor)	local-path
Complexity	Low	High	Medium	Minimal
Replication	Built-in (2–3 replicas)	CRUSH algorithm	NVMe-oF replicas	None
Dynamic PVC Expansion	Yes (online)	Yes	Yes	No
Snapshots	COW snapshots	RBD snapshots	Yes	No
Backup to Cloud	Built-in (S3/GCS/Azure)	Via rbd export	Via Velero	No
DR Volumes	Native standby volumes	RBD mirroring	No	No
RWX Support	Yes (NFS-based)	Yes (CephFS)	No	No
Volume Encryption	LUKS2	dmcrypt	No	No
UI	Built-in web UI	Ceph dashboard	Minimal	None
Min Nodes	1 (3 for HA)	3 (dedicated OSD nodes)	3	1
Resource Overhead	Low-Medium	High	Medium	None
Best For	k3s/RKE2 production	Large-scale enterprise	High-performance NVMe	Dev/single-node

Production Best Practices

These recommendations are distilled from operating Longhorn across dozens of production k3s clusters running database workloads.

Cloud-Specific Deployment Guidance

AWS — EC2 with NVMe Instance Storage

# On each EC2 i3.xlarge instance
sudo mkfs.ext4 /dev/nvme0n1
sudo mkdir -p /mnt/longhorn-nvme
sudo mount /dev/nvme0n1 /mnt/longhorn-nvme
echo '/dev/nvme0n1 /mnt/longhorn-nvme ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab

# Configure as Longhorn disk (via node CRD or UI)
# Path: /mnt/longhorn-nvme/

Azure — VMs with Ultra Disk

# Attach Ultra Disk to Azure VM (Azure CLI)
az vm disk attach \
  --vm-name k3s-worker-01 \
  --resource-group k3s-cluster \
  --name longhorn-ultra-01 \
  --size-gb 512 \
  --sku UltraSSD_LRS \
  --disk-iops-read-write 10000 \
  --disk-mbps-read-write 300 \
  --new

GCP — VMs with Local SSD

GCP offers Local SSD attached to n2-standard or c3-standard VMs. Local SSDs provide 375 GB per disk with up to 680,000 read IOPS. Attach multiple Local SSDs and RAID them for larger volumes.

# Create VM with local SSDs (gcloud)
gcloud compute instances create k3s-worker-01 \
  --machine-type=n2-standard-8 \
  --local-ssd=interface=NVME \
  --local-ssd=interface=NVME \
  --zone=europe-west1-b

# RAID two local SSDs for 750GB Longhorn disk
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme0n2
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/longhorn-nvme
sudo mount /dev/md0 /mnt/longhorn-nvme

On-Premises Datacenter

# Typical on-premises server spec for Longhorn k3s node
# CPU: 8+ cores (Xeon or EPYC)
# RAM: 32+ GB
# OS Disk: 256GB SATA SSD (mirrored)
# Longhorn Disk: 2x 1TB NVMe (RAID-1 or individual Longhorn disks)
# Network: 2x 25GbE (bonded, one for application, one for storage)

# Configure dedicated storage network (optional but recommended)
# In Longhorn settings:
# storage-network: kube-system/storage-network-attachment

Longhorn UI Walkthrough

# Quick access via port-forward
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
# Open http://localhost:8080 in your browser

Troubleshooting Common Issues

Degraded Volumes

A volume enters Degraded state when one or more replicas are unhealthy but the volume is still functional. Common causes include node failure, disk full, or network partition.

# Check volume status
kubectl -n longhorn-system get volumes.longhorn.io -o wide

# Describe the volume for detailed status
kubectl -n longhorn-system describe volumes.longhorn.io pvc-abc123

# Check replica status
kubectl -n longhorn-system get replicas.longhorn.io -l longhornvolume=pvc-abc123

# If a replica is stuck in error state, delete it to trigger rebuild
kubectl -n longhorn-system delete replicas.longhorn.io pvc-abc123-r-12345

# Monitor rebuild progress
kubectl -n longhorn-system get volumes.longhorn.io pvc-abc123 \
  -o jsonpath='{.status.conditions}' | jq

Volume Stuck in Attaching/Detaching

# Check engine status
kubectl -n longhorn-system get engines.longhorn.io -l longhornvolume=pvc-abc123

# Check for stuck VolumeAttachment objects
kubectl get volumeattachments | grep pvc-abc123

# Force detach a stuck volume (use cautiously)
kubectl -n longhorn-system patch volumes.longhorn.io pvc-abc123 \
  --type merge -p '{"spec":{"nodeID":""}}'

# If attachment is truly stuck, delete the VolumeAttachment
kubectl delete volumeattachment csi-abc123def456

Space Pressure

# Check node storage usage
kubectl -n longhorn-system get nodes.longhorn.io -o wide

# Identify large snapshots consuming space
kubectl -n longhorn-system get snapshots.longhorn.io -l longhornvolume=pvc-abc123

# Purge old snapshots to reclaim space
# Via Longhorn UI: Volume → Snapshots → Delete unneeded snapshots
# Old snapshots consume space because COW data cannot be freed until the snapshot is deleted

# Check for snapshot data integrity issues
kubectl -n longhorn-system get settings.longhorn.io snapshot-data-integrity

Upgrade Procedures

Longhorn upgrades should be performed carefully, as the storage system underpins all stateful workloads. Always take backups of all critical volumes before upgrading.

# 1. Back up all volumes before upgrade
for vol in $(kubectl -n longhorn-system get volumes.longhorn.io -o jsonpath='{.items[*].metadata.name}'); do
  echo "Backing up $vol"
  kubectl -n longhorn-system patch volumes.longhorn.io $vol \
    --type merge -p '{"spec":{"lastBackupAt":"'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'"}}'
done

# 2. Check current version
kubectl -n longhorn-system get daemonset longhorn-manager -o jsonpath='{.spec.template.spec.containers[0].image}'

# 3. Upgrade via Helm
helm repo update
helm upgrade longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --version 1.7.2 \
  --values longhorn-values.yaml

# 4. Monitor the rolling upgrade
kubectl -n longhorn-system rollout status daemonset/longhorn-manager
kubectl -n longhorn-system rollout status deployment/longhorn-driver-deployer

# 5. Verify all volumes are healthy after upgrade
kubectl -n longhorn-system get volumes.longhorn.io -o custom-columns=NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness

# 6. Longhorn automatically upgrades engines — monitor progress
kubectl -n longhorn-system get engines.longhorn.io -o wide

Integration with Database Operators

# CloudNativePG cluster using Longhorn storage
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-pg
  namespace: database
spec:
  instances: 3
  storage:
    size: 100Gi
    storageClass: longhorn-db
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
  walStorage:
    size: 20Gi
    storageClass: longhorn-fast

  postgresql:
    parameters:
      shared_buffers: "2GB"
      effective_cache_size: "6GB"
      maintenance_work_mem: "512MB"
      wal_buffers: "64MB"
      max_connections: "200"

  backup:
    barmanObjectStore:
      destinationPath: s3://pg-backups/cnpg/
      endpointURL: https://s3.eu-west-1.amazonaws.com
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
    retentionPolicy: "30d"

  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 8Gi

# Percona XtraDB Cluster with Longhorn storage
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: production-mysql
  namespace: database
spec:
  crVersion: "1.14.0"
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: longhorn-db
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi
  backup:
    image: percona/percona-xtradb-cluster-operator:1.14.0-pxc8.0-backup-pxb8.0.35
    storages:
      s3-backup:
        type: s3
        s3:
          bucket: mysql-backups
          region: eu-west-1
          credentialsSecret: aws-creds
    schedule:
      - name: daily-full
        schedule: "0 2 * * *"
        keep: 14
        storageName: s3-backup

Longhorn Storage for Production k3s Clusters: Dynamic PVC Expansion, Replication, and Disaster Recovery

Longhorn Architecture

Installing Longhorn on k3s

Prerequisites

Installation via Helm (Recommended)

Installation via Rancher App Marketplace

Installation via kubectl

Post-Installation: Make Longhorn the Default StorageClass

StorageClass Configuration for Production

Dynamic PVC Expansion

Enabling Volume Expansion in StorageClass

Step-by-Step: Expanding a PVC Dynamically

Automating PVC Expansion with Alerts

Volume Replication and Data Locality

Snapshots and Backups

Configuring the Backup Target

VolumeSnapshot Class and Snapshot YAML

Recurring Backup Schedules

Disaster Recovery

Setting Up DR Volumes

Volume Encryption

ReadWriteMany (RWX) Support

Database Workloads on Longhorn

PostgreSQL StatefulSet with Longhorn

MySQL StatefulSet with Longhorn

Performance Tuning for Database Workloads

Node Scheduling and Disk Management

Monitoring with Prometheus and Grafana

k3s Cluster with Complete Longhorn Storage Stack

Comparison with Other Kubernetes Storage Solutions

Production Best Practices

Cloud-Specific Deployment Guidance

AWS — EC2 with NVMe Instance Storage

Azure — VMs with Ultra Disk

GCP — VMs with Local SSD

On-Premises Datacenter

Longhorn UI Walkthrough

Troubleshooting Common Issues

Degraded Volumes

Volume Stuck in Attaching/Detaching

Space Pressure

Upgrade Procedures

Integration with Database Operators

Conclusion

Longhorn Storage for Production k3s Clusters: Dynamic PVC Expansion, Replication, and Disaster Recovery

Longhorn Architecture

Installing Longhorn on k3s

Prerequisites

Installation via Helm (Recommended)

Installation via Rancher App Marketplace

Installation via kubectl

Post-Installation: Make Longhorn the Default StorageClass

StorageClass Configuration for Production

Dynamic PVC Expansion

Enabling Volume Expansion in StorageClass

Step-by-Step: Expanding a PVC Dynamically

Automating PVC Expansion with Alerts

Volume Replication and Data Locality

Snapshots and Backups

Configuring the Backup Target

VolumeSnapshot Class and Snapshot YAML

Recurring Backup Schedules

Disaster Recovery

Setting Up DR Volumes

Volume Encryption

ReadWriteMany (RWX) Support

Database Workloads on Longhorn

PostgreSQL StatefulSet with Longhorn

MySQL StatefulSet with Longhorn

Performance Tuning for Database Workloads

Node Scheduling and Disk Management

Monitoring with Prometheus and Grafana

k3s Cluster with Complete Longhorn Storage Stack

Comparison with Other Kubernetes Storage Solutions

Production Best Practices

Cloud-Specific Deployment Guidance

AWS — EC2 with NVMe Instance Storage

Azure — VMs with Ultra Disk

GCP — VMs with Local SSD

On-Premises Datacenter