Scaling Infrastructure with Kubernetes and RKE2: A Production Deep Dive
A production engineer's guide to RKE2 cluster architecture, horizontal and vertical autoscaling, high availability, resource governance, and Prometheus/Grafana observability
Running a Kubernetes cluster in production is one thing. Running one that can absorb unpredictable traffic spikes, survive control-plane failures, enforce tenant isolation, and give your operations team clear visibility into every layer of the system — that is an entirely different challenge. RKE2, Rancher's next-generation Kubernetes distribution, is built specifically for environments where those requirements are non-negotiable.
This article works through the full lifecycle of a production-grade RKE2 deployment: initial cluster architecture, autoscaling at the pod and node level, high-availability control planes, resource governance, and observability with Prometheus and Grafana.
Why RKE2?
RKE2 distinguishes itself from upstream Kubernetes and from its predecessor RKE1 in three key areas. First, it ships with a CIS Kubernetes Benchmark-hardened configuration out of the box — admission control, audit logging, pod security, and TLS settings are pre-configured so that a cluster can pass a CIS Level 1 scan with minimal manual intervention (enabling the cis profile plus a few OS-level prerequisites such as the etcd user and kernel parameters). Second, it offers FIPS 140-2 validated cryptography, making it suitable for government and regulated-industry deployments. Third, it embeds containerd directly and ships with its own CNI (Canal by default, with Cilium and Calico as alternatives), reducing the surface area of external dependencies you need to manage.
RKE2 is also air-gap friendly. The installation bundle includes all required container images, which matters enormously in on-premises and edge deployments where internet access from cluster nodes is restricted or impossible.
Cluster Architecture
A production RKE2 cluster is divided into server nodes (which run the control plane and etcd) and agent nodes (which run workloads). The recommended topology for high availability is three or five server nodes and a variable number of agent nodes organised into node pools by workload class.
# /etc/rancher/rke2/config.yaml (server node)
token: <shared-cluster-token>
tls-san:
- 10.0.0.10 # VIP or load balancer address
- k8s.internal.example.com
cni: cilium
cluster-cidr: 10.42.0.0/16
service-cidr: 10.43.0.0/16
etcd-expose-metrics: true
kube-apiserver-arg:
- "audit-log-path=/var/log/kubernetes/audit.log"
- "audit-log-maxage=30"
- "audit-log-maxsize=100"
# /etc/rancher/rke2/config.yaml (agent node)
server: https://10.0.0.10:9345
token: <shared-cluster-token>
node-label:
- "workload-class=general"
- "topology.kubernetes.io/zone=eu-west-1a"
Install the server on your first control-plane node, then join the remaining server nodes and all agent nodes using the same token and the VIP address. RKE2 automatically elects etcd leaders and manages quorum.
# Install and start RKE2 server
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -
systemctl enable --now rke2-server.service
# Retrieve the node token for joining additional nodes
cat /var/lib/rancher/rke2/server/node-token
# Install and start RKE2 agent (on worker nodes)
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=agent sh -
systemctl enable --now rke2-agent.service
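On the second and third server nodes, the config file additionally points at the existing cluster before rke2-server is started. A minimal sketch, reusing the VIP and token from the earlier server config:

```yaml
# /etc/rancher/rke2/config.yaml (additional server nodes)
server: https://10.0.0.10:9345   # VIP — registration endpoint of the existing cluster
token: <shared-cluster-token>
tls-san:
- 10.0.0.10
- k8s.internal.example.com
cni: cilium                      # must match the first server's CNI choice
```

With this file in place, the same rke2-server install and systemctl commands as on the first node bring the new member into the etcd cluster.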
Node Pools and Workload Placement
Not all workloads have the same resource profile. Stateless web services have different requirements from GPU inference jobs, memory-intensive analytics workloads, or latency-sensitive databases. Organising agent nodes into pools with distinct labels and taints lets Kubernetes schedule each workload class onto appropriately sized hardware.
# Label a node pool for memory-intensive workloads
kubectl label nodes worker-mem-{1..4} workload-class=memory-optimised
kubectl taint nodes worker-mem-{1..4} workload-class=memory-optimised:NoSchedule
# Label a separate pool for general compute
kubectl label nodes worker-gen-{1..8} workload-class=general
# Deployment targeting the memory-optimised pool
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics-engine
spec:
  selector:
    matchLabels:
      app: analytics-engine
  template:
    metadata:
      labels:
        app: analytics-engine
    spec:
      nodeSelector:
        workload-class: memory-optimised
      tolerations:
      - key: workload-class
        operator: Equal
        value: memory-optimised
        effect: NoSchedule
      containers:
      - name: analytics
        image: registry.internal/analytics:v2.3.1
        resources:
          requests:
            memory: "8Gi"
            cpu: "2"
          limits:
            memory: "16Gi"
            cpu: "4"
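Where strict placement is not required — workloads that should prefer the memory-optimised pool but may land elsewhere under capacity pressure — a preferred node affinity is a softer alternative to nodeSelector. A sketch of the relevant pod-spec fragment:

```yaml
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                 # scheduler favours matching nodes but will not fail placement
      preference:
        matchExpressions:
        - key: workload-class
          operator: In
          values: ["memory-optimised"]
```

Unlike nodeSelector, this never leaves a pod Pending when the preferred pool is full.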
Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) adjusts the replica count of a Deployment or StatefulSet based on observed metrics. CPU utilisation is the classic trigger, but modern HPA configurations can also scale on custom metrics exposed by your application or on external metrics from sources like a message queue depth.
First, ensure the Metrics Server is running. RKE2 bundles it as a packaged component (rke2-metrics-server), so it is usually present already; if it has been disabled in your configuration, install the upstream release:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
The behavior block is critical for stability. Without a scale-down stabilisation window, a brief traffic drop will remove pods prematurely, leaving you under-provisioned when load returns. The asymmetric policy — aggressive scale-up, conservative scale-down — is the right default for most production workloads.
Vertical Pod Autoscaler
The Vertical Pod Autoscaler (VPA) right-sizes the CPU and memory requests on individual pods based on observed usage. It addresses a common problem: developers set initial resource requests based on guesswork, and those values never get updated, leading either to wasteful over-provisioning or to OOMKilled pods under load. Note that the VPA controllers are not part of core Kubernetes — install them from the kubernetes/autoscaler project before creating VerticalPodAutoscaler objects.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: background-worker
  updatePolicy:
    updateMode: "Auto"  # or "Off" to only view recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: worker
      minAllowed:
        cpu: 100m
        memory: 256Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledResources: ["cpu", "memory"]
Note that VPA in Auto mode will evict and restart pods to apply new resource values. For services where in-flight requests cannot be interrupted, run VPA in Off mode to generate recommendations that you apply manually or through a GitOps workflow during maintenance windows.
Important: HPA and VPA should not manage the same resource (CPU or memory) on the same deployment simultaneously. Use HPA for CPU-driven horizontal scaling and VPA in Off mode for memory right-sizing, or use KEDA for event-driven scaling where fine-grained control is needed.
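As an illustration of the event-driven alternative, a KEDA ScaledObject can scale a consumer on queue depth rather than CPU. The following is a sketch only — it assumes KEDA is installed, and the Deployment name, queue name, and TriggerAuthentication reference are all placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-consumer-scaler        # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    name: order-consumer             # hypothetical Deployment consuming the queue
  minReplicaCount: 2
  maxReplicaCount: 30
  triggers:
  - type: rabbitmq
    metadata:
      queueName: orders              # placeholder queue
      mode: QueueLength              # scale on message backlog, not CPU
      value: "20"                    # target messages per replica
    authenticationRef:
      name: rabbitmq-conn            # TriggerAuthentication holding the connection string
```

KEDA manages an HPA under the hood, so it composes cleanly with the scale-up/scale-down behavior semantics described above.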
Cluster Autoscaler
Pod autoscalers work within the existing node capacity. When that capacity is exhausted — pods are stuck in Pending because no node has sufficient resources — you need the Cluster Autoscaler to provision new nodes. Conversely, when nodes are significantly under-utilised, the Cluster Autoscaler can drain and decommission them to reduce infrastructure cost.
On bare-metal or on-premises deployments, the Cluster Autoscaler integrates with your infrastructure provisioning layer. For cloud deployments, providers such as AWS, GCP, and Azure offer native node group integrations. The following example shows the core configuration for an AWS Auto Scaling Group.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=2:10:k8s-general-worker-asg
        - --nodes=1:4:k8s-memory-worker-asg
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        env:
        - name: AWS_REGION
          value: eu-west-1
The --expander=least-waste option tells the autoscaler to prefer the node group that would have the smallest amount of unused resource after accommodating the pending pod, which minimises cost. Alternative expanders include random, most-pods, and priority.
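The priority expander in particular needs an accompanying ConfigMap that maps priority values to node-group name patterns — higher values win. A sketch using the ASG names from the Deployment above:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander  # name the priority expander looks for
  namespace: kube-system
data:
  priorities: |-
    10:
      - .*memory-worker.*     # expand the expensive pool only as a last resort
    50:
      - .*general-worker.*    # prefer the cheaper general pool
```

This gives you cost-aware expansion without least-waste's bin-packing heuristics.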
High Availability Control Plane
A three-node control plane with embedded etcd is the minimum viable HA topology. etcd requires quorum — a majority of members must be healthy for the cluster to accept writes. With three members you can tolerate one failure; with five members you can tolerate two.
The control-plane nodes must sit behind a load balancer. For cloud deployments, a TCP load balancer targeting port 6443 (kube-apiserver) and 9345 (RKE2 registration) works well. On-premises deployments commonly use keepalived with a virtual IP address.
# keepalived.conf on control-plane nodes
vrrp_instance VI_1 {
    state MASTER            # BACKUP on the other two nodes
    interface eth0
    virtual_router_id 51
    priority 100            # 90 and 80 on the other two nodes
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass securepassword
    }
    virtual_ipaddress {
        10.0.0.10/24        # VIP used in tls-san and agent server address
    }
}
Validate that etcd is healthy after any control-plane operation. RKE2 bundles etcdctl at /var/lib/rancher/rke2/bin/etcdctl.
ETCDCTL_API=3 /var/lib/rancher/rke2/bin/etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/rke2/server/tls/etcd/client.crt \
--key=/var/lib/rancher/rke2/server/tls/etcd/client.key \
endpoint health --cluster
Resource Quotas and Limit Ranges
In multi-tenant clusters — where different teams or applications share the same physical infrastructure — ResourceQuotas and LimitRanges are essential guardrails. ResourceQuotas set hard caps on total resource consumption within a namespace. LimitRanges set default and maximum values for individual containers, preventing a misconfigured deployment from requesting unbounded resources.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    count/deployments.apps: "20"
    count/services: "15"
    persistentvolumeclaims: "10"
    requests.storage: 500Gi
apiVersion: v1
kind: LimitRange
metadata:
  name: team-alpha-limits
  namespace: team-alpha
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "8"
      memory: 16Gi
  - type: PersistentVolumeClaim
    max:
      storage: 100Gi
Applying LimitRanges ensures that developers who forget to specify resource requests still get sensible defaults. Without them, a container with no requests is scheduled as if it needs zero CPU, so the scheduler can place it anywhere and it may starve other workloads on the same node.
Monitoring with Prometheus and Grafana
Observability in a Kubernetes cluster has three pillars: metrics, logs, and traces. Prometheus handles metrics collection; Grafana handles visualisation. The kube-prometheus-stack Helm chart deploys the entire stack — Prometheus Operator, Alertmanager, Grafana, node exporters, and a comprehensive set of pre-built dashboards — in a single command.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set prometheus.prometheusSpec.retention=30d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=longhorn \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=100Gi \
--set grafana.adminPassword=<secure-password> \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi
RKE2 exposes etcd metrics when etcd-expose-metrics: true is set in the server config. Package the etcd client certificate and key into a Secret (here named etcd-client-cert) and list it under prometheus.prometheusSpec.secrets so Prometheus mounts it, then add a ServiceMonitor so Prometheus scrapes the endpoint.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rke2-etcd
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    matchNames: [kube-system]
  selector:
    matchLabels:
      app.kubernetes.io/name: rke2-etcd
  endpoints:
  - port: metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
      certFile: /etc/prometheus/secrets/etcd-client-cert/client.crt
      keyFile: /etc/prometheus/secrets/etcd-client-cert/client.key
Essential Alerting Rules
Pre-built dashboards are a starting point, but custom alerting rules tuned to your environment are what allow on-call engineers to act before users notice a problem.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: workload-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
  - name: pod-health
    rules:
    - alert: PodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total[10m]) > 0.5
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash-looping"
    - alert: NodeMemoryPressure
      expr: |
        (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.1
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "Node {{ $labels.instance }} available memory below 10%"
    - alert: HPAMaxedOut
      expr: |
        kube_horizontalpodautoscaler_status_current_replicas
        == kube_horizontalpodautoscaler_spec_max_replicas
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} at maximum replicas"
The HPAMaxedOut alert is particularly valuable in practice. When an HPA is pinned at its maximum for an extended period it means traffic has outgrown your current ceiling. You either need to raise the maximum or add capacity to the node pool — and you want to know about that before the next spike, not during it.
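Severities only matter if they are routed differently. A minimal Alertmanager routing sketch — the receiver names are placeholders, and the actual notification integrations (Slack webhook, PagerDuty key) are omitted:

```yaml
route:
  receiver: slack-default            # hypothetical catch-all receiver
  group_by: ["alertname", "namespace"]
  routes:
  - matchers:
    - severity="critical"
    receiver: pagerduty-oncall       # page a human for critical alerts
    repeat_interval: 1h
  - matchers:
    - severity="warning"
    receiver: slack-default          # warnings go to chat, not the pager
receivers:
- name: slack-default
- name: pagerduty-oncall
```

The point of the split is discipline: critical pages someone, warning accumulates context for working hours.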
Production Best Practices
Pod Disruption Budgets
A PodDisruptionBudget (PDB) constrains how many pods in a deployment can be simultaneously unavailable during voluntary disruptions like node drains. Without PDBs, draining a node for maintenance can take an entire deployment offline.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 2  # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: api-server
Topology Spread Constraints
By default, the scheduler spreads replicas across nodes using a best-effort algorithm. Topology spread constraints give you hard guarantees that replicas are distributed across availability zones or racks.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: api-server
Upgrade Strategy
RKE2 supports rolling upgrades via the System Upgrade Controller. You define a Plan that targets server or agent nodes and specifies the target version; the controller drains, upgrades, and uncordons nodes sequentially.
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: rke2-server-upgrade
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - { key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"] }
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/rke2-upgrade
  version: v1.29.4+rke2r1
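A companion Plan handles the agent nodes; a prepare step is commonly used to block until the server Plan has completed, so the control plane and workers never upgrade simultaneously. A sketch under the same assumptions as the server Plan:

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: rke2-agent-upgrade
  namespace: system-upgrade
spec:
  concurrency: 2                     # upgrade two workers at a time
  cordon: true
  nodeSelector:
    matchExpressions:
    - { key: node-role.kubernetes.io/control-plane, operator: DoesNotExist }
  serviceAccountName: system-upgrade
  prepare:
    image: rancher/rke2-upgrade
    args: ["prepare", "rke2-server-upgrade"]  # wait for the server Plan to finish
  upgrade:
    image: rancher/rke2-upgrade
  version: v1.29.4+rke2r1
```

Combined with the PodDisruptionBudgets above, this keeps workloads available throughout the rolling upgrade.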
etcd Backup and Restore
RKE2 can take scheduled etcd snapshots automatically. Ensure they are written to durable storage outside the cluster — an S3 bucket or a remote NFS mount — rather than local disk on the control-plane nodes.
# /etc/rancher/rke2/config.yaml additions for automated snapshots
etcd-snapshot-schedule-cron: "0 */6 * * *" # every 6 hours
etcd-snapshot-retention: 10
etcd-snapshot-dir: /mnt/nfs/etcd-snapshots
# Manual snapshot
rke2 etcd-snapshot save --name pre-upgrade-$(date +%Y%m%d)
# Restore from snapshot (run on a single server node with cluster stopped)
rke2 server --cluster-reset --cluster-reset-restore-path=/path/to/snapshot.db
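To satisfy the durable-storage requirement without an NFS mount, RKE2 can also upload snapshots to S3-compatible storage directly. A sketch — the endpoint, bucket, and credentials below are placeholders:

```yaml
# /etc/rancher/rke2/config.yaml — ship etcd snapshots to S3-compatible storage
etcd-s3: true
etcd-s3-endpoint: s3.eu-west-1.amazonaws.com   # placeholder endpoint
etcd-s3-region: eu-west-1
etcd-s3-bucket: rke2-etcd-snapshots            # placeholder bucket
etcd-s3-folder: production-cluster
etcd-s3-access-key: <access-key>
etcd-s3-secret-key: <secret-key>
```

Whichever destination you choose, periodically restore a snapshot into a scratch cluster — a backup you have never restored is not a backup.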
Conclusion
Scaling infrastructure with RKE2 is not a single configuration change — it is a system of interlocking capabilities that must be designed and operated together. Horizontal pod autoscaling handles short-lived traffic bursts at the workload level. Vertical pod autoscaling keeps resource requests honest over time. The Cluster Autoscaler ensures the underlying node capacity tracks the aggregate demand of your pod autoscalers. Node pools and topology constraints ensure workloads land on the right hardware. Resource quotas and limit ranges protect tenants from each other. PodDisruptionBudgets and topology spread constraints harden availability. And Prometheus with Grafana gives your team the visibility to detect degradation before it becomes an outage.
RKE2 earns its place in production precisely because it ships a substantial portion of this stack pre-hardened and pre-integrated. Your responsibility is to understand the knobs, tune them to your workload characteristics, and build the operational discipline — runbooks, alert routing, upgrade cadence, backup validation — that turns a well-configured cluster into a genuinely reliable platform.