
Monitoring

Dashboards Available

Dashboard            URL                                         Purpose
Harvester UI         https://10.10.12.100                        Cluster nodes, VMs, storage, networking
Rancher Manager      https://10.10.12.210                        Multi-cluster overview, workload health
SUSE Observability   https://10.10.12.220                        Full-stack topology, metrics, alerts
HAProxy Stats        http://10.10.12.93:9000/stats               Load balancer backend health, traffic
Longhorn UI          https://10.10.12.100 → Storage → Longhorn   Volume health, replica status

Harvester Dashboard

Access the Harvester UI at https://10.10.12.100 (or https://harvester-edge.enclave.kubernerdes.com).

Key views:

  • Dashboard — cluster-wide CPU/memory/storage utilization
  • Hosts — per-node resource usage, disk health, network
  • Virtual Machines — running VMs, their state, and console access
  • Volumes — Longhorn PVC status, replica counts
  • Networks — VM network interfaces and bridge configuration

Node Health Check

From the Harvester UI → Hosts, each node should show:

  • State: Active
  • Disk: Schedulable
  • Memory: reasonable headroom (alert if > 85% used)

From the command line:

# Node Ready condition per node
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,REASON:.status.conditions[?(@.type=="Ready")].reason'

# Resource pressure
kubectl top nodes
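
The 85% memory threshold from the checklist above can be scripted. A sketch (assumes metrics-server is running, i.e. kubectl top works; the awk stage just parses the standard column layout):

```shell
# Flag any node above 85% memory utilization.
# kubectl top nodes columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
kubectl top nodes --no-headers 2>/dev/null | awk '
  { gsub(/%/, "", $5) }
  $5 + 0 > 85 { print $1 " memory at " $5 "%"; hot = 1 }
  END { exit hot ? 1 : 0 }   # nonzero exit when any node is over threshold
'
```

The nonzero exit code makes this easy to drop into a cron job or CI health check.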

Rancher Dashboard

Access at https://10.10.12.210 (or https://rancher.enclave.kubernerdes.com).

  • Cluster Explorer → select harvester-edge cluster → workload health
  • Monitoring → if you've deployed the rancher-monitoring chart, Grafana dashboards are available here
  • Fleet → GitOps-managed workloads across clusters

Enable Rancher Monitoring (Optional)

helm repo add rancher-charts https://charts.rancher.io
helm install rancher-monitoring rancher-charts/rancher-monitoring \
  --namespace cattle-monitoring-system \
  --create-namespace \
  --kubeconfig ~/.kube/harvester-config \
  --set prometheus.prometheusSpec.resources.requests.memory=512Mi \
  --set prometheus.prometheusSpec.resources.limits.memory=2Gi

This deploys Prometheus + Grafana + Alertmanager into the Harvester cluster. Access Grafana via Rancher UI → Monitoring → Grafana.
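
To confirm the stack actually came up, list the pods in the namespace used by the install command above (a sketch; the awk stage just trims the listing to name and status):

```shell
# All pods in cattle-monitoring-system should reach Running (or Completed).
kubectl --kubeconfig ~/.kube/harvester-config \
  get pods -n cattle-monitoring-system 2>/dev/null \
  | awk 'NR > 1 { print $1, $3 }'
```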

HAProxy Stats Page

The HAProxy stats page provides real-time load balancer visibility.

Access at: http://10.10.12.93:9000/stats (credentials: admin/rancher)

Key metrics to monitor:

Metric             Healthy                      Alert
Backend UP count   = configured backend count   Any backend DOWN
Session rate       Baseline normal              Sudden spike
Error rate         ~0                           > 0.1%

From the command line (on nuc-00-03):

# Check HAProxy backend status via socket
echo "show stat" | socat stdio /var/run/haproxy/admin.sock | cut -d',' -f1,2,18,19

Key Metrics to Watch

Storage (Longhorn)

# Overall storage health
kubectl get volumes -n longhorn-system

# Degraded volumes (replicas not fully replicated)
kubectl get volumes -n longhorn-system \
  -o custom-columns='NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness' | \
  grep -v healthy

# Disk space
kubectl get nodes.longhorn.io -n longhorn-system

Alert thresholds:

  • Volume robustness degraded: investigate within 24h
  • Volume robustness faulted: immediate action required
  • Disk usage > 80%: plan expansion or cleanup
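
A one-line robustness summary makes these thresholds easy to check at a glance (a sketch using the same Longhorn volume field as the commands above):

```shell
# Count volumes per robustness state; anything other than "healthy" needs attention.
kubectl get volumes -n longhorn-system \
  -o custom-columns='ROBUSTNESS:.status.robustness' --no-headers 2>/dev/null \
  | sort | uniq -c
```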

etcd Health

Harvester's control plane uses etcd. Check its health periodically:

# SSH to any Harvester node
ssh rancher@10.10.12.101

# etcd health
kubectl get pods -n kube-system | grep etcd
crictl ps | grep etcd

# etcd endpoint health (from inside nuc-01)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key \
  endpoint health

Certificate Expiry

# Check all cert-manager certificates
kubectl --kubeconfig ~/.kube/enclave-rancher.kubeconfig \
  get certificates -A

# Check expiry
kubectl --kubeconfig ~/.kube/enclave-rancher.kubeconfig \
  get certificates -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[-1].status,EXPIRY:.status.notAfter'

Certificates managed by cert-manager renew automatically at 2/3 of their lifetime. If a certificate is stuck NotReady, see Troubleshooting.
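
Renewal can also be checked proactively. A sketch that warns when any certificate's notAfter falls within 30 days (assumes GNU date for -d parsing of the timestamps):

```shell
# Warn on certificates expiring within 30 days.
warn_days=30
now=$(date +%s)
kubectl --kubeconfig ~/.kube/enclave-rancher.kubeconfig \
  get certificates -A --no-headers \
  -o custom-columns='NAME:.metadata.name,EXPIRY:.status.notAfter' 2>/dev/null \
  | while read -r name expiry; do
      days=$(( ($(date -d "$expiry" +%s) - now) / 86400 ))
      if [ "$days" -lt "$warn_days" ]; then
        echo "$name expires in $days days"
      fi
    done
```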

SUSE Observability

SUSE Observability provides topology-based monitoring across all enclave clusters. Access it at https://observability.enclave.kubernerdes.com (VIP 10.10.12.220).

Key views:

  • Topology — live map of all services and their relationships across clusters
  • Monitors — configurable health checks with alerting thresholds
  • Metrics — time-series data from all registered cluster agents
  • Events — Kubernetes events and change history

For installation and cluster agent registration, see Observability.

Alerting

Basic alerting can be configured via:

  1. SUSE Observability Monitors — topology-aware alerts for service health, pod failures, node pressure
  2. Rancher Monitoring Alertmanager — email/Slack/PagerDuty alerts (if the rancher-monitoring chart is deployed)
  3. Keepalived — log to syslog when VIP transitions (visible in journalctl -u keepalived)
  4. HAProxy — log backend state changes to syslog

# Watch HAProxy state changes in real time (on nuc-00-03)
journalctl -u haproxy -f | grep -E "Server|backend"

# Watch Keepalived VIP transitions (on nuc-00-03)
journalctl -u keepalived -f