# Monitoring
## Dashboards Available
| Dashboard | URL | Purpose |
|---|---|---|
| Harvester UI | https://10.10.12.100 | Cluster nodes, VMs, storage, networking |
| Rancher Manager | https://10.10.12.210 | Multi-cluster overview, workload health |
| SUSE Observability | https://10.10.12.220 | Full-stack topology, metrics, alerts |
| HAProxy Stats | http://10.10.12.93:9000/stats | Load balancer backend health, traffic |
| Longhorn UI | https://10.10.12.100 → Storage → Longhorn | Volume health, replica status |
## Harvester Dashboard
Access the Harvester UI at https://10.10.12.100 (or https://harvester-edge.enclave.kubernerdes.com).
Key views:
- Dashboard — cluster-wide CPU/memory/storage utilization
- Hosts — per-node resource usage, disk health, network
- Virtual Machines — running VMs, their state, and console access
- Volumes — Longhorn PVC status, replica counts
- Networks — VM network interfaces and bridge configuration
### Node Health Check
From the Harvester UI → Hosts, each node should show:
- State: Active
- Disk: Schedulable
- Memory: reasonable headroom (alert if > 85% used)

From the command line:

```shell
# Node conditions
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,STATUS:.status.conditions[-1].type,REASON:.status.conditions[-1].reason'

# Resource pressure
kubectl top nodes
```
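The memory-headroom rule above (alert if > 85% used) can be scripted against the `kubectl top nodes` output. A minimal sketch: the function name `check_headroom` is invented here, and it assumes the default `kubectl top nodes` column layout (MEMORY% in column 5).

```shell
# Flag nodes exceeding the 85% memory threshold from `kubectl top nodes`.
# Assumes the default column layout: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
check_headroom() {
  awk -v limit=85 'NR > 1 {
    gsub(/%/, "", $5)                                  # strip the % sign
    if ($5 + 0 > limit) print "ALERT: " $1 " memory at " $5 "%"
  }'
}

# Usage: kubectl top nodes | check_headroom
```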
## Rancher Dashboard
Access at https://10.10.12.210 (or https://rancher.enclave.kubernerdes.com).
- Cluster Explorer → select the harvester-edge cluster → workload health
- Monitoring → if you've deployed the rancher-monitoring chart, Grafana dashboards are available here
- Fleet → GitOps-managed workloads across clusters
### Enable Rancher Monitoring (Optional)

```shell
helm repo add rancher-charts https://charts.rancher.io
helm install rancher-monitoring rancher-charts/rancher-monitoring \
  --namespace cattle-monitoring-system \
  --create-namespace \
  --kubeconfig ~/.kube/harvester-config \
  --set prometheus.prometheusSpec.resources.requests.memory=512Mi \
  --set prometheus.prometheusSpec.resources.limits.memory=2Gi
```
This deploys Prometheus + Grafana + Alertmanager into the Harvester cluster. Access Grafana via Rancher UI → Monitoring → Grafana.
## HAProxy Stats Page
The HAProxy stats page provides real-time load balancer visibility.
Access at: http://10.10.12.93:9000/stats (credentials: admin/rancher)
Key metrics to monitor:
| Metric | Healthy | Alert |
|---|---|---|
| Backend UP count | = configured backend count | Any backend DOWN |
| Session rate | Baseline normal | Sudden spike |
| Error rate | ~0 | > 0.1% |
From the command line (on nuc-00-03):

```shell
# Check HAProxy backend status via socket
echo "show stat" | socat stdio /var/run/haproxy/admin.sock | cut -d',' -f1,2,18,19
```
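The raw CSV from the socket query above can be filtered down to just the unhealthy entries. A sketch, assuming HAProxy's standard `show stat` CSV schema in which field 18 is the status column; `check_backends` is a name invented here.

```shell
# Print any proxy/server whose status is not UP (or OPEN for frontends).
# Field 18 is `status` in HAProxy's `show stat` CSV schema.
check_backends() {
  awk -F',' '!/^#/ && NF >= 18 && $18 != "" && $18 !~ /UP|OPEN|no check/ {
    print "DOWN: " $1 "/" $2 " status=" $18
  }'
}

# Usage: echo "show stat" | socat stdio /var/run/haproxy/admin.sock | check_backends
```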
## Key Metrics to Watch
### Storage (Longhorn)

```shell
# Overall storage health
kubectl get volumes -n longhorn-system

# Degraded volumes (replicas not fully replicated)
kubectl get volumes -n longhorn-system \
  -o custom-columns='NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness' | \
  grep -v healthy

# Disk space
kubectl get nodes.longhorn.io -n longhorn-system
```
Alert thresholds:
- Volume robustness degraded: investigate within 24h
- Volume robustness faulted: immediate action required
- Disk usage > 80%: plan expansion or cleanup
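The robustness thresholds above can be turned into a scriptable check on the custom-columns output shown earlier. A sketch; `classify_volumes` is a hypothetical helper name.

```shell
# Map Longhorn volume robustness to the alert thresholds:
# degraded -> investigate within 24h, faulted -> immediate action.
classify_volumes() {
  awk 'NR > 1 {
    if ($3 == "faulted")       print "CRITICAL: " $1 " is faulted - immediate action required"
    else if ($3 == "degraded") print "WARNING: " $1 " is degraded - investigate within 24h"
  }'
}

# Usage:
# kubectl get volumes -n longhorn-system \
#   -o custom-columns='NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness' | classify_volumes
```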
### etcd Health
Harvester's control plane uses etcd. Check its health periodically:

```shell
# SSH to any Harvester node
ssh rancher@10.10.12.101

# etcd health
kubectl get pods -n kube-system | grep etcd
crictl ps | grep etcd

# etcd endpoint health (from inside nuc-01)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key \
  endpoint health
```
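For a cron-able check, the etcdctl output can be summarized into per-endpoint OK/FAIL lines with a non-zero exit on failure. A sketch that keys off etcdctl's usual "is healthy" / "is unhealthy" phrasing; `etcd_summary` is a made-up name.

```shell
# Summarize `etcdctl endpoint health` output; exit 1 if any endpoint failed.
etcd_summary() {
  awk '
    /is healthy/   { print "OK:   " $1 }
    /is unhealthy/ { print "FAIL: " $1; bad = 1 }
    END { exit bad }
  '
}

# Usage: run the etcdctl command above, then:
# etcdctl ... endpoint health 2>&1 | etcd_summary || echo "etcd degraded"
```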
### Certificate Expiry

```shell
# Check all cert-manager certificates
kubectl --kubeconfig ~/.kube/enclave-rancher.kubeconfig \
  get certificates -A

# Check expiry
kubectl --kubeconfig ~/.kube/enclave-rancher.kubeconfig \
  get certificates -A -o \
  custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[-1].status,EXPIRY:.status.notAfter'
```
Certificates managed by cert-manager renew automatically at 2/3 of their lifetime. If a certificate is stuck NotReady, see Troubleshooting.
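To catch certificates approaching expiry before a stuck renewal becomes urgent, the custom-columns output above can be fed through a small expiry check. A sketch assuming GNU date is available; `check_expiry` and the 30-day window are choices made here, not cert-manager defaults.

```shell
# Warn on certificates whose notAfter is less than 30 days away.
# Expects "NAMESPACE NAME READY EXPIRY" rows on stdin (header included).
check_expiry() {
  now=$(date +%s)
  while read -r ns name ready expiry; do
    [ "$ns" = "NAMESPACE" ] && continue              # skip header row
    exp=$(date -d "$expiry" +%s 2>/dev/null) || continue
    days=$(( (exp - now) / 86400 ))
    if [ "$days" -lt 30 ]; then
      echo "EXPIRING: $ns/$name in $days days"
    fi
  done
}

# Usage: pipe the custom-columns query above into check_expiry
```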
## SUSE Observability
SUSE Observability provides topology-based monitoring across all enclave clusters. Access it at https://observability.enclave.kubernerdes.com (VIP 10.10.12.220).
Key views:
- Topology — live map of all services and their relationships across clusters
- Monitors — configurable health checks with alerting thresholds
- Metrics — time-series data from all registered cluster agents
- Events — Kubernetes events and change history
For installation and cluster agent registration, see Observability.
## Alerting
Basic alerting can be configured via:
- SUSE Observability Monitors — topology-aware alerts for service health, pod failures, node pressure
- Rancher Monitoring Alertmanager — email/Slack/PagerDuty alerts (if the rancher-monitoring chart is deployed)
- Keepalived — logs to syslog on VIP transitions (visible in journalctl -u keepalived)
- HAProxy — logs backend state changes to syslog

```shell
# Watch HAProxy state changes in real time (on nuc-00-03)
journalctl -u haproxy -f | grep -E "Server|backend"

# Watch Keepalived VIP transitions (on nuc-00-03)
journalctl -u keepalived -f
```
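The journal streams above can be reduced to clean state-change events for piping into a notification hook. A sketch matching HAProxy's standard "Server X is UP/DOWN" log phrasing; `haproxy_events` and the `notify` command are hypothetical.

```shell
# Extract "Server <name> is UP/DOWN" transitions from HAProxy log lines.
haproxy_events() {
  awk 'match($0, /Server [^ ]+ is (UP|DOWN)/) {
    print "EVENT: " substr($0, RSTART, RLENGTH)
  }'
}

# Usage (notify is whatever alert hook you have):
# journalctl -u haproxy -f | haproxy_events | while read -r ev; do notify "$ev"; done
```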