Architecture Overview

This document provides a conceptual map of the Kubernerdes Enclave — what it is, why it's built the way it is, and how all the pieces relate to each other. For step-by-step deployment instructions, follow the Day 0 → Day 1 → Day 2 path in the sidebar.


What Is the Enclave?

The Kubernerdes Enclave is a self-hosted, air-gap-capable Kubernetes platform running on commodity Intel NUC hardware. It's designed for environments where cloud connectivity cannot be assumed — or where supply chain integrity, data sovereignty, and operational control matter more than elasticity.

"Air-gap ready" means the enclave is built to operate without outbound internet access after the initial provisioning phase. All software artifacts — container images, Helm charts, OS packages — are pulled from the internet once, cryptographically verified, and stored locally. From that point forward, every node in the cluster consumes only what's in its own local registry. No workload ever reaches out to Docker Hub, GitHub, or any external endpoint to pull an image.

This distinction matters: many platforms claim to support air-gapped operation, but leave the complexity of actually getting artifacts into the environment to the operator. This enclave treats artifact management as a first-class concern, not an afterthought.


The Software Stack

The enclave is built in layers, each one depending on the one below it.

Layer 1 — Infrastructure Services

Before any Kubernetes cluster can exist, the network must be stable and discoverable. A dedicated admin host (nuc-00) runs a set of KVM virtual machines that provide foundational network services:

  • DHCP — assigns IP addresses to all nodes as they come online
  • DNS — resolves cluster hostnames and API endpoints
  • HAProxy + Keepalived — distributes traffic across the Harvester cluster and Rancher Manager, presenting stable virtual IP addresses (VIPs) that survive individual node failures

These services are what allow the Harvester nodes to discover each other, boot over the network via PXE, and reach stable API endpoints regardless of which physical node is handling the request. They must be healthy before anything else can be built.
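
Before moving to Layer 2, it is worth probing each of these services from the admin host. The sketch below assumes nothing from this guide: the VM names, addresses, hostnames, and the DHCP unit name are placeholders to adapt to your own Day 0 plan.

# List the infrastructure VMs running under KVM on nuc-00
virsh list --all

# Confirm DNS answers for cluster names (server address and names are examples)
dig +short harvester.example.lan @10.0.0.53
dig +short rancher.example.lan @10.0.0.53

# Confirm the HAProxy/Keepalived VIP accepts connections, even before a backend exists
nc -vz 10.0.0.10 443

# Watch DHCP leases as nodes come online (unit name varies by distribution)
journalctl -u dhcpd --since "15 min ago"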

Layer 2 — Harvester HCI

Harvester is an open-source hyperconverged infrastructure (HCI) platform built by SUSE on top of Kubernetes. The three Harvester nodes (nuc-01, nuc-02, nuc-03) form a cluster that manages both virtual machines and Kubernetes workloads on the same underlying hardware — eliminating the need for a separate virtualization layer and a separate container platform.

Harvester is installed via PXE boot: the admin host serves the Harvester installer and per-node configuration over the network, enabling fully automated, unattended installation across all three nodes. The result is a three-node cluster with distributed storage (Longhorn) and a shared API surface, fronted by a VIP.
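
Before powering on the three nodes, it can help to confirm that the admin host is actually serving the boot assets. The paths and filenames below are illustrative only; they depend entirely on how the PXE tree was laid out on nuc-00.

# TFTP should be listening; PXE firmware fetches its first-stage loader from here
ss -ulpn | grep ':69'

# Pull the boot configuration over TFTP (filename is an example)
tftp nuc-00 -c get ipxe.cfg

# The Harvester kernel, initrd, and per-node install config should be reachable over HTTP
curl -fsI http://nuc-00/harvester/harvester-vmlinuz-amd64
curl -fsI http://nuc-00/harvester/config-nuc-01-create.yaml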

Harvester is the compute and storage substrate for everything above it. If Harvester is unhealthy, nothing else runs.
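
A minimal health gate before building Layer 3, assuming a cluster kubeconfig has been downloaded from the Harvester UI (the filename and VIP hostname are examples):

# All three nodes should report Ready before any Rancher VMs are created
kubectl --kubeconfig harvester.yaml get nodes -o wide

# Longhorn, Harvester's distributed storage, should have healthy pods
kubectl --kubeconfig harvester.yaml -n longhorn-system get pods

# The VIP fronting the Harvester API/UI should answer
curl -ksf https://harvester.example.lan -o /dev/null && echo "VIP reachable"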

Layer 3 — Rancher Manager

Rancher Manager is SUSE's multi-cluster Kubernetes management platform. In this enclave, it runs as a highly available K3s cluster inside Harvester — three VMs, each scheduled on a different Harvester node, fronted by a Keepalived VIP. This placement means Rancher Manager survives the loss of any single Harvester node.
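
The shape of that K3s bootstrap is roughly the sketch below. In the enclave the K3s binary and images come from the local Hauler store rather than get.k3s.io, so treat this as the connected-world form of the commands; addresses, hostnames, and the token are placeholders.

# On the first Rancher VM: start a new cluster with embedded etcd
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --token "example-shared-token" \
  --tls-san rancher.example.lan    # hostname of the Keepalived VIP

# On the second and third VMs: join the cluster that the first VM created
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://10.0.1.11:6443 \
  --token "example-shared-token" \
  --tls-san rancher.example.lan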

Rancher provides the unified control plane for the enclave: cluster lifecycle management, application deployment via Helm charts and the built-in app catalog, role-based access control, and the integration point for downstream security tooling like NeuVector. Operators interact with the enclave primarily through Rancher's UI or via kubectl pointed at Rancher's downstream cluster kubeconfigs.
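
Installing Rancher on that K3s cluster then follows the standard air-gapped Helm pattern: cert-manager first, then the Rancher chart, with every image reference pointed at the local registry. Chart versions, hostnames, and the registry endpoint below are placeholders.

# cert-manager from a locally staged chart (version is an example)
helm install cert-manager ./cert-manager-v1.13.3.tgz \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true

# Rancher from a locally staged chart, with images pulled from the local registry
helm install rancher ./rancher-2.8.2.tgz \
  --namespace cattle-system --create-namespace \
  --set hostname=rancher.example.lan \
  --set rancherImage=nuc-00.example.lan:5000/rancher/rancher \
  --set systemDefaultRegistry=nuc-00.example.lan:5000 \
  --set useBundledSystemChart=true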

Layer 4 — RGS Carbide and Hauler

RGS Carbide is Rancher Government Solutions' hardened distribution of the Rancher stack. It addresses a specific and often underestimated problem: how do you trust the container images running in your cluster?

A standard Kubernetes deployment pulls images from public registries with no integrity verification beyond a tag name — a tag that can be silently overwritten. Carbide solves this with two mechanisms:

  • Signed, FIPS-capable images. Every image in the Carbide registry (rgcrprod.azurecr.us) is signed with Cosign, creating a cryptographically verifiable provenance chain from build to deployment. Signatures are checked at pull time, not just at rest; a manual verification sketch follows this list.
  • A curated, compliance-aligned image set. Carbide images are built and maintained to meet DISA STIG and FedRAMP-relevant standards, with documented CVE remediation and a defined release cadence.
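
To make the provenance check concrete, the sketch below verifies one image signature with Cosign. The public key filename and the image path under the Carbide registry are examples, not values published here.

# Verify a single image signature against the Carbide signing public key
# (key filename and image reference are placeholders)
cosign verify --key carbide-key.pub \
  rgcrprod.azurecr.us/rancher/rancher:v2.8.2

A non-zero exit status means the signature is missing or does not match, which makes the same check easy to wire into CI or an admission pipeline.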

Hauler is Carbide's companion artifact management tool. It's purpose-built for the air-gap problem (a command-level sketch follows the list):

  1. On a connected host, Hauler authenticates to the Carbide registry and pulls the specified images, Helm charts, and other artifacts into a local content-addressed store.
  2. That store is served over a standard HTTP endpoint (Apache on nuc-00), making it a drop-in replacement for any external registry or Helm repository.
  3. Cluster nodes and operators pull from this local endpoint — they have no knowledge of, or dependency on, any external registry.
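
A minimal version of the connected-host side of this workflow is sketched below. The manifest filename, credentials, and output archive name are assumptions, and Hauler's subcommands and flags have shifted between releases, so treat it as the shape of the process rather than exact syntax.

# --- On a connected host ---
# Authenticate to the Carbide registry (credentials are placeholders)
hauler login rgcrprod.azurecr.us -u "$CARBIDE_USER" -p "$CARBIDE_TOKEN"

# Pull every image, chart, and file declared in the manifest into the local store
hauler store sync --files airgap-manifest.yaml

# Package the store for transfer across the air gap
hauler store save --filename enclave-artifacts.tar.zst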

Hauler is what makes the air-gap boundary real. Without it, the cluster would still need outbound internet access for every image pull. With it, the internet is only needed during the initial sync — and for periodic refresh cycles, which can be performed on a connected workstation and transferred in via removable media if necessary.
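
Inside the boundary, the archive is loaded back into a store on nuc-00 and exposed to the cluster. Whether the content is ultimately served by Apache, as described above, or by Hauler's built-in registry and fileserver is a deployment choice; the commands, port, and registries.yaml endpoint below are illustrative only.

# --- On nuc-00, inside the air gap ---
# Load the transferred archive into the local Hauler store
hauler store load --filename enclave-artifacts.tar.zst

# Option: serve the store directly as an OCI registry and as a plain fileserver
hauler store serve registry      # OCI registry, typically on port 5000
hauler store serve fileserver    # charts, ISOs, and other files over HTTP

# Point each node's container runtime at the local endpoint
# (K3s and RKE2 read /etc/rancher/<k3s|rke2>/registries.yaml; addresses are placeholders)
cat <<'EOF' | sudo tee /etc/rancher/rke2/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "http://nuc-00.example.lan:5000"
  rgcrprod.azurecr.us:
    endpoint:
      - "http://nuc-00.example.lan:5000"
EOF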

Layer 5 — NeuVector

NeuVector (SUSE Security) provides runtime container security, continuous vulnerability scanning, and network policy enforcement inside the cluster. Its enforcer runs as a DaemonSet on every node — inspecting container network traffic and workload behavior in real time.

NeuVector complements Carbide's supply-chain assurance: Carbide verifies that an image is what it claims to be at pull time; NeuVector monitors what the running container is actually doing. Together they provide defense-in-depth across the image lifecycle.
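
A hedged sketch of what deploying and verifying NeuVector might look like, using the upstream chart's layout; in the enclave the chart and images come from the local Hauler store, and the chart filename, version, and registry value below are placeholders.

# Install NeuVector from a locally staged copy of the core chart
helm install neuvector ./core-2.7.3.tgz \
  --namespace neuvector --create-namespace \
  --set registry=nuc-00.example.lan:5000

# The enforcer should appear as a DaemonSet with one pod per node
kubectl -n neuvector get daemonset
kubectl -n neuvector get pods -o wide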


The Day 0 / 1 / 2 Framework

This documentation is organized around the standard operational lifecycle used in enterprise infrastructure:

  • Day 0 — Design. Focus: decisions made before anything is installed. Key question: what does this need to look like, and do we have everything we need?
  • Day 1 — Build. Focus: initial deployment, strictly ordered. Key question: is each layer healthy before the next one begins?
  • Day 2 — Operate. Focus: ongoing health, security, and maintenance. Key question: is the enclave still doing what we designed it to do?

Mistakes made on Day 0 are the most expensive to fix — a wrong IP scheme or an undersized network segment can require rebuilding from scratch. Day 1 is strictly sequential: each component depends on the previous one being verified and healthy before proceeding. Day 2 is where the enclave proves its value over time.


Build Sequence

The following diagram shows the ordered dependency chain for initial deployment. Each step must pass its health checks before the next begins.

flowchart TD
D0["Day 0 — Design\nHardware · Network plan\nArtifact staging · Credentials"]

D0 --> A

subgraph DAY1["Day 1 — Build"]
A["Admin Host\nnuc-00\nopenSUSE Leap · KVM · Apache · TFTP · PXE"]
A --> B["Infrastructure VMs\nDHCP · DNS primary + replica\nHAProxy · Keepalived"]
A --> C["Hauler & Carbide\nSync images + charts\nto local store"]
B --> E
C --> E
E["Harvester Cluster\nnuc-01 · nuc-02 · nuc-03\nPXE boot → automated install → cluster join"]
E --> F["Rancher Manager\nK3s HA cluster inside Harvester\ncert-manager → Rancher Helm deploy"]
end

F --> D2

subgraph DAY2["Day 2 — Operate"]
D2["NeuVector · Monitoring\nObservability · Backup\nOngoing operations"]
end

Artifact Data Flow

The following diagram shows how software artifacts move from their upstream sources into — and through — the enclave. The air-gap boundary is the key concept: after initial provisioning, no node in the cluster requires outbound internet access.

flowchart LR
subgraph INT["Internet (Day 0 and initial Day 1 only)"]
CR["Carbide Registry\nrgcrprod.azurecr.us\nSigned OCI images"]
GH["Public Sources\nGitHub · Helm repos\nOS packages · ISOs"]
end

subgraph NUC00["nuc-00 — Admin Host"]
HS["Hauler Store\nLocal content-addressed\nartifact store"]
AP["Apache HTTP\nServes images + charts\nto cluster nodes"]
PX["TFTP / PXE\nHarvester boot\nassets + config"]
HS --> AP
end

subgraph ENCLAVE["Enclave — Air-Gap Boundary"]
HV["Harvester Cluster\nnuc-01 · 02 · 03"]
RM["Rancher Manager\nK3s HA cluster"]
NV["NeuVector\nRuntime security"]
WL["Workloads"]
HV --> RM
RM --> NV
RM --> WL
end

CR -->|"hauler store sync\nauthenticated · verified"| HS
GH -->|"ISOs · charts · binaries"| HS
GH -->|"Harvester ISO\niPXE config"| PX
AP -->|"OCI images\nHelm charts"| HV
AP -->|"OCI images\nHelm charts"| RM
PX -->|"PXE boot\nHarvester installer"| HV

After the Hauler store is populated and the cluster is running, all subsequent image pulls — for updates, new workloads, and security patches — come from the local store. Carbide credentials are only needed during the initial sync and any future refresh cycles. Those refreshes can be performed on a connected workstation and transferred into an offline environment if necessary, keeping the air-gap boundary intact.