6. Runner Isolation on EKS

CI runners execute code the platform does not fully trust. That includes feature branch code, build scripts, third-party dependencies, package manager hooks, generated files, and test fixtures. Runner design is an isolation problem as much as a capacity problem.

The reference platform runs GitLab Runner on AWS EKS with separate runner tiers. For each tier, Terraform creates a namespace, applies restricted Pod Security labels, installs the GitLab Runner Helm chart, and attaches network policies.

Runners in plain language

A GitLab runner is the worker that executes CI jobs. GitLab schedules the job, but the runner performs the work. If a pipeline says “build this container image” or “run these tests,” a runner is where those commands run.

That means a runner is exposed to whatever the job does. A job can run application code, install dependencies, execute package manager scripts, open network connections, and handle artifacts. Some jobs also need credentials. That combination makes runner design a security decision.

The design needs to answer:

Which jobs can run on which runners?
Which runners can access secrets?
Which runners can reach the internet?
Which runners can deploy?
Which runners are allowed to run privileged workloads?
How do we monitor runner health and queue time?

The safest default is unprivileged, ephemeral pods with restricted egress. A job should start clean, run with the least privilege it needs, publish its output, and disappear.

Why CI isolation belongs in the platform

If every repository chooses its own runner model, the organization cannot reason about which jobs can reach which secrets or networks. If all jobs share the same runner pool, a low-trust feature branch can end up too close to high-trust release automation.

The platform response is to make trust tiers explicit. Most jobs should run in constrained, non-privileged environments. Protected release jobs should run on runners with narrower registration, stronger network controls, and clearer monitoring. Exceptional privileged builds should be rare, documented, and separated from normal CI capacity.

Namespaces and pod security

The EKS runner stack labels each runner namespace with restricted pod security settings:

locals {
  namespace_labels = {
    "pod-security.kubernetes.io/enforce" = "restricted"
    "pod-security.kubernetes.io/audit"   = "restricted"
    "pod-security.kubernetes.io/warn"    = "restricted"
  }
}

resource "kubernetes_namespace_v1" "runner" {
  for_each = var.runner_tiers

  metadata {
    name   = each.value.namespace
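    # merge() lets later arguments win, so a tier's own labels can override
    # the restricted defaults defined in local.namespace_labels.
    labels = merge(local.namespace_labels, each.value.labels)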
  }
}

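These labels do not make CI safe by themselves, but they establish the default: with enforce set to restricted, the cluster rejects pods that ask for privileged mode, host namespaces, or root execution, so runner pods cannot start from a privileged posture.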

Helm releases by tier

Each runner tier becomes its own Helm release:

resource "helm_release" "gitlab_runner" {
  for_each = var.runner_tiers

  name       = "gitlab-runner-${each.key}"
  repository = "https://charts.gitlab.io"
  chart      = "gitlab-runner"
  version    = each.value.chart_version
  namespace  = kubernetes_namespace_v1.runner[each.key].metadata[0].name

  atomic          = true
  cleanup_on_fail = true
  wait            = true
  values          = [file("${path.module}/${each.value.values_file}")]
}

Separate releases make rollout and rollback cleaner. A sandbox runner change should not be coupled to a protected release runner change.
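Both the namespace and Helm release resources iterate over var.runner_tiers, whose declaration is not shown. As a rough sketch, the variable could look like the block below. The attribute names come from the expressions used above (namespace, labels, chart_version, values_file); the types, the optional default for labels, and the description text are assumptions. The optional(...) default syntax needs Terraform 1.3 or later.

```hcl
# Hypothetical declaration for var.runner_tiers, inferred from the attributes
# the namespace and Helm release resources read. The real variable may carry
# more fields than these four.
variable "runner_tiers" {
  description = "Runner tiers keyed by tier name, for example \"standard\"."

  type = map(object({
    namespace     = string                    # Kubernetes namespace for the tier
    labels        = optional(map(string), {}) # extra namespace labels (assumed optional)
    chart_version = string                    # pinned gitlab-runner chart version
    values_file   = string                    # path relative to the module directory
  }))
}
```

Keying the map by tier name is what lets each.key appear in the release name and in the namespace lookup, so adding a tier is a data change rather than a new resource block.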

Standard runner baseline

The standard tier uses the Kubernetes executor, a pinned base image, locked runner registration, non-privileged execution, node selectors, and a pod security context. The excerpt below shows these settings except the security context:

runners:
  name: standard
  tags: standard
  protected: false
  locked: true
  requestConcurrency: 25
  config: |
    [[runners]]
      executor = "kubernetes"
      [runners.kubernetes]
        namespace = "gitlab-runner-standard"
        image = "alpine:3.22.1"
        privileged = false
        service_account = "gitlab-runner-standard"
        [runners.kubernetes.node_selector]
          "platform.gitlab.com/runner-tier" = "standard"

Protected, privileged, deployment, and sandbox tiers can then vary from that baseline. Trust is explicit: because each tier has its own registration, a job cannot accidentally inherit production deploy access the way it could if all runners shared one.

Cloud access

Deployment jobs should avoid static cloud keys. GitLab OIDC ID tokens let a job assume AWS IAM roles without storing long-lived AWS credentials in project variables: the job exchanges a short-lived, GitLab-signed token with AWS STS for temporary credentials.

Use separate roles for separate environments:

Feature and sandbox jobs receive no deployment role.
Non-production deployment jobs assume non-production roles.
Production deployment jobs assume production roles only from protected refs.
Production environments are protected in GitLab so only approved users or groups can deploy.

That split keeps cloud access tied to both GitLab policy and AWS IAM policy. A compromised feature branch should not be able to reach a production role just because the runner can reach AWS.
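One way to express the production rule in Terraform is a role trust policy that only accepts GitLab ID tokens for a specific project and branch. This is a sketch under assumptions, not the reference platform's actual code: the four variables, the role name, and the choice of main as the protected branch are all hypothetical. It also assumes an IAM OIDC identity provider for the GitLab instance already exists in the AWS account.

```hcl
# Hypothetical inputs; none of these names appear in the reference platform.
variable "gitlab_host" {
  type        = string
  description = "GitLab hostname without scheme, for example gitlab.example.com."
}
variable "gitlab_oidc_provider_arn" { type = string } # existing IAM OIDC provider
variable "gitlab_audience" { type = string }          # aud value requested by the job
variable "project_path" { type = string }             # for example group/project

data "aws_iam_policy_document" "prod_deploy_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.gitlab_oidc_provider_arn]
    }

    # Only tokens issued for the expected audience are accepted.
    condition {
      test     = "StringEquals"
      variable = "${var.gitlab_host}:aud"
      values   = [var.gitlab_audience]
    }

    # Pin the role to one project and one branch (assumed to be protected).
    condition {
      test     = "StringEquals"
      variable = "${var.gitlab_host}:sub"
      values   = ["project_path:${var.project_path}:ref_type:branch:ref:main"]
    }
  }
}

resource "aws_iam_role" "prod_deploy" {
  name               = "gitlab-prod-deploy" # hypothetical role name
  assume_role_policy = data.aws_iam_policy_document.prod_deploy_trust.json
}
```

The sub claim carries the project path and ref, so the trust policy can pin the branch, but whether that branch is protected is still enforced on the GitLab side. A job would request the token with the id_tokens keyword in its CI configuration and exchange it through sts:AssumeRoleWithWebIdentity.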

Network policy

The default network posture is deny first:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

The baseline allow policy opens egress for DNS and for HTTPS on TCP 443. Production environments should usually narrow that further with egress gateways, proxies, or an allowlist of approved registry endpoints. The principle is the same: CI jobs should not have arbitrary network reach just because they run inside the platform cluster.
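A baseline allow policy along those lines could be attached per tier from Terraform. The sketch below is an assumption about its shape rather than the platform's actual policy: the resource name baseline_egress and the policy name are hypothetical, and it reuses var.runner_tiers and kubernetes_namespace_v1.runner from the earlier sections. It allows DNS on port 53 and TCP 443 to any destination, which is the part a production setup would narrow.

```hcl
# Hypothetical baseline egress policy, one per runner tier namespace.
resource "kubernetes_network_policy_v1" "baseline_egress" {
  for_each = var.runner_tiers

  metadata {
    name      = "baseline-egress"
    namespace = kubernetes_namespace_v1.runner[each.key].metadata[0].name
  }

  spec {
    # Empty selector: the policy applies to every pod in the namespace.
    pod_selector {}
    policy_types = ["Egress"]

    # DNS lookups over UDP and TCP. No "to" block means any destination.
    egress {
      ports {
        port     = "53"
        protocol = "UDP"
      }
      ports {
        port     = "53"
        protocol = "TCP"
      }
    }

    # HTTPS to any destination; tighten with "to" blocks or an egress proxy.
    egress {
      ports {
        port     = "443"
        protocol = "TCP"
      }
    }
  }
}
```

NetworkPolicy objects only take effect when the cluster's network plugin enforces them, so on EKS this assumes network policy support is enabled in the CNI.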

Operational considerations

Runner isolation is also an SRE concern. Queue latency, autoscaling behavior, failed pod scheduling, registry access errors, and runner token rotation all affect developer experience. Monitor runner tiers separately because a saturated standard tier and a failing deployment tier require different responses.

The security model and operations model meet in one place: a runner tier is a product surface. It needs clear trust boundaries, capacity management, dashboards, and a runbook.

The final page connects delivery evidence with ongoing platform operations.