8. Supply Chain and SRE

Secure delivery does not end when a pipeline turns green. A production platform also needs evidence that explains what was built, where it ran, which commit produced it, and whether the platform itself is healthy enough to trust.

This page connects two concerns that are often separated: software supply chain security and SRE operations.

Supply chain security in plain language

Software supply chain security is about the integrity of the path from source to production. Source code is one input, but it is not the only one. Dependencies, base images, build tools, CI runners, package registries, artifact stores, signing keys, deployment credentials, and release approvals all influence what eventually runs.

A supply chain control should help answer at least one of these questions:

Where did this artifact come from?
Which source commit produced it?
Which pipeline built it?
Which runner executed the build?
Which dependencies were included?
Was the artifact signed?
Can deployment verify the artifact by digest?
Was the platform healthy when the build ran?

Finding vulnerabilities is only part of the work. The larger goal is traceability: being able to explain how a change became a production artifact.

Prevention and evidence

Modern supply chain programs need both prevention and evidence.

Prevention reduces the chance of compromise. Examples include protected branches, required review, pinned dependencies, isolated runners, restricted credentials, mandatory scans, and policy checks.

Evidence helps during deployment, audit, and incident response. Examples include SBOMs, signatures, provenance, immutable artifact references, pipeline logs, runner identifiers, and approval records.

SLSA, Sigstore, SBOM formats, provenance, and immutable artifact references are useful because they turn delivery into something that can be inspected. They do not remove the need for engineering judgment, but they give the platform concrete facts to evaluate later.
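To make "immutable artifact reference" concrete, here is a minimal Python sketch of digest pinning. The function names and the registry path in the usage note are hypothetical. For container images, the digest that matters is the one the registry computes over the image manifest; hashing raw bytes here only illustrates the idea.

```python
import hashlib
import re

# A sha256 digest is the literal prefix "sha256:" plus 64 lowercase hex characters.
_DIGEST_PATTERN = re.compile(r"sha256:[0-9a-f]{64}")


def sha256_digest(content: bytes) -> str:
    """Return a digest string in the "sha256:<hex>" form for the given bytes."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def pin_by_digest(repository: str, digest: str) -> str:
    """Build a digest-pinned reference of the form <repository>@sha256:<hex>."""
    if not _DIGEST_PATTERN.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{repository}@{digest}"


def is_pinned(reference: str) -> bool:
    """Return True when the reference ends in a well-formed sha256 digest."""
    _, separator, digest = reference.partition("@")
    return bool(separator) and _DIGEST_PATTERN.fullmatch(digest) is not None
```

A reference such as registry.example.com/team/app:latest fails is_pinned because a tag can be moved to different content later. A digest-pinned reference names exactly one piece of content.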

Required evidence

Every production artifact should produce four pieces of evidence:

An SBOM in CycloneDX or SPDX format.
A signature that deployment can verify before promotion, preferably using Sigstore Cosign where that fits the environment.
Provenance linking commit SHA, pipeline ID, runner identity, and build environment.
An immutable artifact reference by digest.

On protected refs, the policy-injected pipeline checks that the build identifiers are set and that the evidence files exist:

platform:provenance-required:
  stage: package
  rules:
    - if: '$CI_COMMIT_REF_PROTECTED == "true"'
  script:
    - test -n "${CI_COMMIT_SHA}"
    - test -n "${CI_PIPELINE_ID}"
    - test -n "${CI_RUNNER_ID}"
    - test -f sbom.cdx.json
    - test -f provenance.intoto.jsonl
    - test -f signature.bundle

This is not a complete SLSA implementation by itself. It is a platform contract that makes provenance and signing normal pipeline outputs.
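The same contract can be exercised outside CI, for example in a local pre-flight check or a unit test. The sketch below mirrors the job's checks in Python. The function name and message format are hypothetical, and the variable and file names are copied from the job above. Like the job, it checks only presence, not the validity of the SBOM, provenance, or signature.

```python
from pathlib import Path
from typing import Mapping

# Names copied from the pipeline job above.
REQUIRED_VARIABLES = ("CI_COMMIT_SHA", "CI_PIPELINE_ID", "CI_RUNNER_ID")
REQUIRED_FILES = ("sbom.cdx.json", "provenance.intoto.jsonl", "signature.bundle")


def missing_evidence(env: Mapping[str, str], workdir: Path) -> list[str]:
    """List every required variable or file that is absent.

    A variable counts as missing when it is unset or empty (like `test -n`),
    and a file counts as missing unless it is a regular file (like `test -f`).
    """
    missing = [f"variable {name}" for name in REQUIRED_VARIABLES if not env.get(name)]
    missing += [f"file {name}" for name in REQUIRED_FILES if not (workdir / name).is_file()]
    return missing
```

Called as missing_evidence(os.environ, Path(".")), an empty list means the contract is satisfied. Returning every gap at once, rather than stopping at the first failure, is a design choice that makes the failure message more useful to the team that has to fix the build.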

Dependency control

Release builds should not resolve unverified dependencies at build time. Pin dependencies, proxy or mirror them, and scan them continuously. Manage and scan base images. Do not promote mutable image tags such as latest.

Assume dependencies and package scripts are hostile until the platform has evidence otherwise. Production builds should resolve packages and images from approved, logged, scanned, and cached sources instead of downloading directly from the public internet.
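A check for "approved sources" can start as a simple prefix allowlist. The Python sketch below is one possible shape. The function name and the registry paths in the usage note are hypothetical and do not come from the platform's actual configuration.

```python
def from_approved_source(image: str, approved_prefixes: tuple[str, ...]) -> bool:
    """Return True when the image reference sits under an approved registry path.

    Matching stops at a path boundary ("/"), so an approved prefix of
    "registry.example.com" does not accidentally approve
    "registry.example.com.evil.test/...".
    """
    for prefix in approved_prefixes:
        base = prefix.rstrip("/")
        if image.startswith(base + "/"):
            return True
    return False
```

With an allowlist of ("gitlab.example.com/platform/dependency_proxy/containers",), a reference such as gitlab.example.com/platform/dependency_proxy/containers/alpine:3.20 passes. A bare alpine:3.20, which a container runtime would typically resolve against a public registry, does not.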

GitLab can help centralize package ingress:

Dependency Proxy for container images
Virtual Registry for proxying and caching upstream registries
Package Registry for internal packages
container registries that enforce image scanning and retention policy

This does not remove the need to patch dependencies. It gives the platform a place to observe, cache, scan, and restrict what production builds consume.

The OPA CI policy reinforces that by rejecting mutable image references:

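# Deny any CI job whose image reference ends in the mutable :latest tag.
deny contains msg if {
  job := input.ci.jobs[_]
  # Jobs without an image key fall back to "" and never match.
  image := object.get(job, "image", "")
  endswith(image, ":latest")
  msg := sprintf("job %q uses mutable latest image tag", [job.name])
}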

That rule is intentionally narrow and testable. It catches a common bypass without claiming to solve the entire dependency problem.
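To show what "narrow" means in practice, the sketch below mirrors the rule in Python and runs it against a few job shapes. It is an illustration, not the policy itself. The function name is hypothetical, and the input layout (a list of jobs with name and image keys) is inferred from the rule above.

```python
def deny_latest(jobs: list[dict]) -> list[str]:
    """Mirror of the Rego rule: one message per job whose image ends in ":latest"."""
    messages = []
    for job in jobs:
        image = job.get("image", "")
        # Only plain string references are inspected, matching the rule's endswith check.
        if isinstance(image, str) and image.endswith(":latest"):
            messages.append(f'job "{job["name"]}" uses mutable latest image tag')
    return messages


example_jobs = [
    {"name": "build", "image": "node:latest"},           # caught: explicit latest tag
    {"name": "test", "image": "node:20.11"},             # passes: fixed tag
    {"name": "lint", "image": "node"},                   # passes: no tag at all
    {"name": "scan", "image": {"name": "node:latest"}},  # passes: image given as a mapping
    {"name": "docs"},                                    # passes: no image key
]
```

Only the build job is flagged. The lint job passes even though an untagged reference usually resolves to latest, and the mapping form of image is never inspected by this mirror. Cases like these are the kind of gap a later, separately tested rule could close.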

Where GitLab supports it, CI job token permissions should also be narrowed. A build job that only needs to fetch dependencies should not inherit broad access to packages, releases, deployments, or unrelated projects.

Why SRE belongs here

Supply chain controls only work if the delivery platform is reliable enough for teams to use. If runners are saturated, scans fail intermittently, signing services are flaky, or artifact uploads time out, teams experience security as random breakage. Over time, they will look for exceptions.

SRE practices keep the secure path usable. SLOs define which user journeys matter. Dashboards show whether failures are isolated to one runner tier or part of a broader platform incident. Runbooks make recovery repeatable. Reviews make sure exceptions, access, and policy drift do not quietly become the new baseline.

SLOs

The platform defines SLO seeds for the user journeys that matter most:

slos:
  - name: gitlab-core-workflow-availability
    target: 99.9
    window: 30d
    indicator: successful_git_operations / total_git_operations
  - name: runner-standard-queue-latency
    target: 95
    objective: p95 <= 120s
    window: 7d
    indicator: gitlab_runner_job_queue_duration_seconds
  - name: platform-owned-pipeline-success-rate
    target: 99.0
    window: 30d

These SLOs avoid a common trap: measuring only whether GitLab is up. For developers, the platform is healthy when they can push, open merge requests, start pipelines quickly, run protected release flows, and get Terraform plans through the approved path.
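A ratio-based SLO implies an error budget: the number of failures the target tolerates within the window. The Python sketch below works through that arithmetic for targets like the ones above. The function names and the sample counts are illustrative assumptions.

```python
def allowed_failures(total_events: int, target_percent: float) -> float:
    """Number of failed events the target tolerates for this many total events."""
    return total_events * (100.0 - target_percent) / 100.0


def budget_remaining(total_events: int, failed_events: int, target_percent: float) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    allowed = allowed_failures(total_events, target_percent)
    if allowed == 0:
        # A 100% target or an empty window leaves no budget to spend.
        return 0.0 if failed_events == 0 else float("-inf")
    return 1.0 - failed_events / allowed
```

For example, a 99.9 target over one million Git operations tolerates roughly 1,000 failures. If 250 have failed so far, about three quarters of the budget remains. A 99.0 target over 2,000 platform-owned pipelines tolerates about 20 failed pipelines in the window.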

Operational dashboards should also track:

runner queue time
job success rate
runner saturation
cost per pipeline minute
failed pod scheduling
stale runner registrations
scan failure rate
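Runner queue time, the first item in the list above, is usually read as a percentile rather than an average. The Python sketch below computes a nearest-rank percentile over raw samples and compares it with a threshold. The function names are hypothetical. In a real deployment the percentile would more likely come from the metrics backend than from raw samples.

```python
import math


def percentile_nearest_rank(samples: list[float], percent: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least percent% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(percent * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def queue_latency_ok(queue_seconds: list[float], threshold_seconds: float = 120.0) -> bool:
    """Return True when the p95 queue time is at or below the threshold."""
    return percentile_nearest_rank(queue_seconds, 95) <= threshold_seconds
```

The 120-second default matches the objective stated for runner-standard-queue-latency. A few slow jobs do not fail the check on their own, but a sustained tail does. That makes the percentile a better signal of runner saturation than the mean.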

Runbooks and reviews

The operating model requires:

on-call ownership for the platform
incident commander and communications roles for major incidents
postmortems for severity 1 and 2 incidents
monthly policy exception review
quarterly access review for Owners, Maintainers, and custom roles
dormant user review
stale token review
unprotected default branch review
runner-tier health reviews
recurring dependency and base-image review

Those practices are not paperwork around the platform. They are part of the platform. A security policy that nobody reviews becomes stale. A runner tier without a runbook becomes a recurring incident. A signing requirement without dashboards becomes a mystery when releases start failing.
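Some of these reviews are easy to partly automate. As one example, the Python sketch below flags stale tokens. The TokenRecord type, its fields, and the 90-day idle threshold are assumptions for illustration. A real review would populate the records from whatever token inventory the platform exposes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TokenRecord:
    """Hypothetical inventory row for an access token."""
    name: str
    last_used_at: Optional[datetime]  # None means the token has never been used


def stale_tokens(tokens: list[TokenRecord], now: datetime, max_idle_days: int = 90) -> list[str]:
    """Names of tokens never used, or last used more than max_idle_days before now."""
    cutoff = now - timedelta(days=max_idle_days)
    return [
        token.name
        for token in tokens
        if token.last_used_at is None or token.last_used_at < cutoff
    ]


# Example inventory, evaluated at a fixed point in time so the result is repeatable.
review_time = datetime(2025, 1, 1, tzinfo=timezone.utc)
inventory = [
    TokenRecord("ci-deploy", datetime(2024, 12, 20, tzinfo=timezone.utc)),
    TokenRecord("old-bot", datetime(2024, 6, 1, tzinfo=timezone.utc)),
    TokenRecord("never-used", None),
]
```

Here stale_tokens(inventory, review_time) returns the old-bot and never-used tokens. The output is a review queue, not an automatic revocation list; a person still decides which tokens to rotate or remove.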

Closing the loop

At this point the platform has a shape:

GitLab enforces delivery workflow.
HCP Terraform governs structural changes.
OPA and GitLab security policies make controls testable.
EKS isolates runner execution.
Supply chain evidence makes artifacts traceable.
SRE practices keep the platform reliable enough to depend on.

This is still a baseline, not a finished enterprise deployment. That is the right place to start. Get the control boundaries, reviewable code, testable policy, isolated execution, and operational measures right before adding organization-specific integrations.

The point of the platform is not to collect tools. It is to make the secure path clear and usable enough that teams choose it for normal work.