8. Supply Chain and SRE
Secure delivery does not end when a pipeline turns green. A production platform also needs evidence that explains what was built, where it ran, which commit produced it, and whether the platform itself is healthy enough to trust.
This page connects two concerns that are often separated: software supply chain security and SRE operations.
Supply chain security in plain language
Software supply chain security is about the integrity of the path from source to production. Source code is one input, but it is not the only one. Dependencies, base images, build tools, CI runners, package registries, artifact stores, signing keys, deployment credentials, and release approvals all influence what eventually runs.
A supply chain control should help answer one of these questions:
- Where did this artifact come from?
- Which source commit produced it?
- Which pipeline built it?
- Which runner executed the build?
- Which dependencies were included?
- Was the artifact signed?
- Can deployment verify the artifact by digest?
- Was the platform healthy when the build ran?
Finding vulnerabilities is only part of the work. The larger goal is traceability: being able to explain how a change became a production artifact.
Prevention and evidence
Modern supply chain programs need both prevention and evidence.
Prevention reduces the chance of compromise. Examples include protected branches, required review, pinned dependencies, isolated runners, restricted credentials, mandatory scans, and policy checks.
Evidence helps during deployment, audit, and incident response. Examples include SBOMs, signatures, provenance, immutable artifact references, pipeline logs, runner identifiers, and approval records.
SLSA, Sigstore, SBOM formats, provenance, and immutable artifact references are useful because they turn delivery into something that can be inspected. They do not remove the need for engineering judgment, but they give the platform concrete facts to evaluate later.
Required evidence
Every production artifact should produce four pieces of evidence:
- An SBOM in CycloneDX or SPDX format.
- A signature that deployment can verify before promotion, preferably using Sigstore Cosign where that fits the environment.
- Provenance linking commit SHA, pipeline ID, runner identity, and build environment.
- An immutable artifact reference by digest.
The policy-injected pipeline checks for those files on protected refs:
    platform:provenance-required:
      stage: package
      rules:
        - if: '$CI_COMMIT_REF_PROTECTED == "true"'
      script:
        - test -n "${CI_COMMIT_SHA}"
        - test -n "${CI_PIPELINE_ID}"
        - test -n "${CI_RUNNER_ID}"
        - test -f sbom.cdx.json
        - test -f provenance.intoto.jsonl
        - test -f signature.bundle

This is not a complete SLSA implementation by itself. It is a platform contract that makes provenance and signing normal pipeline outputs.
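As a rough illustration of what "provenance linking commit SHA, pipeline ID, runner identity" could look like in practice, the following Python sketch computes an artifact digest and ties it to the same CI variables the job above checks. The function names and the record's field names are hypothetical, and the record does not follow the in-toto or SLSA provenance schema; a real pipeline would emit a proper attestation.

```python
import hashlib
import os


def sha256_digest(path: str) -> str:
    """Return the file's content digest in the common 'sha256:<hex>' form."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def build_provenance(artifact_path: str, env=None) -> dict:
    """Link an artifact digest to the CI facts the pipeline job requires.

    The field names are illustrative only, not a standard schema.
    """
    env = os.environ if env is None else env
    required = ("CI_COMMIT_SHA", "CI_PIPELINE_ID", "CI_RUNNER_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing CI variables: " + ", ".join(missing))
    return {
        "artifact_digest": sha256_digest(artifact_path),
        "commit_sha": env["CI_COMMIT_SHA"],
        "pipeline_id": env["CI_PIPELINE_ID"],
        "runner_id": env["CI_RUNNER_ID"],
    }
```

Failing loudly when a variable is missing mirrors the `test -n` checks in the job: a record with empty fields would look like evidence without being any.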
Dependency control
Release builds should avoid unverified dependency resolution at runtime. Pin dependencies, proxy or mirror them, and scan them continuously. Manage and scan base images. Do not promote mutable image tags such as latest.
Assume dependencies and package scripts are hostile until the platform has evidence otherwise. Production builds should resolve packages and images from approved, logged, scanned, and cached sources instead of downloading directly from the public internet.
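One way to make "approved sources" checkable is to compare the hosts a build resolved packages from against an allow list. The sketch below assumes the build can produce a list of resolved download URLs (for example, from a lockfile); the host names and the function name are hypothetical placeholders.

```python
from urllib.parse import urlparse

# Hypothetical internal hosts; replace with the platform's own proxy or mirror.
APPROVED_HOSTS = frozenset({"gitlab.example.com", "registry.example.com"})


def unapproved_sources(urls, approved_hosts=APPROVED_HOSTS):
    """Return the URLs whose host is not on the approved list, in input order."""
    return [u for u in urls if urlparse(u).hostname not in approved_hosts]
```

A release job could fail when this list is non-empty, which turns "do not download directly from the public internet" into a condition the pipeline can evaluate.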
GitLab can help centralize package ingress:
- Dependency Proxy for container images
- Virtual Registry for proxying and caching upstream registries
- Package Registry for internal packages
- Container registries that enforce image scanning and retention policy
This does not remove the need to patch dependencies. It gives the platform a place to observe, cache, scan, and restrict what production builds consume.
The OPA CI policy reinforces that by rejecting mutable image references:
    deny contains msg if {
        job := input.ci.jobs[_]
        image := object.get(job, "image", "")
        endswith(image, ":latest")
        msg := sprintf("job %q uses mutable latest image tag", [job.name])
    }

That rule is intentionally narrow and testable. It catches a common bypass without claiming to solve the entire dependency problem.
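To show how narrow the rule is, the following Python sketch mirrors its logic and adds a stricter digest check for comparison. It is not part of the platform's policy; the function names are hypothetical, and the input is assumed to be a list of job dictionaries with `name` and an optional string `image`, like the `input.ci.jobs` shape the rule reads.

```python
HEX_DIGITS = set("0123456789abcdef")


def uses_latest_tag(image: str) -> bool:
    """Mirror the Rego rule: flag references that end in ':latest'."""
    return image.endswith(":latest")


def is_digest_pinned(image: str) -> bool:
    """Stricter check: True only for 'name@sha256:<64 lowercase hex chars>'."""
    name, sep, digest = image.partition("@")
    if not sep or not name:
        return False
    algo, _, hexpart = digest.partition(":")
    return algo == "sha256" and len(hexpart) == 64 and set(hexpart) <= HEX_DIGITS


def deny_messages(jobs):
    """Return one message per job whose image ends in ':latest'."""
    msgs = []
    for job in jobs:
        image = job.get("image", "")
        # Non-string image values are skipped in this sketch.
        if isinstance(image, str) and uses_latest_tag(image):
            msgs.append('job "%s" uses mutable latest image tag' % job["name"])
    return msgs
```

A reference with no tag at all, such as `alpine`, resolves to `latest` implicitly and would pass the narrow check, while `is_digest_pinned` would reject it. That gap is an example of what the rule deliberately leaves to other controls.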
Where GitLab supports it, CI job token permissions should also be narrowed. A build job that only needs to fetch dependencies should not inherit broad access to packages, releases, deployments, or unrelated projects.
Why SRE belongs here
Supply chain controls only work if the delivery platform is reliable enough for teams to use. If runners are saturated, scans fail intermittently, signing services are flaky, or artifact uploads time out, teams experience security as random breakage. Over time, they will look for exceptions.
SRE practices keep the secure path usable. SLOs define which user journeys matter. Dashboards show whether failures are isolated to one runner tier or part of a broader platform incident. Runbooks make recovery repeatable. Reviews make sure exceptions, access, and policy drift do not quietly become the new baseline.
The platform defines SLO seeds for the user journeys that matter most:
    slos:
      - name: gitlab-core-workflow-availability
        target: 99.9
        window: 30d
        indicator: successful_git_operations / total_git_operations
      - name: runner-standard-queue-latency
        target: 95
        objective: p95 <= 120s
        window: 7d
        indicator: gitlab_runner_job_queue_duration_seconds
      - name: platform-owned-pipeline-success-rate
        target: 99.0
        window: 30d

These SLOs avoid a common trap: measuring only whether GitLab is up. For developers, the platform is healthy when they can push, open merge requests, start pipelines quickly, run protected release flows, and get Terraform plans through the approved path.
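The arithmetic behind two of these SLO seeds can be sketched in a few lines of Python. This assumes raw counts and queue-time samples are already available from the monitoring system; the function names are hypothetical, and the nearest-rank percentile used here is one of several common definitions, so a real dashboard may compute p95 differently.

```python
def availability_met(successful: int, total: int, target_percent: float) -> bool:
    """Ratio SLI: successful operations divided by total, compared to the target."""
    if total == 0:
        return True  # assumption: a window with no traffic is not a violation
    return 100.0 * successful / total >= target_percent


def allowed_failures(total: int, target_percent: float) -> float:
    """Error budget expressed as the number of operations allowed to fail."""
    return total * (100.0 - target_percent) / 100.0


def p95(samples):
    """Nearest-rank 95th percentile, using integer math for the rank."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def queue_latency_met(samples, threshold_seconds: float = 120.0) -> bool:
    """Objective from the SLO seed: p95 queue time at or under the threshold."""
    return p95(samples) <= threshold_seconds
```

For example, at a 99.9 target, one million Git operations in the window leave a budget of about one thousand failed operations before the availability SLO is missed.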
Operational dashboards should also track:
- runner queue time
- job success rate
- runner saturation
- cost per pipeline minute
- failed pod scheduling
- stale runner registrations
- scan failure rate
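Several of these signals are simple aggregates over job records. The sketch below groups jobs by runner tier so that a failure pattern limited to one tier stands out from a platform-wide one. The `JobRecord` fields and tier labels are hypothetical; a real dashboard would more likely be built from exported metrics than from in-memory records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    runner_tier: str       # hypothetical tier label, e.g. "standard"
    succeeded: bool
    queue_seconds: float   # time the job waited before a runner picked it up


def summarize_by_tier(jobs):
    """Return job count, success rate, and mean queue time for each runner tier."""
    grouped = defaultdict(list)
    for job in jobs:
        grouped[job.runner_tier].append(job)
    summary = {}
    for tier, items in grouped.items():
        ok = sum(1 for j in items if j.succeeded)
        summary[tier] = {
            "jobs": len(items),
            "success_rate": ok / len(items),
            "mean_queue_seconds": sum(j.queue_seconds for j in items) / len(items),
        }
    return summary
```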
Runbooks and reviews
The operating model requires:
- on-call ownership for the platform
- incident commander and communications roles for major incidents
- postmortems for severity 1 and 2 incidents
- monthly policy exception review
- quarterly access review for Owners, Maintainers, and custom roles
- dormant user review
- stale token review
- unprotected default branch review
- runner-tier health reviews
- recurring dependency and base-image review
Those practices are not paperwork around the platform. They are part of the platform. A security policy that nobody reviews becomes stale. A runner tier without a runbook becomes a recurring incident. A signing requirement without dashboards becomes a mystery when releases start failing.
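Recurring reviews are easier to sustain when the candidate list is generated rather than assembled by hand. As one example, a stale token review could start from a filter like the sketch below. The token dictionary shape, the function name, and the 90-day idle threshold are all assumptions for illustration, not values the operating model prescribes.

```python
from datetime import datetime, timedelta, timezone


def stale_tokens(tokens, now, max_idle_days=90):
    """Return names of tokens never used, or last used before the cutoff.

    Each token is assumed to be a dict with 'name' and 'last_used_at',
    where 'last_used_at' is a timezone-aware datetime or None.
    """
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for token in tokens:
        last_used = token.get("last_used_at")
        if last_used is None or last_used < cutoff:
            stale.append(token["name"])
    return stale
```

The output is a starting point for a human review, not an automatic revocation list: a token that looks idle may still back a rarely used recovery path.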
Closing the loop
At this point the platform has a shape:
- GitLab enforces delivery workflow.
- HCP Terraform governs structural changes.
- OPA and GitLab security policies make controls testable.
- EKS isolates runner execution.
- Supply chain evidence makes artifacts traceable.
- SRE practices keep the platform reliable enough to depend on.
This is still a baseline, not a finished enterprise deployment. That is the right place to start. Get the control boundaries, reviewable code, testable policy, isolated execution, and operational measures right before adding organization-specific integrations.
The point of the platform is not to collect tools. It is to make the secure path clear and usable enough that teams choose it for normal work.