Why CI runners are the soft target

In the spring of 2026 we watched Trivy, axios, Bitwarden CLI, and a 317-package Mini Shai-Hulud npm wave become trojanized inside the same workflow surface, the CI runner, and exfiltrate secrets out of jobs that, on paper, completed successfully. None of these were exotic intrusions. The pattern is identical and it is not a coincidence.

A modern CI runner is the most credentialed, least-firewalled, most-volatile machine in your infrastructure. It holds the credentials to deploy to production by design. It executes whatever a package.json, a Gemfile.lock, or an include: entry in .gitlab-ci.yml resolves to, so the dependency graph is the attack surface. Its default egress posture is “allow everything outbound” because builds need to download things. And your security team rarely instruments it the way they instrument a workstation.

Stopping these attacks is, in the end, a question of default-deny egress with process attribution. Every one of the recent compromises shares the same shape: a trusted binary inside a runner job tries to reach an attacker domain or hardcoded IP. The hard part is not the policy. The hard part is enforcing it on the right cgroup, before any user code runs, while still telling the operator which CI job and which process caused the violation.

This post walks through how Leitwacht actually does that.

The actual problem: DNS ↔ connect attribution

Your CI job runs npm install. npm calls getaddrinfo("registry.npmjs.org") which returns an address (say 104.16.27.34, for illustration). npm then calls connect() to that IP. By the time the kernel sees the connect syscall, the original DNS name is gone: the kernel sees a 4-tuple, not an FQDN.

If you want to enforce “only registry.npmjs.org is allowed”, you have to either:

  1. Inspect the SNI on the outbound TLS handshake (L7, expensive, blind to non-TLS).
  2. Hold the resolver and remember the answer, then match the subsequent connect() against that memory.

Leitwacht does (2). It’s cheaper, it works for any L4 protocol, and it puts the policy decision at the same layer where the kernel will actually drop the packet.
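To make option 2 concrete, here is a minimal userspace model in C of “remember the answer, then match the connect”. It is a sketch under assumptions, not Leitwacht’s code: the fixed-size table and the names seen_table, remember_answer, and connect_allowed are hypothetical, and addresses are kept in host byte order purely for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "hold the resolver and remember the answer". */
#define SEEN_MAX 8

struct seen_entry {
    uint32_t ip;    /* IPv4 address, host byte order in this model */
    uint16_t port;  /* destination port the policy permits */
    bool     used;
};

struct seen_table {
    struct seen_entry slots[SEEN_MAX];
};

/* Called when the resolver returns an answer for an allowed name.
   Returns false if the table is full. */
static bool remember_answer(struct seen_table *t, uint32_t ip, uint16_t port)
{
    assert(t != NULL);
    for (size_t i = 0; i < SEEN_MAX; i++) {
        if (!t->slots[i].used) {
            t->slots[i] = (struct seen_entry){ .ip = ip, .port = port, .used = true };
            return true;
        }
    }
    return false;
}

/* Called at connect() time: only the (ip, port) pair is known here. */
static bool connect_allowed(const struct seen_table *t, uint32_t ip, uint16_t port)
{
    assert(t != NULL);
    for (size_t i = 0; i < SEEN_MAX; i++) {
        const struct seen_entry *e = &t->slots[i];
        if (e->used && e->ip == ip && e->port == port)
            return true;
    }
    return false;
}
```

The point of the model is the asymmetry: the name is visible only when remember_answer runs, while connect_allowed sees nothing but the (ip, port) pair.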

Two things have to be true for that to be sound:

  • The runner cannot bypass our resolver. Otherwise it talks to 8.8.8.8 directly and we see no DNS at all.
  • There must be no race window between the cgroup being created and our enforcement being attached. Otherwise the first connect() after pod start escapes.

Everything below is a consequence of those two requirements.

The data plane: cgroup_skb/egress

Egress enforcement runs in a cgroup_skb/egress BPF program attached to the Runner pod’s parent cgroup v2, at the pod level, so cgroup BPF inheritance covers init, helpers, build container, and any sub-cgroup created later. The snippet below is a simplified illustration of the verdict path: the hard blocks, the allow-map lookup, and default-deny. Process attribution is not read here; it comes from per-socket sk_storage populated by separate LSM hooks (more on that below).

/* Simplified illustration: this snippet omits some of the shipped
   program's hard-block and audit-mode branches. Like the shipped
   program, it reads packet fields via bpf_skb_load_bytes rather than
   direct access. */
struct allow_key {
    __u32 ip;
    __u16 port;
} __attribute__((packed));

struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __type(key,   struct allow_key);
    __type(value, __u8);          // 1 = allow
    __uint(max_entries, 65536);
} allowed_ips SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __type(key,   struct lpm_key); // prefix + IP
    __type(value, __u8);
    __uint(max_entries, 64);
    __uint(map_flags, BPF_F_NO_PREALLOC);
} trusted_nets SEC(".maps");

SEC("cgroup_skb/egress")
int egress_filter(struct __sk_buff *skb) {
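    struct iphdr iph;
    /* Fail closed: if the L3 header cannot be read, drop (0) rather than
       pass (1), in keeping with default-deny. */
    if (bpf_skb_load_bytes(skb, 0, &iph, sizeof(iph)) < 0)
        return 0;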

    __be16 dport = parse_dport(skb, &iph);

    /* Hard blocks first, before the allow map: cloud metadata, DoT. */
    if (iph.daddr == bpf_htonl(0xa9fea9fe))         /* 169.254.169.254 */
        return apply_drop(skb, iph.daddr, dport, ...);

    struct allow_key k = { .ip = iph.daddr, .port = dport };
    if (bpf_map_lookup_elem(&allowed_ips, &k))
        return 1;                   /* resolver-populated allow */

    struct lpm_key cidr = { .prefixlen = 32, .ip = iph.daddr };
    if (bpf_map_lookup_elem(&trusted_nets, &cidr))
        return 1;                   /* trusted CIDR: K8s API, etc. */

    /* apply_drop emits the violation (attribution from sk_storage) and
       returns the verdict; block mode drops, audit mode lets it pass. */
    return apply_drop(skb, iph.daddr, dport, ...);   /* default-deny */
}

A few things worth pointing out:

  • Cgroup scope is the pod parent. Attaching at the pod parent means the program runs on egress from any descendant cgroup, including sub-cgroups created later by the runtime when the build container starts. No race when the build container’s cgroup appears.
  • Map shape: LRU_HASH keyed on (ip, port) for the allowlist, plus a small LPM_TRIE for trusted CIDRs. The cgroup component is implicit: the program is per-cgroup-attached, so the lookup space is already scoped to that cgroup’s traffic. Multi-tenant isolation comes from the per-pod attach, not from a per-cgroup key.
  • The verdict program reads no current-task helpers; attribution comes from sk_storage. The cgroup_skb/egress program decides the verdict and counts bytes, but it does not call bpf_get_current_cgroup_id or bpf_get_current_comm. On a packet emitted in softirq context (TCP retransmits, keepalives, delayed ACKs) those helpers would name whichever task was preempted on that CPU, not the socket’s owner. Instead, separate lsm/socket_connect and lsm/socket_sendmsg hooks run in process context, where current is guaranteed to be the real owner, and write (cgroup_id, pid, comm, binary_path) into per-socket sk_storage. The egress program reads that storage off skb->sk. Sidestepping the current-task helpers is also what lets one program load across kernels from 5.10 onward, including the 6.1 kernel where those helpers are not exposed to cgroup_skb. Detailed write-up in cgroup_skb/egress: drop, count, attribute (forthcoming).
  • Drop semantics. cgroup_skb returns 0 to drop the skb. Userspace sees a protocol-synthesized error (EPERM for connect on TCP in modern kernels, sometimes ECONNREFUSED). Setting an explicit errno via bpf_set_retval is a 5.18+ helper we don’t currently rely on. Most real applications retry once, log, and move on, which is the failure mode you want for a credential stealer.
  • Default-deny via “no entry” not “explicit deny entry.” A managed cgroup with no allow-map hit drops. This keeps the hot-path lookup to one or two map calls and means a misconfigured allowlist fails closed.
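One detail the verdict path implies for userspace: iph.daddr and the parsed dport are in network byte order, so whatever writes allowed_ips has to build its keys the same way or lookups will silently miss. A hedged sketch of that key construction in userspace C follows. make_allow_key is a hypothetical helper, not part of Leitwacht, and the struct simply mirrors the allow_key layout from the snippet.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the BPF-side key: 6 bytes, no padding. */
struct allow_key {
    uint32_t ip;    /* network byte order, as in iph.daddr */
    uint16_t port;  /* network byte order, as in the TCP/UDP header */
} __attribute__((packed));

_Static_assert(sizeof(struct allow_key) == 6, "key must match the BPF layout");

/* Build a key from a dotted-quad string and a host-order port.
   Returns false if the address does not parse as IPv4. */
static bool make_allow_key(const char *dotted, uint16_t host_port,
                           struct allow_key *out)
{
    struct in_addr addr;

    assert(dotted != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return false;
    out->ip = addr.s_addr;         /* inet_pton already yields network order */
    out->port = htons(host_port);  /* convert the port explicitly */
    return true;
}
```

Without the packed attribute the struct would carry two bytes of padding, which would have to be zeroed identically on both sides for hash lookups to match.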

The control plane: nftables redirect + DNS resolver + BPF map

The eBPF program above only fires after we have a destination IP. The decision of which IPs are allowed is made by the resolver. The agent installs a per-pod nftables redirect inside the pod’s network namespace, no cgroup matching needed because the rules are scoped to one pod’s netns by definition:

table inet leitwacht {
  chain output {
    type nat hook output priority -100;

    udp dport 53 redirect to :15353       # to local resolver in agent
    tcp dport 53 redirect to :15353
  }
}

That’s all the nftables does. DoT (port 853), cloud metadata (169.254.169.254), and IPv6 are blocked in the BPF program itself, not in nftables: the eBPF path runs after routing and has the same authoritative view of the destination, with the bonus that the policy is enforced regardless of any in-pod nftables manipulation. (The agent runs the resolver in the agent’s own netns and proxies queries back to the pod via socket activation per-cgroup. Pod traffic on port 53 lands at 127.0.0.1:15353, which the agent’s per-pod listener consumes.)

The resolver evaluates the FQDN against the policy bundle and on a positive answer it inserts an (answer_ip, port) allow entry into the per-pod allowed_ips map. The TTL of the entry tracks the DNS answer’s TTL plus a small grace window.

This is the moment where Leitwacht’s DNS↔connect attribution actually happens. By the time the runner’s npm calls connect(), the LRU hash already contains the right entry, populated by the resolver milliseconds earlier, and the verdict is a single map lookup.

The non-trivial cases:

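  • Statically linked and built-in resolvers. Go binaries built with the netgo tag resolve names themselves instead of calling getaddrinfo, and tools with a built-in resolver or a hardcoded nameserver may ignore /etc/resolv.conf and query an upstream server directly. The udp/tcp dport 53 redirect catches them anyway because we match on destination port, not on how the process picked its nameserver. (curl --resolve is a different case: it performs no lookup at all, so it behaves like a hardcoded IP.)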
  • Hardcoded IPs. No DNS lookup means no map entry; default-deny drops the packet. We surface a violation marked direct-IP egress (no DNS answer recorded) so the operator sees it.
  • DoH on port 443. Indistinguishable from regular HTTPS without SNI inspection. Mitigated at the policy level by not allowlisting known-DoH endpoints (cloudflare-dns.com, dns.google, etc.): the resolver never inserts an allow entry for them, the cgroup_skb program drops, and the operator sees the attempt.
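How a resolver-side policy check might treat these cases can be sketched as an allowlist match with wildcard suffixes. The pattern syntax and the pattern_matches and fqdn_allowed names are assumptions for illustration; Leitwacht’s policy bundle format is not described here. Known-DoH names are refused simply because nothing on the list matches them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* True if `fqdn` matches `pattern`. A pattern is either an exact name or
   "*.suffix", which matches any name strictly below that suffix. */
static bool pattern_matches(const char *pattern, const char *fqdn)
{
    assert(pattern != NULL && fqdn != NULL);
    if (strncmp(pattern, "*.", 2) != 0)
        return strcasecmp(pattern, fqdn) == 0;

    const char *suffix = pattern + 1;  /* ".suffix", leading dot included */
    size_t flen = strlen(fqdn), slen = strlen(suffix);
    return flen > slen && strcasecmp(fqdn + (flen - slen), suffix) == 0;
}

/* Default-deny: a name is allowed only if some pattern matches it. */
static bool fqdn_allowed(const char *fqdn,
                         const char *const *allow, size_t nallow)
{
    for (size_t i = 0; i < nallow; i++)
        if (pattern_matches(allow[i], fqdn))
            return true;
    return false;
}
```

Keeping the leading dot in the suffix comparison is what stops a lookalike such as evilworkers.dev from matching *.workers.dev.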

The race: an init-container barrier plus a containerd-events watcher

The hard problem is the attach race. Between the moment a Kubernetes pod’s network namespace is created (clone(CLONE_NEWNET) inside the kubelet’s CRI runtime) and the moment our cgroup_skb program is attached to its cgroup, any process inside the pod that calls connect() escapes.

The naïve fix, watching Kubernetes events and reacting to Pod.status.phase == Running, loses the race by tens of milliseconds, which is plenty for a malicious postinstall to land its first packet.

The textbook synchronous answer is an NRI plugin in containerd / CRI-O. We don’t use it. NRI plugins are node-scoped: they intercept every pod start on the node, and a buggy plugin stalls the runtime’s whole pod-creation loop. The blast radius is the entire node, the install touches /etc/nri/conf.d/, and you can’t ship that to managed Kubernetes (GKE / EKS / AKS) without escaping the managed boundary.

What we do instead is split the problem in two:

1. The agent subscribes to containerd’s events socket. When the K8s CRI runtime starts a pod’s pause container, containerd emits a /tasks/start event on /run/containerd/containerd.sock before any user container in the pod runs. The agent receives it, identifies pod sandboxes by the io.cri-containerd.kind="sandbox" container label, reads the pod-parent cgroup path, and attaches cgroup_skb/egress plus the nftables redirect plus the seeded policy maps. Containerd doesn’t block on us: it’s an async event stream. If our process crashes, containerd is unaffected.

2. A zero-capability init container in the runner pod is the synchronization barrier. The init container drops ALL capabilities, mounts no volumes, and is a tiny DNS client that polls the agent. It exits 0 only after the agent ACKs that BPF has been attached to its pod’s cgroup. If the agent is down or hasn’t attached yet, the poll fails, the init container exits non-zero, and the kubelet refuses to start the build container. The pod fails closed by the kubelet’s own ordering guarantee: app containers start only after every init container has succeeded.

The pod-spec patch to enable this is a single block in the GitLab Runner config:

# config.toml: GitLab Runner Kubernetes executor
[[runners]]
  environment = ["FF_USE_ADVANCED_POD_SPEC_CONFIGURATION=true"]

  [runners.kubernetes]
    [[runners.kubernetes.pod_spec]]
      name = "leitwacht"
      # The patch is applied to the generated PodSpec, so fields such as
      # initContainers sit at the top level of the patch.
      patch = '''
        initContainers:
          - name: leitwacht-attach
            image: registry.gitlab.com/leitwacht/leitwacht-initc:v0.5.0
            securityContext:
              runAsNonRoot: true
              capabilities:
                drop: ["ALL"]
      '''
      patch_type = "strategic"

That’s the whole install: pod_spec injection plus the FF_USE_ADVANCED_POD_SPEC_CONFIGURATION=true feature flag. No node access, no NRI registration, no runtime modification, no privileged init container. Failure is per-pod (init poll times out, build container never starts), never per-node. The added pod-startup latency is the cost of one init container that exits as soon as the agent ACKs the attach, dominated on a cold node by the init image pull rather than by the attach itself.

Detailed walk-through in Closing the attach race without NRI (forthcoming).

What the operator sees

A blocked egress event from the data plane reaches the control plane through a BPF ringbuf consumer in Go. After enrichment with cgroup → pod → job correlation (covered in Attributing a packet, link by link, forthcoming), it lands in the operator UI as:

{
  "ts": "2026-04-29T09:02:11.412Z",
  "verdict": "block",
  "reason": "domain_not_in_allowlist",
  "policy": {
    "scope": "project",
    "profile": "build/node-22",
    "rule_id": "default-deny"
  },
  "process": {
    "comm": "node",
    "binary": "/usr/local/bin/node",
    "pid": 7821,
    "ancestry": ["sh", "npm", "node"]
  },
  "destination": {
    "host": "sfrclak.com",
    "ip": "142.11.206.73",
    "port": 8000,
    "proto": "tcp"
  },
  "gitlab": {
    "project": "fintech-nl/payments-api",
    "pipeline_id": 9124551,
    "job_id": 388420,
    "job_name": "install",
    "ref": "main"
  }
}

That’s the complete chain, kernel skb to a clickable GitLab job URL, in a single record. The same record is what populates the violation table on the dashboard, the per-pipeline egress log, and the audit log.
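The “clickable GitLab job URL” can be derived from the gitlab block of the record. A small sketch in C, assuming the standard GitLab job path of the form <base>/<project>/-/jobs/<job_id>; job_url is a hypothetical helper, and the base URL is an input because self-managed instances differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "<base>/<project>/-/jobs/<job_id>" into buf.
   Returns 0 on success, -1 if buf is too small or formatting fails. */
static int job_url(char *buf, size_t len, const char *base,
                   const char *project, unsigned long job_id)
{
    assert(buf != NULL && base != NULL && project != NULL);
    int n = snprintf(buf, len, "%s/%s/-/jobs/%lu", base, project, job_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Checking the snprintf return value against the buffer length means a truncated URL is reported as an error rather than stored.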

What’s still hard

  • Encrypted Client Hello (ECH). Once ECH is universal, even an SNI-aware tool can’t see the destination FQDN on a TLS handshake. The same blindness applies to QUIC. Our cgroup_skb path is L4 so we’re insulated from this: the cost is no L7 introspection at all, ever.
  • POST-to-allowed-domain exfil. If gitlab.com is on your allowlist (it probably is) and an attacker pushes data into a public snippet or repository there, Leitwacht does not stop the upload. We don’t claim to. The credwatch LSM narrows the gap by detecting the credential read that would have produced the data in the first place, but the exfil channel itself is L7 and out of scope.
  • Direct-IP exfil to a popular hyperscaler IP. If an attacker exfils to a Cloudflare Worker IP (e.g. one in Cloudflare’s range), your policy already permits *.workers.dev, and the resolver has inserted that IP for a legitimate lookup, we’ll allow it: the (ip, port) pair is identical and there’s no further signal in the 4-tuple.
  • In-pod tools that hold CAP_NET_ADMIN (BuildKit, kaniko, Docker-in-Docker). These can manipulate nftables inside their own netns. We address this with a different deployment shape, covered in the next section.

BuildKit, kaniko, and the CAP_NET_ADMIN-in-pod problem

Container-image builds running inside a CI job are a different threat shape than npm install. BuildKit needs CAP_NET_ADMIN to manipulate the build sandbox’s network. Kaniko does its own iptables work. Docker-in-Docker assumes free rein over networking inside the runner pod. Any of these inside an unconstrained CI pod could in theory run nft flush table inet leitwacht and bypass the DNS redirect.

To be clear, bypassing the redirect doesn’t bypass enforcement. cgroup_skb/egress runs after routing, in the kernel’s main packet path, with no userspace dependency. Flushing the in-pod nftables only blinds the DNS allow-list population path; the BPF default-deny on every packet still fires. An attacker who has flushed the redirect can reach IPs they already know (from a previous resolve, or from a baked-in IOC list) only if those IPs are already on the allow map, and every new destination still drops.

Where we want stronger guarantees, we don’t run BuildKit or Docker-in-Docker inside a CI pod at all. The deployment pattern:

  • BuildKit runs as a long-lived host daemon with its own cgroup, outside the CI runner pod’s namespace. The agent attaches its cgroup_skb program to the BuildKit cgroup on startup, so BuildKit’s egress is subject to the same positive allowlist as the job container’s.
  • One buildkitd serves many concurrent CI jobs. Earlier prototypes serialized builds per host; the current architecture runs a persistent shared buildkitd with several Solve sessions in flight at once, and attributes every packet down to the specific Dockerfile instruction that produced it ([build 5/12] RUN npm ci, with session ref and vertex digest). The agent subscribes to buildkitd’s Control.Status gRPC stream, maintains a per-session vertex registry, and joins the runc-created RUN cgroup to its owning vertex via /proc/<pid>/cmdline plus environ tiebreakers. Ambiguous joins (identical RUN strings across concurrent sessions) surface as ambiguous_attribution rather than guess.
  • The eBPF allow map and ringbuf are shared between the runner pod and BuildKit’s cgroup, not duplicated. A BPF_MAP_TYPE_LRU_HASH with a single set of (ip, port) entries means there’s no race where BuildKit allowed something the runner didn’t.
  • CI identity reaches the build graph via build-args. The CI runner injects LEITWACHT_CI_PROJECT_PATH, LEITWACHT_CI_JOB_ID, etc. as --build-arg values; BuildKit propagates them into SolveRequest.FrontendAttrs, the agent reads them off the Status stream, and each per-vertex egress event carries its originating GitLab job. Operators can flip requireCIMetadata: true to cgroup.kill any Solve that omits the keys.
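The “ambiguous rather than guess” rule for joining a RUN cgroup to its vertex can be sketched as a match that insists on exactly one candidate. This is a simplified illustration: it compares command strings only and ignores the environ tiebreakers mentioned above, and the vertex struct, join_vertex, and the JOIN_* codes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define JOIN_NONE      (-1)  /* no vertex has this RUN command */
#define JOIN_AMBIGUOUS (-2)  /* more than one vertex matches */

struct vertex {
    const char *session;  /* session ref */
    const char *run_cmd;  /* command string of the RUN step */
};

/* Return the index of the single vertex whose RUN command equals
   `cmdline`, or JOIN_NONE / JOIN_AMBIGUOUS. No guess is made when
   several vertices match. */
static int join_vertex(const struct vertex *v, size_t n, const char *cmdline)
{
    int found = JOIN_NONE;

    assert(cmdline != NULL && (v != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (strcmp(v[i].run_cmd, cmdline) != 0)
            continue;
        if (found != JOIN_NONE)
            return JOIN_AMBIGUOUS;  /* second match: refuse to pick one */
        found = (int)i;
    }
    return found;
}
```

Returning a distinct code for the ambiguous case lets the caller emit an ambiguous_attribution event instead of crediting the packet to the wrong job.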

Under this model, the BuildKit / kaniko CAP_NET_ADMIN is contained inside its own cgroup-attached BPF, and the bypass doesn’t reach the runner pod’s policy. The cost is operational complexity (a daemon to run, a VM to operate); the benefit is that container builds don’t punch a hole in egress enforcement.

What this doesn’t defend against is typosquatted or compromised packages served from an allowed registry: a malicious module on proxy.golang.org is on-policy at the network layer, regardless of how the builder is sandboxed. That’s a content-layer problem, mitigated by lockfiles, dependency scanning, and SBOM verification, not by us.

These are the limits we’re honest about. Everything else, the actual choke point of “credential-stealer phones home from a CI runner over plain egress”, is closed by the path described above.

Deeper dives behind each piece of this picture are forthcoming: Closing the attach race without NRI will cover the containerd-events + init-container barrier and why we rejected NRI; Attributing a packet, link by link will walk the full forensic chain from kernel verdict to clickable URL; cgroup_skb/egress: drop, count, attribute will be the decision record for the BPF primitives.