# Threat Modeling Process — Kubernetes Example

> **Intro:** Kubernetes threat modeling goes wrong when teams stop at the Pod or node level. A realistic model has to slice the stack from ingress and identities all the way to CI/CD, registry trust, cluster control plane, and cloud privileges.
>
> **Why this page exists**
>
> * the OTUS Cloud DevSecOps materials use a layered attack-surface idea for Kubernetes, but the KB did not yet have a dedicated worked example;
> * many teams know STRIDE in theory but struggle to apply it to a live cluster;
> * the goal here is to show a **repeatable, product-focused process** rather than a one-off whiteboard ritual.

## When to run this model

Use this page when at least one of the following is true:

* a new cluster or namespace boundary is being introduced;
* a service is moving from VM or serverless to Kubernetes;
* a new ingress, API gateway, service mesh, or admission policy is being introduced;
* a workload receives new secrets, broader service account access, or cloud IAM trust;
* CI/CD now deploys directly to the cluster;
* the product is multi-tenant or contains admin workflows.

## Example system in scope

This example models a typical cloud-native product slice:

* public users access a web frontend;
* the frontend calls a REST API in Kubernetes;
* the API talks to Redis and PostgreSQL;
* background workers consume jobs from a queue;
* images are built in CI and pulled from a private registry;
* secrets come from a cluster secret store;
* the cluster runs in a cloud account and workloads can reach some cloud APIs.

## Step 1 — Define the review objective

Keep the objective concrete.

**Bad objective**

> threat model the cluster

**Good objective**

> threat model the payment-api namespace before production release, with focus on tenant isolation, service account use, image trust, ingress exposure, and lateral movement after workload compromise.

## Step 2 — List the critical assets

| Asset                                        | Why it matters                         | Typical owner                             |
| -------------------------------------------- | -------------------------------------- | ----------------------------------------- |
| customer data in PostgreSQL                  | confidentiality and integrity risk     | application team + database/platform team |
| workload identities / service account tokens | can enable cluster or cloud escalation | platform team                             |
| container images and tags                    | supply chain trust and rollback risk   | application team + platform team          |
| CI deploy credentials / GitOps trust         | release-path abuse and integrity risk  | DevSecOps / platform team                 |
| ingress / API endpoint                       | initial access and abuse surface       | app team + platform team                  |
| cluster audit logs and runtime telemetry     | detection and forensics                | security / platform team                  |

## Step 3 — Draw the layers before drawing threats

A useful Kubernetes model should at minimum look through these layers:

1. edge and ingress
2. application/API service
3. service-to-service calls
4. service account and cluster identity
5. secrets and configuration
6. data stores and queues
7. container image and runtime
8. Kubernetes control plane and node boundary
9. cloud IAM and metadata access
10. CI/CD and registry trust
11. logs, detections, and response path

## Step 4 — Identify trust boundaries

This is the point most teams skip.

**Trust boundaries in this example**

* internet → ingress
* ingress namespace → payment-api namespace
* payment-api → internal services
* namespace workload identity → Kubernetes API
* workload identity → cloud APIs
* CI pipeline → registry
* registry / GitOps → cluster deploy path
* app container → node / runtime / kernel
* cluster → external logging / SIEM destination

If a data flow in the diagram crosses one of these boundaries, ask what proves the action is authorized and where it is logged.
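
The boundary question can be kept as a small, reviewable inventory rather than only a diagram. The following is a minimal sketch, not a prescribed tool: the `Flow` record, its field names, and the sample answers are hypothetical and only illustrate listing crossings that still lack an authorization proof or a log destination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    authorized_by: str = ""  # what proves the action is authorized
    logged_in: str = ""      # where the action is recorded


def unanswered_crossings(flows):
    """Return boundary crossings missing an authorization proof or a log."""
    return [f for f in flows if not f.authorized_by or not f.logged_in]


# Hypothetical answers collected during the session.
flows = [
    Flow("internet", "ingress", "OIDC token checked at gateway", "ingress access log"),
    Flow("payment-api", "internal services"),
    Flow("workload identity", "cloud APIs", authorized_by="scoped workload role"),
]

for f in unanswered_crossings(flows):
    print(f"{f.source} -> {f.target}: "
          f"authorization={f.authorized_by or 'MISSING'}, log={f.logged_in or 'MISSING'}")
```

Every row that prints is an open question for the session, not yet a finding.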

## Step 5 — Walk attack paths by layer

### Layer 1: ingress and public exposure

Ask:

* can unauthenticated endpoints leak metadata, debug info, or object identifiers?
* can the ingress route around central auth or rate limiting?
* can path-based routing expose admin or internal paths unexpectedly?
* is TLS terminated in the right place and are headers trusted correctly?

**Typical findings**

* admin path reachable from public ingress;
* `X-Forwarded-*` trust configured too broadly;
* missing request size or rate controls;
* DAST only covers anonymous routes, not authenticated flows.
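
The "admin path reachable from public ingress" finding can be partly checked from the Ingress manifest itself. This is a rough sketch that assumes the manifest has already been parsed into a Python dict; the prefix list and the sample host are assumptions for illustration. It will not catch a catch-all `/` route that forwards everything to a backend which serves admin paths itself.

```python
# Assumed list of path prefixes that should not be publicly routed.
SENSITIVE_PREFIXES = ("/admin", "/internal", "/debug", "/metrics")


def exposed_sensitive_paths(ingress):
    """Return (host, path) pairs in an Ingress that route a sensitive prefix."""
    findings = []
    for rule in ingress.get("spec", {}).get("rules", []):
        host = rule.get("host", "*")
        for entry in rule.get("http", {}).get("paths", []):
            path = entry.get("path", "/")
            if path.startswith(SENSITIVE_PREFIXES):
                findings.append((host, path))
    return findings


# Hypothetical public Ingress for the example system.
ingress = {
    "spec": {
        "rules": [
            {
                "host": "pay.example.com",
                "http": {"paths": [{"path": "/api"}, {"path": "/admin/users"}]},
            }
        ]
    }
}

print(exposed_sensitive_paths(ingress))
```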

### Layer 2: application and object access

Ask:

* where is tenant isolation enforced: gateway, app code, or downstream service?
* can object IDs be enumerated?
* does authorization happen once at the edge and then get assumed everywhere else?
* do batch/export/reporting routes bypass standard access checks?

**Typical findings**

* route auth present but object-level auth weak;
* internal service trusts caller headers instead of verified identity;
* async worker can read broader data than the API itself.

### Layer 3: service-to-service trust

Ask:

* is service identity explicit or implicit?
* are internal calls authenticated and authorized, or only “inside the cluster therefore trusted”?
* can one compromised workload call every internal service?

**Typical findings**

* flat east-west trust model;
* no namespace or network isolation;
* broad egress allows callbacks, exfiltration, or metadata access.
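
Whether a namespace has a deny-by-default baseline can be read from its NetworkPolicy objects. The sketch below assumes the policies are already parsed into dicts and only recognizes the simplest form: an empty `podSelector` with no rules for a direction listed in `policyTypes`. Equivalent selectors written another way are not detected.

```python
def default_denied_directions(policies):
    """Return the traffic directions covered by a default-deny NetworkPolicy."""
    denied = set()
    for policy in policies:
        spec = policy.get("spec", {})
        # An empty podSelector selects every Pod in the namespace.
        if spec.get("podSelector") != {}:
            continue
        # When policyTypes is omitted and there are no egress rules, it defaults to Ingress.
        types = spec.get("policyTypes", ["Ingress"])
        if "Ingress" in types and not spec.get("ingress"):
            denied.add("Ingress")
        if "Egress" in types and not spec.get("egress"):
            denied.add("Egress")
    return denied


policies = [
    {"metadata": {"name": "default-deny-ingress"},
     "spec": {"podSelector": {}, "policyTypes": ["Ingress"]}},
]

missing = {"Ingress", "Egress"} - default_denied_directions(policies)
print("directions without default deny:", sorted(missing))
```

In this sample only ingress is denied by default, so egress (callbacks, exfiltration, metadata access) stays open.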

### Layer 4: Kubernetes identity and service accounts

Ask:

* does this Pod actually need a mounted service account token?
* what RBAC verbs and resources are granted?
* can compromise of this Pod become secret read, exec, log read, or new workload creation?

**Typical findings**

* automounted service account token not needed;
* Role/ClusterRole includes `get/list/watch secrets` or `pods/exec` unnecessarily;
* one namespace compromise becomes cluster reconnaissance.
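
Checking "actual verbs and resources" is mechanical once the Role or ClusterRole is available as data. A minimal sketch, assuming the manifest is parsed into a dict; it ignores `apiGroups` and `resourceNames` for brevity, so treat hits as prompts for review rather than verdicts.

```python
READ_VERBS = {"get", "list", "watch", "*"}
EXEC_VERBS = {"create", "get", "*"}


def risky_rules(role):
    """Flag RBAC rules that allow reading Secrets or exec into Pods."""
    findings = []
    for index, rule in enumerate(role.get("rules", [])):
        resources = set(rule.get("resources", []))
        verbs = set(rule.get("verbs", []))
        if resources & {"secrets", "*"} and verbs & READ_VERBS:
            findings.append(f"rule {index}: can read secrets")
        if resources & {"pods/exec", "*"} and verbs & EXEC_VERBS:
            findings.append(f"rule {index}: can exec into pods")
    return findings


# Hypothetical Role bound to the workload's service account.
role = {
    "rules": [
        {"apiGroups": [""], "resources": ["secrets", "configmaps"], "verbs": ["get", "list"]},
        {"apiGroups": [""], "resources": ["pods/exec"], "verbs": ["create"]},
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get"]},
    ]
}

for finding in risky_rules(role):
    print(finding)
```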

### Layer 5: secrets and configuration

Ask:

* where do secrets originate?
* are they static or short-lived?
* can secrets be read from environment, mounted files, logs, crash dumps, or debug endpoints?
* can developers or support staff read them during incident response?

**Typical findings**

* long-lived cloud credentials mounted into app containers;
* secrets stored in plain Kubernetes Secrets without adequate governance (for example, no clear owner, rotation, or access restriction);
* debug mode or startup logs reveal secret material.
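
One cheap signal for the environment-variable question is a secret-looking variable set as a literal `value` instead of a reference. The sketch below is a heuristic only: the name pattern is an assumption, and it says nothing about secrets that are referenced correctly but later leak through logs or debug endpoints.

```python
import re

# Assumed naming pattern for variables that probably hold secret material.
SECRET_NAME = re.compile(r"PASSWORD|PASSWD|SECRET|TOKEN|API_?KEY|PRIVATE_KEY", re.IGNORECASE)


def inline_secret_env(pod_spec):
    """Return (container, variable) pairs where a secret-like env var has a literal value."""
    hits = []
    for container in pod_spec.get("containers", []):
        for env in container.get("env", []):
            if "value" in env and SECRET_NAME.search(env.get("name", "")):
                hits.append((container.get("name", "?"), env["name"]))
    return hits


# Hypothetical Pod spec fragment.
pod_spec = {
    "containers": [
        {
            "name": "api",
            "env": [
                {"name": "DB_PASSWORD", "value": "example-only"},
                {"name": "API_TOKEN",
                 "valueFrom": {"secretKeyRef": {"name": "api", "key": "token"}}},
                {"name": "LOG_LEVEL", "value": "info"},
            ],
        }
    ]
}

print(inline_secret_env(pod_spec))
```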

### Layer 6: data stores and queues

Ask:

* can app compromise become full database compromise?
* are app DB credentials overprivileged?
* can a worker or queue consumer replay or mass-read other tenants’ data?

**Typical findings**

* app user owns schema and can alter tables;
* queue consumer has too-broad topic access;
* cache is reachable from too many workloads.

### Layer 7: image and runtime

Ask:

* does the workload run as root?
* is the filesystem writable?
* are extra Linux capabilities present?
* is seccomp/AppArmor/SELinux in use?
* what happens if the container is compromised?

**Typical findings**

* image runs as UID 0;
* no seccomp or AppArmor profile;
* writable root filesystem used even when not required;
* admission policies do not block privileged workloads.
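
Most of the questions in this layer map to `securityContext` fields, so a first pass can be scripted. A minimal sketch, assuming a Pod spec parsed into a dict; it covers only the fields named here and does not inspect the image's own `USER` setting or AppArmor and SELinux configuration.

```python
def runtime_findings(pod_spec):
    """List runtime-hardening gaps per container in a parsed Pod spec."""
    pod_sc = pod_spec.get("securityContext", {})
    findings = []
    for container in pod_spec.get("containers", []):
        sc = container.get("securityContext", {})
        name = container.get("name", "?")
        # Container-level settings override Pod-level ones.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            findings.append(f"{name}: may run as root")
        if not sc.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: writable root filesystem")
        if sc.get("privileged", False):
            findings.append(f"{name}: privileged")
        added = sc.get("capabilities", {}).get("add", [])
        if added:
            findings.append(f"{name}: added capabilities {sorted(added)}")
        profile = sc.get("seccompProfile", pod_sc.get("seccompProfile", {}))
        if profile.get("type") not in ("RuntimeDefault", "Localhost"):
            findings.append(f"{name}: no seccomp profile")
    return findings


# Hypothetical unhardened workload.
weak = {
    "containers": [
        {"name": "api", "securityContext": {"capabilities": {"add": ["NET_ADMIN"]}}}
    ]
}

for finding in runtime_findings(weak):
    print(finding)
```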

### Layer 8: control plane and nodes

Ask:

* can node-level access or hostPath exposure bypass workload boundaries?
* are kubelet or node management interfaces exposed?
* does any workload get hostPID, hostNetwork, hostIPC, privileged, or hostPath mounts?

**Typical findings**

* operational debugging uses unsafe Pod specs;
* node compromise gives access to many namespaces;
* cluster audit logging is partial or disabled.
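
Host-level access requested by a Pod is visible in its spec, which makes debugging workloads easy to review before they run. A short sketch under the same assumption of a parsed Pod spec; the sample volume path is illustrative.

```python
HOST_FLAGS = ("hostPID", "hostNetwork", "hostIPC")


def host_access_findings(pod_spec):
    """Return host namespaces and hostPath mounts requested by a Pod spec."""
    findings = [flag for flag in HOST_FLAGS if pod_spec.get(flag)]
    for volume in pod_spec.get("volumes", []):
        if "hostPath" in volume:
            findings.append(f"hostPath:{volume['hostPath'].get('path', '?')}")
    return findings


# Hypothetical debugging Pod.
debug_pod = {
    "hostNetwork": True,
    "volumes": [
        {"name": "runtime", "hostPath": {"path": "/var/run"}},
        {"name": "scratch", "emptyDir": {}},
    ],
}

print(host_access_findings(debug_pod))
```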

### Layer 9: cloud trust and metadata access

Ask:

* can workloads reach instance metadata or equivalent token services?
* are workload-to-cloud permissions minimal?
* can the same compromise path hit KMS, object storage, queues, or secrets manager?

**Typical findings**

* network egress allows access to the metadata service;
* workload role includes broad object storage or decrypt permissions;
* app identity and CI identity are not separated cleanly.
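
Broad workload permissions often show up as wildcards in the policy document. The sketch below assumes an AWS-style JSON policy (`Statement`, `Effect`, `Action`, `Resource`), which the example system does not specify; other clouds use different formats, and the bucket names are made up.

```python
def _as_list(value):
    """Policy fields may hold a single item or a list; normalize to a list."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def broad_statements(policy):
    """Return Allow statements that use wildcard actions or resources."""
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        wide_actions = [a for a in actions if a == "*" or a.endswith(":*")]
        if wide_actions or "*" in resources:
            findings.append({
                "sid": stmt.get("Sid", ""),
                "actions": wide_actions,
                "wildcard_resource": "*" in resources,
            })
    return findings


# Hypothetical workload role policy.
policy = {
    "Statement": [
        {"Sid": "Objects", "Effect": "Allow", "Action": "s3:*",
         "Resource": "arn:aws:s3:::payments/*"},
        {"Sid": "Narrow", "Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::payments/invoices/*"},
    ]
}

print(broad_statements(policy))
```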

### Layer 10: CI/CD and registry trust

Ask:

* who can push images or mutate tags?
* are deploys pinned by digest, or do they follow a floating tag?
* are there approval and evidence controls before production?
* can a runner compromise become image poisoning?

**Typical findings**

* mutable tags for production deploys;
* unsigned images admitted to cluster;
* pipeline and runtime trust share too much authority.
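
Digest pinning is easy to verify from the image reference alone. A minimal sketch; the registry host is a placeholder, and only `sha256` digests are recognized.

```python
import re

# An image reference pinned by digest ends with @sha256:<64 hex characters>.
DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image):
    """Return True when the image reference is pinned to a sha256 digest."""
    return bool(DIGEST_SUFFIX.search(image))


floating = "registry.example.com/payment-api:latest"
pinned = "registry.example.com/payment-api@sha256:" + "a" * 64

print(is_digest_pinned(floating), is_digest_pinned(pinned))
```

A digest proves which image bytes are deployed, not who built them, so signature verification remains a separate control.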

### Layer 11: logging and detection

Ask:

* what logs would prove or disprove workload abuse?
* are Kubernetes audit logs enabled and centralized?
* do we alert on `exec`, secret reads, RBAC changes, suspicious image changes, or unusual cloud API calls from workload identities?

**Typical findings**

* telemetry exists but no owner or alert path;
* runtime detections do not distinguish test from prod;
* incident responders cannot map workload identity to cloud actions.
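
The alert list above can be expressed as a classifier over Kubernetes audit events. This sketch assumes events arrive as dicts using the audit event fields `verb` and `objectRef`; the category labels are hypothetical, and a real rule set would also consider the user, namespace, and response status.

```python
RBAC_GROUP = "rbac.authorization.k8s.io"
WRITE_VERBS = {"create", "update", "patch", "delete"}
READ_VERBS = {"get", "list", "watch"}


def classify(event):
    """Map an audit event to an alert category, or None if it is not of interest."""
    ref = event.get("objectRef") or {}  # absent for non-resource requests
    verb = event.get("verb", "")
    if ref.get("resource") == "pods" and ref.get("subresource") == "exec":
        return "pod-exec"
    if ref.get("resource") == "secrets" and verb in READ_VERBS:
        return "secret-read"
    if ref.get("apiGroup") == RBAC_GROUP and verb in WRITE_VERBS:
        return "rbac-change"
    return None


# Hypothetical audit events.
events = [
    {"verb": "create", "objectRef": {"resource": "pods", "subresource": "exec"}},
    {"verb": "list", "objectRef": {"resource": "secrets", "namespace": "payment-api"}},
    {"verb": "patch", "objectRef": {"resource": "rolebindings", "apiGroup": RBAC_GROUP}},
    {"verb": "get", "objectRef": {"resource": "pods"}},
]

print([classify(e) for e in events])
```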

## Step 6 — Use a structured method, but do not become trapped by it

### STRIDE mapping for this Kubernetes example

| STRIDE area            | Example in Kubernetes context                                          |
| ---------------------- | ---------------------------------------------------------------------- |
| Spoofing               | forged service identity, trusted headers, stolen service account token |
| Tampering              | image poisoning, mutable tag overwrite, manifest drift                 |
| Repudiation            | weak or missing audit logs for deploys, exec, secret reads             |
| Information disclosure | cross-tenant reads, secret leakage, broad logs, metadata access        |
| Denial of service      | no quotas/limits, queue abuse, expensive public endpoints              |
| Elevation of privilege | root container, broad RBAC, cloud role escalation                      |

Do not force every threat into a method matrix if it makes the session worse. The method is there to improve coverage, not to replace judgment.

## Step 7 — Convert the model into engineering outputs

A good threat model ends with **owned actions**.

### Example output set for this Kubernetes system

| Output type           | Example action                                                                                      |
| --------------------- | --------------------------------------------------------------------------------------------------- |
| design change         | enforce tenant checks in application service, not only ingress layer                                |
| platform guardrail    | disable service-account automount unless explicitly required                                        |
| policy gate           | block privileged Pods, hostPath mounts, and Pods without a default (or stricter) seccomp profile via admission |
| release gate          | require digest-pinned deploys and signed image verification                                         |
| detection requirement | alert on `pods/exec`, secret reads, RBAC changes, and unusual cloud API actions from workload roles |
| residual risk record  | accepted short-term use of broad egress for migration, expires in 30 days                           |
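
A residual risk record only works if its expiry is checked. The following sketch shows one possible shape for such a record; the class, field names, owner, and dates are illustrative assumptions, not an established format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ResidualRisk:
    description: str
    owner: str
    accepted_on: date
    valid_days: int = 30

    @property
    def expires_on(self):
        return self.accepted_on + timedelta(days=self.valid_days)

    def is_expired(self, today):
        """An accepted risk must be re-reviewed on or after its expiry date."""
        return today >= self.expires_on


risk = ResidualRisk(
    description="broad egress for migration",
    owner="platform team",
    accepted_on=date(2026, 1, 10),
)

print(risk.expires_on, risk.is_expired(date(2026, 2, 9)))
```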

## Worked mini-example: payment-api namespace

### Scenario

A new `payment-api` service is deployed behind ingress. It calls PostgreSQL and object storage. It runs in a namespace shared with several internal services. CI pushes `:latest` and Argo CD syncs automatically.

### Fast threat-model findings

1. the mutable `:latest` tag makes it unclear which image is running and allows silent overwrite or unsafe rollback.
2. service account token is automounted though the app does not call the Kubernetes API.
3. namespace has no deny-by-default NetworkPolicy.
4. object storage access is broader than needed.
5. runtime detections do not cover container shell execution or unexpected outbound connections.

### Resulting actions

* pin production deploys by image digest;
* set `automountServiceAccountToken: false`;
* add namespace baseline NetworkPolicy;
* narrow cloud IAM to bucket prefix and action set actually needed;
* add runtime detection for shell spawn, package manager execution, curl/wget, and abnormal egress.
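
To confirm the service account action actually took effect, the effective automount setting can be derived from the Pod spec and its ServiceAccount. A minimal sketch, assuming both objects are parsed into dicts: the Pod-level field takes precedence, and the token is mounted when neither object sets the field.

```python
def token_is_mounted(pod_spec, service_account=None):
    """Return True when the Pod will get a service account token mounted."""
    pod_setting = pod_spec.get("automountServiceAccountToken")
    if pod_setting is not None:
        return pod_setting  # the Pod spec overrides the ServiceAccount
    sa_setting = (service_account or {}).get("automountServiceAccountToken")
    return True if sa_setting is None else sa_setting


before = {"containers": [{"name": "payment-api"}]}
after = {"automountServiceAccountToken": False, "containers": [{"name": "payment-api"}]}

print(token_is_mounted(before), token_is_mounted(after))
```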

## Kubernetes-specific review checklist

Use this as the 10-minute closeout at the end of a modeling session.

* Does the workload need a service account token?
* Does the workload run as non-root?
* Is the root filesystem read-only where practical?
* Are seccomp/AppArmor/SELinux defaults enforced?
* Is east-west traffic actually segmented?
* Is metadata access blocked or intentionally controlled?
* Are cloud privileges scoped to the workload’s real need?
* Are production deploys pinned and signed?
* Are audit logs and runtime detections sufficient for incident response?
* Can a single namespace or runner compromise poison release or read other tenants’ data?

## Common failure modes

* the team models only ingress and API endpoints, ignoring CI/CD and cloud IAM;
* the team talks about “Kubernetes risk” generically but never names the service account, namespace, role, or deploy path;
* the session stops at “use RBAC” without checking actual verbs/resources;
* no one turns the findings into guardrails, detections, or due dates;
* the review is never repeated after architecture drift.

## Cross-links

* [Threat Modeling Methods and Workflows](/application-security-and-secure-sdlc/index/threat-modeling-methods-and-workflows.md)
* [Multi-Tenant and Microservice Threat Modeling](/application-security-and-secure-sdlc/index/multi-tenant-and-microservice-threat-modeling.md)
* [Kubernetes Hardening](/cloud-kubernetes-and-infrastructure-security/index-1/kubernetes-hardening.md)
* [Kubernetes API Access Hardening](/cloud-kubernetes-and-infrastructure-security/index-1/kubernetes-api-access-hardening.md)
* [Runtime Investigation Playbook for Kubernetes and Containers](/cloud-kubernetes-and-infrastructure-security/index-1/runtime-investigation-playbook.md)

***

*Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.*


