# mTLS and Service Identity Deep Dive

> **Intro:** mTLS is not “just turn on encryption between services”. Done well, it becomes the identity plane for service-to-service trust. Done badly, it becomes expensive encryption with weak authorization semantics, unclear rotation ownership, and broad trust domains.
>
> **What this page includes**
>
> * how service identity differs from shared secret trust
> * where mTLS fits and where it does not
> * SPIFFE/SPIRE, mesh, and gateway patterns
> * certificate ownership, rotation, and trust-domain boundaries
> * review questions for microservice, Kubernetes, and platform teams

## What service identity is

Service identity answers **which workload is talking** to another workload, independently of the IP or node where it currently runs.

Good service identity should be:

* strongly bound to the workload itself, ideally through workload attestation;
* short-lived;
* automatically renewed;
* scoped to a trust domain;
* usable for both authentication and policy decisions.

## Where mTLS helps

| Goal                           | Why mTLS helps                                               |
| ------------------------------ | ------------------------------------------------------------ |
| **Confidentiality in transit** | encrypts traffic between services                            |
| **Mutual authentication**      | both client and server present validated identity            |
| **Policy enforcement**         | destination can require specific principals or trust domains |
| **Replay reduction**           | unlike a bearer token, a cert is useless without its key     |
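
To make the mutual-authentication row concrete, here is a minimal sketch of a server-side TLS context that refuses clients without a valid certificate. It assumes Python's standard `ssl` module; the function name `build_mtls_server_context` and its file-path parameters are hypothetical, and in a mesh-managed setup the sidecar or ambient component would do this instead of application code.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Return a server context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what makes the handshake mutual: a client that
    # presents no certificate, or one that does not chain to the trusted
    # CA bundle, fails the handshake.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Trust bundle for the trust domain (hypothetical path).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # The server's own workload certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

This only establishes who the peer is. Deciding what that peer may do is a separate step.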

## Where mTLS is not enough

mTLS alone does **not** answer:

* whether the authenticated caller is allowed to perform a specific business action;
* which tenant the caller is acting for;
* whether a request should be rate-limited, audited, or masked differently.

That means mTLS should usually pair with one or more of:

* service authorization policy;
* tenant-aware claims or signed identity tokens;
* workload or request context propagated to the application layer.

## Trust model choices

### 1) shared-secret trust

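Fast to start, weak to scale: a shared secret proves only possession, is hard to rotate across every holder, and does not identify which workload is calling.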

### 2) internal PKI with workload certificates

Good baseline for platform-controlled environments.

### 3) SPIFFE/SPIRE-style workload identity

Best when the organization wants explicit workload attestation, federation, and strong identity semantics across heterogeneous environments.

## Common deployment patterns

### Pattern A — mesh-managed mTLS

* service mesh sidecars or ambient components handle identity and cert distribution;
* platform enforces policy centrally;
* app team gets encryption and identity with little code.

**Trade-off:** powerful, but can hide the trust model from engineers if documentation is weak.

### Pattern B — library / gateway mTLS

* client or gateway explicitly manages certs;
* often used at ingress/egress or between systems outside the mesh.

**Trade-off:** clearer at edges, more operational burden inside the app estate.

### Pattern C — SPIFFE/SPIRE workload identity

* workloads receive SPIFFE IDs and X.509 SVIDs or JWT-SVIDs based on attestation;
* identity can feed mesh, gateway, or application policy layers.

**Trade-off:** strong identity semantics and federation options, but more platform design work.
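
A SPIFFE ID is a URI of the form `spiffe://<trust-domain>/<path>`, so a policy layer can separate the trust domain from the workload path before making a decision. The sketch below uses only the Python standard library. The `SpiffeId` type, the `parse_spiffe_id` helper, and the example trust domain are hypothetical, and the checks are a simplified subset of the SPIFFE specification rather than a full validator.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SpiffeId:
    trust_domain: str
    path: str


def parse_spiffe_id(raw: str) -> SpiffeId:
    """Split a SPIFFE ID into trust domain and workload path."""
    parts = urlsplit(raw)
    if parts.scheme != "spiffe":
        raise ValueError("not a SPIFFE ID: scheme must be 'spiffe'")
    # The authority component is the trust domain; user info and ports
    # are not allowed in a SPIFFE ID.
    if not parts.netloc or "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError("trust domain must be a bare name")
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs carry no query or fragment")
    return SpiffeId(trust_domain=parts.netloc, path=parts.path)


# Hypothetical ID for illustration only.
sid = parse_spiffe_id("spiffe://prod.example.org/ns/payments/sa/api")
print(sid.trust_domain, sid.path)
```

Keeping the trust domain as its own field makes it straightforward to reject identities from an unexpected environment before looking at the path at all.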

## Design questions that matter most

| Question                                      | Why it matters                                      |
| --------------------------------------------- | --------------------------------------------------- |
| What is the **trust domain**?                 | prevents accidental cross-environment trust         |
| Who issues workload certs?                    | determines compromise and rotation blast radius     |
| How short-lived are certs?                    | limits stolen-cert usefulness                       |
| Where do private keys live?                   | affects node compromise and pod escape consequences |
| Who rotates issuer and trust anchor material? | often the real production failure point             |
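
The certificate-lifetime question becomes easier to reason about once renewal is expressed as a fraction of the validity window. The sketch below is one way to do that under stated assumptions: the `should_renew` helper is hypothetical, and the default threshold of half the lifetime is an illustrative choice, not a recommendation from this page.

```python
from datetime import datetime, timedelta, timezone


def should_renew(
    not_before: datetime,
    not_after: datetime,
    now: datetime,
    renew_fraction: float = 0.5,  # assumption: renew at half of lifetime
) -> bool:
    """Return True once the chosen fraction of the validity window has elapsed."""
    lifetime = not_after - not_before
    if lifetime <= timedelta(0):
        raise ValueError("not_after must be later than not_before")
    return now >= not_before + lifetime * renew_fraction


# A one-hour certificate, checked 20 and 40 minutes after issuance.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=1)
print(should_renew(issued, expires, issued + timedelta(minutes=20)))  # False
print(should_renew(issued, expires, issued + timedelta(minutes=40)))  # True
```

Renewing well before expiry leaves room for retries when the issuer is briefly unavailable, which matters more as lifetimes get shorter.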

## Certificate ownership model

### Workload certificates

* typically issued automatically;
* short-lived;
* owned operationally by platform engineering, not by each application team.

### Issuer / intermediate certificates

* higher-impact material;
* should have a tighter admin set and stronger change control;
* often rotated via cert-manager, Vault PKI, or external CA workflows.

### Root / trust anchor

* highest-sensitivity material;
* ideally managed offline or in a tightly controlled CA workflow;
* rotation should be planned well before expiry.

## Authorization after authentication

The minimum useful rule after mTLS is:

> authenticated caller X may invoke workload Y on operation Z only in environment E under trust domain T.

Without that, many teams stop at “encrypted traffic exists” and miss that over-trusting internal callers remains a major lateral-movement risk.
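
The rule above can be sketched as a default-deny lookup over explicit tuples. This is an illustration only: the `Rule` type, the `is_allowed` helper, and every identifier in the example rule set are hypothetical, and a real deployment would usually express the same idea in mesh or gateway policy rather than in application code.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Rule:
    caller: str        # authenticated principal X (from the peer certificate)
    workload: str      # destination workload Y
    operation: str     # operation Z
    environment: str   # environment E
    trust_domain: str  # trust domain T


# Hypothetical rule set for illustration.
RULES: FrozenSet[Rule] = frozenset({
    Rule("orders-api", "payments", "POST /charges", "prod", "prod.example.org"),
})


def is_allowed(rules: FrozenSet[Rule], request: Rule) -> bool:
    """Default deny: allow only when all five fields match an explicit rule."""
    return request in rules


ok = Rule("orders-api", "payments", "POST /charges", "prod", "prod.example.org")
from_dev = Rule("orders-api", "payments", "POST /charges", "dev", "dev.example.org")
print(is_allowed(RULES, ok))        # True
print(is_allowed(RULES, from_dev))  # False
```

Because the environment and trust domain are part of the match, the same caller name presented from a dev trust domain does not inherit prod permissions.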

## Failure modes to look for

1. **one shared issuer for too many environments**
2. **long-lived workload certs**
3. **broad trust domain with no environment separation**
4. **permissive mode left on indefinitely**
5. **mTLS identity established, but resolver- or service-level authorization missing**
6. **issuer rotation documented poorly or not rehearsed**
7. **mesh internals hidden from app teams, so debugging ends up bypassing security controls**
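
Some of these failure modes can be flagged mechanically during a review. As a small sketch of the first one, the function below groups environments by issuer and reports any issuer used by more than one. The inventory format (environment name mapped to an issuer identifier such as a certificate fingerprint) and the `find_shared_issuers` helper are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def find_shared_issuers(issuer_by_env: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each issuer used by more than one environment to those environments."""
    envs_by_issuer: Dict[str, List[str]] = defaultdict(list)
    for env, issuer in issuer_by_env.items():
        envs_by_issuer[issuer].append(env)
    return {
        issuer: sorted(envs)
        for issuer, envs in envs_by_issuer.items()
        if len(envs) > 1
    }


# Hypothetical inventory: dev and staging share an issuer, prod does not.
inventory = {"dev": "issuer-a", "staging": "issuer-a", "prod": "issuer-b"}
print(find_shared_issuers(inventory))  # {'issuer-a': ['dev', 'staging']}
```

An empty result does not prove the environments are isolated. It only rules out one obvious form of shared trust.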

## Practical review prompts

* what principal does service A present to service B?
* how is that identity issued and rotated?
* what happens if a pod is copied or rescheduled?
* can a compromised workload from dev talk to prod?
* is there a clear distinction between **transport trust** and **application authorization**?

## Read next

* [🔐 Internal PKI for Microservices — mTLS, Certificate Automation, and Trust Distribution](/cloud-kubernetes-and-infrastructure-security/index/internal-pki-for-microservices-mtls-and-certificate-automation.md)
* [☸️ Istio / Linkerd mTLS Operations and Certificate Rotation](/cloud-kubernetes-and-infrastructure-security/index-1/istio-linkerd-mtls-operations-and-certificate-rotation.md)
* [🔗 Service-to-Service Auth, Webhooks, and Event-Driven Security](/architecture-api-crypto-and-identity/index-1/service-to-service-auth-webhooks-and-event-driven-security.md)

