# Advanced Detection and Response for Senior Engineers

> **Intro:** Mature Product Security programs stop asking “do we have logs?” and start asking **which telemetry actually changes outcomes**. This page focuses on detection engineering decisions that senior engineers repeatedly make: source quality, correlation, response usefulness, and cost.

## What changes at senior level

Early-stage programs often optimize for **coverage language**:

* we log authentication events;
* we have WAF alerts;
* runtime tooling is installed;
* cloud detections are enabled.

Senior engineers optimize for **investigation value**:

* can we connect the event to an actor, workload, tenant, release, and control gap;
* can the on-call engineer decide in minutes whether the event matters;
* can we distinguish product abuse, operator error, misconfiguration drift, and active compromise;
* can we suppress predictable noise without deleting useful weak signals.
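
As a minimal sketch of the last point, predictable noise can be downgraded rather than dropped, so the weak signal stays searchable. The `Alert` shape, the severity names, and the `KNOWN_NOISE` baseline are assumptions made for illustration, not part of any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    rule: str
    actor: str
    severity: str  # hypothetical levels: "page", "ticket", "log_only"


# Hypothetical baseline: (rule, actor) pairs known to fire during normal operation.
KNOWN_NOISE = {("admin_login_off_hours", "svc-backup")}


def route(alert: Alert) -> Alert:
    """Downgrade predictable noise to log_only instead of deleting it."""
    if (alert.rule, alert.actor) in KNOWN_NOISE:
        return Alert(alert.rule, alert.actor, "log_only")
    return alert


noisy = route(Alert("admin_login_off_hours", "svc-backup", "page"))
real = route(Alert("admin_login_off_hours", "alice", "page"))
print(noisy.severity, real.severity)
```

Keeping the downgraded record means a later investigation can still find it if the "known" actor turns out to be compromised.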

## The telemetry hierarchy that usually works

### 1. Identity and control-plane telemetry

This is often the highest-value layer because it answers **who asked for access and what the platform permitted**.

Examples:

* SSO and IdP sign-in events;
* federation and workload-identity exchanges;
* cloud control-plane actions;
* CI pipeline identity use;
* privilege elevation and break-glass use.

### 2. Application and API workflow telemetry

This is where business abuse and tenant-boundary events become visible.

Examples:

* object ownership checks failing;
* entitlement changes;
* promo / signup / reset / export flow anomalies;
* API rate-limit overruns;
* unusual workflow transitions.
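
A failed object ownership check is only useful later if the log record carries authorization context. The following is a sketch of such a record; the function name and every field name are hypothetical and would need to match your own logging schema.

```python
import json


def authz_denied_event(actor_id, actor_tenant, object_id, object_tenant,
                       route, method, scope):
    """Build a structured log record for a failed object ownership check."""
    return {
        "event": "object_ownership_check_failed",
        "actor_id": actor_id,
        "actor_tenant": actor_tenant,
        "object_id": object_id,
        "object_tenant": object_tenant,
        "route": route,
        "method": method,
        "auth_scope": scope,
        # True when the actor reached for an object owned by another tenant.
        "cross_tenant": actor_tenant != object_tenant,
    }


event = authz_denied_event("u-17", "t-a", "inv-9", "t-b",
                           "/invoices/{id}", "GET", "invoices:read")
print(json.dumps(event, sort_keys=True))
```

Logging the route template rather than the raw path keeps the field low-cardinality, which makes per-actor counting easier.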

### 3. Runtime and data-plane telemetry

This layer is essential, but it pays off most once identity and workflow signals are reasonably mature.

Examples:

* suspicious process trees in containers;
* outbound network anomalies;
* file system writes in unexpected paths;
* package manager or shell execution in app workloads;
* container drift from signed or expected artifacts.
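
One of the examples above, shell or package-manager execution in app workloads, can be sketched as a simple predicate over a process event. The event fields (`exe`, `workload_type`) and the binary list are illustrative assumptions; real runtime sensors expose their own schemas.

```python
# Illustrative list only; tune it to what your images legitimately run.
SUSPICIOUS_BINARIES = {"sh", "bash", "apt", "apt-get", "apk", "pip", "curl"}


def is_suspicious_exec(process_event: dict) -> bool:
    """Flag shell or package-manager execution inside an application workload."""
    binary = process_event["exe"].rsplit("/", 1)[-1]  # strip the directory part
    return (
        process_event.get("workload_type") == "app"
        and binary in SUSPICIOUS_BINARIES
    )


print(is_suspicious_exec({"exe": "/bin/bash", "workload_type": "app"}))
print(is_suspicious_exec({"exe": "/usr/bin/python3", "workload_type": "app"}))
```

A rule like this is only a starting point; without namespace, image digest, and parent process it produces the context-free alerts this page warns about.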

## What high-signal detections often look like

| Detection family        | Good signal usually includes                                                      | Common reason it fails                                  |
| ----------------------- | --------------------------------------------------------------------------------- | ------------------------------------------------------- |
| Federation abuse        | subject, audience, repo/project, branch/tag, cloud role, target account           | trust policy too broad or identity fields not preserved |
| Tenant-boundary abuse   | tenant ID, actor ID, object owner, route, method, auth scope                      | application logs omit authorization context             |
| CI compromise           | pipeline source, runner identity, changed include/component, secret exposure path | pipeline logs are verbose but not normalized            |
| Runtime anomaly         | workload identity, namespace, image digest, parent process, egress destination    | runtime tooling alerts without app context              |
| Business workflow abuse | step order, quota key, promo state, recovery action, device/IP                    | teams only log technical errors, not business states    |
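
The "good signal" column can be turned into a check that reports which context fields an event is missing. This is a sketch covering three of the families in the table; the snake_case field names are an assumed normalization of the table entries, not an established schema.

```python
# Assumed field names, derived from the "good signal" column above.
REQUIRED_FIELDS = {
    "federation_abuse": {
        "subject", "audience", "repo", "ref", "cloud_role", "target_account",
    },
    "tenant_boundary_abuse": {
        "tenant_id", "actor_id", "object_owner", "route", "method", "auth_scope",
    },
    "runtime_anomaly": {
        "workload_identity", "namespace", "image_digest",
        "parent_process", "egress_destination",
    },
}


def missing_context(family: str, event: dict) -> list:
    """Return the required fields that are absent or empty in the event."""
    present = {key for key, value in event.items() if value not in (None, "")}
    return sorted(REQUIRED_FIELDS[family] - present)


sample = {"tenant_id": "t-a", "actor_id": "u-17", "route": "/x",
          "method": "GET", "auth_scope": ""}
print(missing_context("tenant_boundary_abuse", sample))
```

Running a check like this against a sample of real events shows whether a detection family fails for the reason listed in the right-hand column.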

## Correlation principles

### Correlate by release, not only by asset

Senior teams connect incidents to:

* release version;
* image digest;
* Git SHA;
* deployment window;
* feature flag state.

This makes it possible to answer: **did the event begin because of a code change, an environment change, or an attacker action?**
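
A minimal sketch of that question in code: given deployment metadata, find the release that was live when an event was first seen and whether the event began shortly after a deploy. The `DEPLOYS` records, their values, and the one-hour window are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical deployment metadata; in practice this comes from the CD system.
DEPLOYS = [
    {"version": "1.4.0", "git_sha": "a1b2c3d", "deployed_at": datetime(2026, 1, 10, 9, 0)},
    {"version": "1.5.0", "git_sha": "e4f5a6b", "deployed_at": datetime(2026, 1, 12, 14, 0)},
]


def release_at(ts: datetime):
    """Return the most recent deploy at or before ts, or None."""
    active = [d for d in DEPLOYS if d["deployed_at"] <= ts]
    return max(active, key=lambda d: d["deployed_at"]) if active else None


def started_near_deploy(ts: datetime, window=timedelta(hours=1)) -> bool:
    """True when the event began within `window` after the live release shipped."""
    release = release_at(ts)
    return release is not None and ts - release["deployed_at"] <= window


first_seen = datetime(2026, 1, 12, 14, 20)
print(release_at(first_seen)["version"], started_near_deploy(first_seen))
```

An event that starts inside the window points toward a code or configuration change; one that starts days after the last deploy makes an environment change or attacker action more likely.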

### Correlate by trust transition

Pay attention whenever trust changes:

* public request becomes authenticated session;
* CI identity becomes cloud role;
* user action becomes admin action;
* internal service call becomes cross-tenant data access;
* signed artifact becomes running workload.

Those transitions usually produce the highest-value detections.
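
Taking "CI identity becomes cloud role" as an example, a detection can compare the Git ref behind the CI identity with the role it assumed. This sketch works on an already-normalized event; the field names, the release-ref policy, and the role name are assumptions for illustration.

```python
# Assumed policy: only main and version tags may deploy to production.
PROD_ROLES = {"prod-deploy"}  # hypothetical role name


def is_release_ref(ref: str) -> bool:
    """Exact match for the main branch, prefix match for version tags."""
    return ref == "refs/heads/main" or ref.startswith("refs/tags/v")


def flags_transition(event: dict) -> bool:
    """Flag a CI identity from a non-release ref assuming a production role."""
    return event["assumed_role"] in PROD_ROLES and not is_release_ref(event["ref"])


event = {"repository": "acme/shop", "ref": "refs/heads/feature-x",
         "assumed_role": "prod-deploy"}
print(flags_transition(event))
```

The branch check uses an exact match so that a branch named `main-hotfix` is not silently treated as a release ref.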

## Response design rules

1. **Prefer alerts that suggest a first question, not only a category.**
   * Bad: “Possible privilege escalation.”
   * Better: “GitHub Actions OIDC token from non-release branch assumed production deployment role.”
2. **Include expected baseline context.** Every high-value alert should tell responders what normal looks like.
3. **Attach containment hints, not just evidence.** Example: revoke session, disable workload identity, freeze environment, rotate token, block deployment path.
4. **Treat business abuse as security, not only fraud or support noise.** The line between product abuse and account compromise is often thin.
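
Rules 1 to 3 can be sketched as a small alert structure that carries a first question, a baseline, and containment hints alongside the title. The class name, its fields, and the sample baseline text are hypothetical; adapt them to whatever your alerting pipeline renders.

```python
from dataclasses import dataclass, field


@dataclass
class AlertNarrative:
    title: str
    first_question: str
    baseline: str
    containment_hints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the alert as plain text for a pager or chat message."""
        lines = [
            self.title,
            f"Ask first: {self.first_question}",
            f"Normal: {self.baseline}",
        ]
        lines += [f"Contain: {hint}" for hint in self.containment_hints]
        return "\n".join(lines)


alert = AlertNarrative(
    title="GitHub Actions OIDC token from non-release branch assumed production deployment role",
    first_question="Was this run triggered by an approved release workflow?",
    baseline="Only version tags assume this role (illustrative baseline)",
    containment_hints=["disable workload identity", "block deployment path"],
)
print(alert.render())
```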

## Decision matrix: where to spend the next detection dollar

| If you lack                        | Improve first                                |
| ---------------------------------- | -------------------------------------------- |
| actor certainty                    | identity and federation logs                 |
| tenant or workflow context         | application business-state logging           |
| evidence for blast-radius analysis | release and deployment metadata              |
| evidence for active execution      | runtime and egress telemetry                 |
| reliable triage speed              | normalization, routing, and alert narratives |
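
One way to apply the matrix is to tag each incident review with the gap that slowed it down and invest against the most frequent gap. The sketch below assumes such tags exist; the gap keys are a hypothetical encoding of the left-hand column.

```python
from collections import Counter

# Keys are an assumed encoding of the "If you lack" column above.
INVEST_FIRST = {
    "actor_certainty": "identity and federation logs",
    "tenant_or_workflow_context": "application business-state logging",
    "blast_radius_evidence": "release and deployment metadata",
    "active_execution_evidence": "runtime and egress telemetry",
    "triage_speed": "normalization, routing, and alert narratives",
}


def next_investment(incident_gaps: list):
    """Return the investment tied to the most frequent gap, or None if no data."""
    if not incident_gaps:
        return None
    gap, _count = Counter(incident_gaps).most_common(1)[0]
    return INVEST_FIRST[gap]


gaps = ["actor_certainty", "triage_speed", "actor_certainty"]
print(next_investment(gaps))
```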

## Senior-engineer review checklist

* Do our top ten alerts preserve **actor, workload, tenant, and release** context?
* Can responders identify the **control gap** behind the event?
* Are we alerting on categories that nobody owns?
* Do we suppress noise by **understanding normal**, not by deleting whole alert classes?
* Can product teams see how their design choices improve or degrade detection quality?
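
The ownership question in the checklist lends itself to a quick audit. This sketch assumes alert rules are available as records with an optional `owner` field; the rule names and team name are invented for illustration.

```python
# Hypothetical rule inventory exported from the alerting system.
RULES = [
    {"name": "oidc_non_release_prod_role", "owner": "platform-security"},
    {"name": "waf_generic_sqli", "owner": None},
    {"name": "cross_tenant_export", "owner": ""},
]


def unowned(rules: list) -> list:
    """Return the names of alert rules with no owning team."""
    return [rule["name"] for rule in rules if not rule.get("owner")]


print(unowned(RULES))
```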

## Suggested references

* NIST SSDF: <https://csrc.nist.gov/projects/ssdf>
* OWASP Logging Cheat Sheet: <https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html>
* DORA documentation quality and measurement guidance: <https://dora.dev/>

***

*Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.*

