# Containment and Eradication Automation Lab

> **Intro:** Detection is only half of the job. This lab teaches the next step: how to automate **safe containment**, preserve evidence, and then push the durable fix back into infrastructure-as-code, policies, or platform baselines.
>
> **What this page includes**
>
> * how to structure a containment-and-eradication lab;
> * safe automation patterns with SOAR and native cloud tools;
> * examples using AWS Systems Manager and Cortex XSOAR-style playbooks;
> * how to convert a one-time incident into a codified control improvement.

## Learning goal

A good automation lab teaches four skills:

1. **know when to automate**;
2. **know what must stay manual**;
3. **preserve evidence before destroying context**;
4. **feed the durable fix back into code and policy**.

## Safe automation rules

Never automate a destructive response until you can answer:

* what evidence do we lose?
* can the action break production?
* who approves the containment?
* how do we restore safely?
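
The four questions above can be enforced as a pre-flight gate in the automation itself. The following is a minimal sketch, assuming the answers are recorded as free text; `ContainmentPlan` and its field names are hypothetical and not part of any tool mentioned on this page.

```python
from dataclasses import dataclass, fields


@dataclass
class ContainmentPlan:
    """Answers to the four pre-flight questions (hypothetical field names)."""
    evidence_at_risk: str    # what evidence do we lose?
    production_impact: str   # can the action break production?
    approver: str            # who approves the containment?
    restore_procedure: str   # how do we restore safely?


def unanswered_questions(plan: ContainmentPlan) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


def may_automate(plan: ContainmentPlan) -> bool:
    """Allow automation only when every question has a non-empty answer."""
    return not unanswered_questions(plan)


if __name__ == "__main__":
    plan = ContainmentPlan(
        evidence_at_risk="memory and live network connections",
        production_impact="instance serves checkout traffic",
        approver="",
        restore_procedure="reattach the original security groups",
    )
    print(unanswered_questions(plan))  # ['approver']
```

A gate like this only checks that an answer exists. Judging whether the answer is good enough stays with the analyst.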

## Good starter scenarios

| Scenario                                     | Why it is good for a lab                                      |
| -------------------------------------------- | ------------------------------------------------------------- |
| suspicious EC2 or VM egress                  | teaches reversible containment                                |
| compromised IAM principal or service account | teaches identity-focused isolation                            |
| public security group or NSG                 | teaches quick risk reduction and codified fix                 |
| container or pod compromise                  | teaches runtime evidence + kill/replace discipline            |
| leaked secret or token                       | teaches rotation, blast-radius review, and pipeline follow-up |

## AWS-native starter pattern

AWS Systems Manager Automation ships AWS-managed runbooks for common containment actions, so a lab can start without writing custom automation.

### Example: contain an EC2 instance

```bash
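# Check the runbook's required parameters first:
#   aws ssm describe-document --name AWSSupport-ContainEC2Instance
aws ssm start-automation-execution \
  --document-name AWSSupport-ContainEC2Instance \
  --parameters InstanceId=i-0123456789abcdef0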
```

### Example: quarantine an EC2 instance

```bash
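# IsolationSecurityGroup names the group the runbook attaches to the instance;
# the ID below is a placeholder. Confirm parameter names with describe-document.
aws ssm start-automation-execution \
  --document-name AWS-QuarantineEC2Instance \
  --parameters InstanceId=i-0123456789abcdef0,IsolationSecurityGroup=sg-0123456789abcdef0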
```

### Example: contain an IAM principal

```bash
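# The parameter name below is illustrative; confirm the runbook's actual
# parameters with describe-document before running it.
aws ssm start-automation-execution \
  --document-name AWSSupport-ContainIAMPrincipal \
  --parameters IAMResourceArn=arn:aws:iam::123456789012:user/suspicious-user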
```
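
When the same call is made from Python instead of the CLI, boto3's `start_automation_execution` expects `Parameters` as a mapping from parameter name to a list of strings. The sketch below only builds the keyword arguments and makes no AWS call; `build_automation_request` is a hypothetical helper name.

```python
def build_automation_request(document_name: str, **params: str) -> dict:
    """Build keyword arguments for boto3's ssm.start_automation_execution.

    SSM Automation parameter values are lists of strings, so each scalar
    value is wrapped in a single-element list.
    """
    if not document_name:
        raise ValueError("document_name is required")
    return {
        "DocumentName": document_name,
        "Parameters": {name: [str(value)] for name, value in params.items()},
    }


request = build_automation_request(
    "AWS-QuarantineEC2Instance",
    InstanceId="i-0123456789abcdef0",
)
# With boto3 installed and credentials configured (not done here):
#   boto3.client("ssm").start_automation_execution(**request)
```

Keeping request construction separate from the API call makes the lab logic testable without AWS credentials.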

## Example custom SSM automation skeleton

```yaml
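schemaVersion: '0.3'
description: Isolate instance and snapshot evidence metadata
assumeRole: '{{ AutomationAssumeRole }}'
parameters:
  AutomationAssumeRole:
    type: String
    description: IAM role the automation assumes
  InstanceId:
    type: String
    description: Instance to isolate
  ContainmentSecurityGroupId:
    type: String
    description: ID of a pre-created security group with no ingress or egress rules
mainSteps:
  # Record the instance state first so the original security groups can be restored.
  - name: captureInstanceMetadata
    action: aws:executeAwsApi
    inputs:
      Service: ec2
      Api: DescribeInstances
      InstanceIds:
        - '{{ InstanceId }}'
    outputs:
      - Name: OriginalSecurityGroups
        Selector: $.Reservations[0].Instances[0].SecurityGroups
        Type: MapList
  # Replace every attached security group with the containment group.
  - name: quarantineInstance
    action: aws:executeAwsApi
    inputs:
      Service: ec2
      Api: ModifyInstanceAttribute
      InstanceId: '{{ InstanceId }}'
      Groups:
        - '{{ ContainmentSecurityGroupId }}'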
```
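
Containment by security-group swap is reversible only if the original groups were recorded. As a minimal sketch, assuming the `DescribeInstances` response was saved before isolation, the restore call can be derived from it. `build_restore_request` is a hypothetical helper; the returned keys match the `InstanceId` and `Groups` arguments of EC2 `ModifyInstanceAttribute`.

```python
def build_restore_request(describe_instances_response: dict) -> dict:
    """Build ModifyInstanceAttribute arguments that reattach the original groups.

    Expects the response shape of EC2 DescribeInstances for a single instance,
    captured before the instance was isolated.
    """
    instance = describe_instances_response["Reservations"][0]["Instances"][0]
    group_ids = [group["GroupId"] for group in instance["SecurityGroups"]]
    if not group_ids:
        raise ValueError("no security groups captured; cannot build a restore request")
    return {"InstanceId": instance["InstanceId"], "Groups": group_ids}


# Sample of the saved evidence (IDs are placeholders).
captured = {
    "Reservations": [{
        "Instances": [{
            "InstanceId": "i-0123456789abcdef0",
            "SecurityGroups": [
                {"GroupId": "sg-0aaa", "GroupName": "app"},
                {"GroupId": "sg-0bbb", "GroupName": "monitoring"},
            ],
        }]
    }]
}
restore = build_restore_request(captured)
```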

## Cortex XSOAR-style lab pattern

A SOAR playbook is useful when your response needs:

* ticketing;
* analyst approval steps;
* enrichment;
* branching logic;
* human-in-the-loop escalation.

### Minimal playbook design idea

1. ingest incident;
2. enrich asset, identity, and tenant context;
3. decide whether manual approval is required;
4. perform reversible containment;
5. collect evidence references;
6. open remediation ticket;
7. trigger postmortem checklist.

## Example pseudo-playbook logic

```
if incident.type == "suspicious-ec2":
  collect cloudtrail + vpcflow + guardduty context
  ask analyst for approval
  if approved:
    run AWSSupport-ContainEC2Instance
    record automation execution ID in the incident
  create jira ticket for source-of-truth fix
  notify platform owner
```
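
The same flow can be sketched in Python with each integration passed in as a callable, which keeps the ordering and the approval gate testable without a SOAR platform. This is an illustration only, not XSOAR's playbook format; the function name, argument names, and the `incident` dictionary keys are assumptions.

```python
from typing import Callable


def run_suspicious_ec2_playbook(
    incident: dict,
    collect_context: Callable[[dict], dict],
    request_approval: Callable[[dict, dict], bool],
    contain: Callable[[str], str],
    open_ticket: Callable[[dict], str],
    notify_owner: Callable[[dict], None],
) -> dict:
    """Run the steps in order; containment happens only after approval."""
    result = {"contained": False, "execution_id": None, "ticket": None}
    if incident.get("type") != "suspicious-ec2":
        return result  # not this playbook's incident type

    context = collect_context(incident)       # gather evidence first
    if request_approval(incident, context):   # human-in-the-loop gate
        result["execution_id"] = contain(incident["instance_id"])
        result["contained"] = True
    result["ticket"] = open_ticket(incident)  # durable fix is tracked either way
    notify_owner(incident)
    return result
```

In a lab, the callables can start as stubs that log their invocation, then be replaced one at a time with real integrations.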

## Postmortem-to-IaC feedback loop

This is the most important part of the lab.

After containment, force the learner to answer:

* what allowed the incident path?
* what guardrail should have stopped it earlier?
* what Terraform / Helm / policy / CI rule must change?
* what new detection should exist next time?

### Example: translate incident into code change

**Incident:** public admin port on a security group.

**Temporary action:** close the rule.

**Durable action:** update Terraform module defaults.

```hcl
# Empty by default: no admin ingress exists unless a caller opts in with explicit CIDRs.
variable "allowed_admin_cidrs" {
  type    = list(string)
  default = []
}

# Created only when at least one CIDR is supplied.
resource "aws_security_group_rule" "admin_ingress" {
  count             = length(var.allowed_admin_cidrs) > 0 ? 1 : 0
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = var.allowed_admin_cidrs
  security_group_id = aws_security_group.app.id
}
```

## Example validation after the fix

```bash
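checkov -d infra/   # static scan of the Terraform source
terraform plan      # run from the root module; confirm no open admin rule is planned
# Check IDs vary by Prowler version; confirm with `prowler aws --list-checks`.
prowler aws --checks ec2_securitygroup_allow_ingress_from_internet_to_tcp_port_22 \
  ec2_securitygroup_allow_ingress_from_internet_to_tcp_port_3389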
```
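
The plan itself can also be checked in CI. The following is a minimal sketch, assuming the plan was exported with `terraform plan -out plan.out` followed by `terraform show -json plan.out`. It inspects only standalone `aws_security_group_rule` resources, not inline `ingress` blocks on `aws_security_group`, and `find_open_admin_rules` is a hypothetical helper name.

```python
ADMIN_PORTS = {22, 3389}
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_admin_rules(plan: dict) -> list[str]:
    """Return addresses of planned aws_security_group_rule resources that
    expose an admin port to the whole internet."""
    findings = []
    for change in plan.get("resource_changes", []):
        after = (change.get("change") or {}).get("after")
        # Skip other resource types and deletions (where "after" is null).
        if change.get("type") != "aws_security_group_rule" or not after:
            continue
        if after.get("type") != "ingress":
            continue
        cidrs = set(after.get("cidr_blocks") or []) | set(after.get("ipv6_cidr_blocks") or [])
        all_traffic = str(after.get("protocol")) in {"-1", "all"}
        low, high = after.get("from_port"), after.get("to_port")
        in_range = (
            low is not None
            and high is not None
            and any(low <= port <= high for port in ADMIN_PORTS)
        )
        if (all_traffic or in_range) and cidrs & OPEN_CIDRS:
            findings.append(change["address"])
    return findings
```

A CI job could load the JSON with `json.load` and fail the build when the returned list is non-empty.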

## Web UI how-to ideas for the lab

### AWS console path

1. Systems Manager → Automation.
2. Search for the containment or quarantine runbook.
3. Review the required parameters and the role the automation assumes.
4. Execute against the target resource.
5. Save execution ID into the incident record.
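
Step 5 can be scripted so the incident record always carries the same fields. The sketch below reduces a `GetAutomationExecution` response (the shape boto3 returns from `get_automation_execution`) to a small evidence entry; the output field names and the incident ID format are assumptions, and the sample response is fabricated for illustration.

```python
from datetime import datetime, timezone


def execution_evidence(response: dict, incident_id: str) -> dict:
    """Reduce a GetAutomationExecution response to fields worth keeping
    in the incident record."""
    execution = response["AutomationExecution"]
    started = execution.get("ExecutionStartTime")
    return {
        "incident_id": incident_id,
        "execution_id": execution["AutomationExecutionId"],
        "runbook": execution.get("DocumentName"),
        "status": execution.get("AutomationExecutionStatus"),
        "executed_by": execution.get("ExecutedBy"),
        # boto3 returns datetimes; store them as ISO 8601 strings.
        "started_at": started.isoformat() if isinstance(started, datetime) else started,
        "parameters": execution.get("Parameters", {}),
    }


# Illustrative response; all values are placeholders.
sample = {
    "AutomationExecution": {
        "AutomationExecutionId": "11111111-2222-3333-4444-555555555555",
        "DocumentName": "AWS-QuarantineEC2Instance",
        "AutomationExecutionStatus": "Success",
        "ExecutedBy": "arn:aws:iam::123456789012:user/analyst",
        "ExecutionStartTime": datetime(2026, 1, 15, 10, 30, tzinfo=timezone.utc),
        "Parameters": {"InstanceId": ["i-0123456789abcdef0"]},
    }
}
record = execution_evidence(sample, "INC-1042")
```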

### XSOAR path

1. open incident;
2. run enrichment tasks;
3. request manual approval if production risk exists;
4. execute containment task;
5. attach evidence and open engineering remediation ticket;
6. link the postmortem record.

## Common mistakes

* automating containment without an approval boundary for production systems;
* deleting or rebuilding assets before preserving evidence;
* fixing only the live resource and not the template or module;
* stopping at detection and never building the response automation.

## Cross-links

* [Detection and Response](/attack-paths-testing-detection-and-hardening/index.md)
* [Product Security Incident Response Playbooks](/attack-paths-testing-detection-and-hardening/index/product-security-incident-response-playbooks.md)
* [Runtime Investigation Playbook for Kubernetes and Containers](/cloud-kubernetes-and-infrastructure-security/index-1/runtime-investigation-playbook.md)
* [Cloud Compliance Scan Lab — Scan → Triage → Fix → Codify](/learning-labs-interview-and-templates/index-2/cloud-compliance-scan-lab-scan-triage-fix-codify.md)

---

*Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.*


