Use Case · June 16, 2026 · 8 min read
ecs-audit-log · ecs-compliance-logging · aws-ecs-cloudtrail

Who Restarted Prod? How to Find It in CloudTrail

Your ECS service restarted. Or a task was manually stopped. Or desiredCount dropped to zero and nobody admits it. The ECS console shows WHAT happened — not WHO. CloudTrail has the answer, and three CLI commands get you there in under two minutes.

Matt S
Platform engineer · Fortem
TL;DR
  • 01 CloudTrail captures every ECS management API call (UpdateService, StopTask, RunTask, RegisterTaskDefinition) with who, when, and from where.
  • 02 Event History is free for the last 90 days. Three CLI commands find the culprit in under 2 minutes.
  • 03 The userIdentity field tells you human vs CI/CD vs AWS service. Root account activity in ECS is always suspicious.
  • 04 Download the skill file: an AI agent runs the full fleet audit and produces a structured report automatically.

Why the ECS events tab doesn't tell you who did it

ECS events show WHAT happened — "service updated", "task stopped" — but not WHO. The userIdentity lives in CloudTrail, not in the ECS console. That's the gap most teams waste an hour trying to bridge.

You open the ECS service page. Under Events: "service my-api has started 1 tasks" at 14:23, "service my-api has stopped 1 running tasks" at 14:21. Something stopped your service and triggered a redeploy. The ECS console stops there — it doesn't record the API caller, the IAM identity, or whether it was a human clicking the console or Terraform applying a change.

ECS Events tab | CloudTrail
Shows WHAT happened | Shows WHO did it, WHEN, and FROM WHERE
Service-level messages only | All API calls including StopTask, UpdateService, RunTask
No API caller info | userIdentity: human, CI/CD role, or AWS service
Only the 100 most recent events | 90-day Event History, free
Not queryable | Searchable by event name, event source, username, resource, access key (filter by IP client-side with jq)
Key insight
CloudTrail records every ECS management API call automatically, with no setup required. The 90-day Event History is free, so there is nothing extra to enable or pay for. The only thing missing is knowing where to look.

Three commands to find the culprit in under 2 minutes

aws cloudtrail lookup-events with AttributeKey=EventName filters to specific actions. Pipe through jq to extract userIdentity.userName, eventTime, and sourceIPAddress. Covers the last 90 days at no charge.

Find who stopped a task
bash
# List StopTask calls from Event History (last 90 days) and show who made each one
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=StopTask \
  --query 'Events[*].CloudTrailEvent' \
  --output text | \
jq -r '{
  time: .eventTime,
  who: (
    if .userIdentity.type == "IAMUser" then .userIdentity.userName
    elif .userIdentity.type == "AssumedRole" then .userIdentity.sessionContext.sessionIssuer.userName
    # AWS services identify themselves in invokedBy; fall back to the raw type
    else (.userIdentity.invokedBy // .userIdentity.type)
    end
  ),
  from: .sourceIPAddress,
  via: .userAgent,
  task: .requestParameters.task
}'
Find who updated a service (deployments, scale changes)
bash
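# List UpdateService calls and show who changed which service
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=UpdateService \
  --query 'Events[*].CloudTrailEvent' \
  --output text | \
# Other services (e.g. App Runner) also expose UpdateService, so keep only ECS records
jq -r 'select(.eventSource == "ecs.amazonaws.com") | {
  time: .eventTime,
  who: (
    if .userIdentity.type == "IAMUser" then .userIdentity.userName
    elif .userIdentity.type == "AssumedRole" then .userIdentity.sessionContext.sessionIssuer.userName
    # AWS services (e.g. Application Auto Scaling) identify themselves in invokedBy
    else (.userIdentity.invokedBy // .userIdentity.type)
    end
  ),
  via: .userAgent,
  service: .requestParameters.service,
  desiredCount: .requestParameters.desiredCount
}'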
Narrow by specific user or role
bash
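# Find everything a specific IAM user did in the last 24h.
# For assumed-role sessions, Username typically matches the role session name, not the role name.
# The date call tries GNU syntax (-d) first, then falls back to BSD/macOS syntax (-v).
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=john.smith \
  --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -v-24H +%Y-%m-%dT%H:%M:%SZ)" \
  --query 'Events[*].{Time:EventTime,Event:EventName}' \
  --output table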
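Rate limit: lookup-events is capped at 2 requests/second per account per region, and each request returns at most 50 events. The AWS CLI follows the pagination token for you, so one command can issue several requests. If you're scripting across many event types, add a 0.5s sleep between calls; if you paginate manually, pass the token from the previous response with --next-token.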
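The pacing advice above can be sketched as a small loop. This is a minimal illustration rather than a finished tool: the function name lookup_ecs_events, the LOOKUP_PAUSE variable, and the event list are assumptions made for the example.

```bash
# Sketch: query several ECS event names in sequence while staying under the
# lookup-events rate limit. Names below are illustrative, not part of any AWS tooling.

ECS_EVENT_NAMES=(UpdateService StopTask RunTask RegisterTaskDefinition)
LOOKUP_PAUSE="${LOOKUP_PAUSE:-0.5}"   # seconds to wait between calls

lookup_ecs_events() {
  local start_time="$1"   # ISO 8601 UTC timestamp, e.g. 2026-06-15T00:00:00Z
  local name
  for name in "${ECS_EVENT_NAMES[@]}"; do
    # The AWS CLI follows the pagination token automatically, so each call
    # may turn into several 50-event requests behind the scenes.
    aws cloudtrail lookup-events \
      --lookup-attributes "AttributeKey=EventName,AttributeValue=${name}" \
      --start-time "$start_time" \
      --query 'Events[*].CloudTrailEvent' \
      --output text
    sleep "$LOOKUP_PAUSE"
  done
}

# Usage (requires AWS credentials and jq):
#   lookup_ecs_events 2026-06-15T00:00:00Z | jq -c '{time: .eventTime, event: .eventName}'
```

Because a single call can expand into several paginated requests, a fixed pause is only a rough guard; raise LOOKUP_PAUSE if the CLI reports throttling.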

Which ECS events map to which actions

UpdateService = scale change or deployment. StopTask = manual kill. RegisterTaskDefinition = new image or config. RunTask = standalone task launch. Each has a different userIdentity pattern worth knowing.

Scenario | CloudTrail eventName | Who typically calls it
Service scaled up/down | UpdateService | Human, CI/CD, autoscaler
Deployment triggered | UpdateService + RunTask | CI/CD pipeline
Task manually stopped | StopTask | Human, script, ECS agent
New task definition | RegisterTaskDefinition | CI/CD pipeline, human
Service created/deleted | CreateService / DeleteService | Human, Terraform
Cluster deleted | DeleteCluster | Human, Terraform

The most ambiguous one is StopTask. It appears in CloudTrail when a human manually stops a task, when a script does it, and when ECS itself stops a task during a rolling deployment. Check userIdentity.invokedBy — if it says ecs.amazonaws.com, ECS triggered the stop internally during service orchestration, not a human.

Decoding userIdentity: human, CI/CD, or AWS service

userIdentity.type tells you who called the API: IAMUser = a person or script using IAM user credentials, AssumedRole = CI/CD, Lambda, or a human who assumed a role, AWSService = autoscaler or ECS itself. Root should never appear in routine ECS operations; alert immediately if it does.

userIdentity.type | Meaning | How to extract the name
IAMUser | Human or script with IAM user credentials | .userIdentity.userName
AssumedRole | CI/CD, Lambda, or human via role (including IAM Identity Center / SSO sessions) | .userIdentity.sessionContext.sessionIssuer.userName
Root | AWS root account: alert immediately | type = Root is the signal
AWSService | AWS-owned service (autoscaling, ECS agent) | .userIdentity.invokedBy
AWSAccount | Cross-account call from another AWS account | .userIdentity.accountId
FederatedUser | Temporary credentials from sts:GetFederationToken | .userIdentity.principalId

The tricky one is AssumedRole. When a GitHub Actions pipeline runs aws ecs update-service, the CloudTrail event shows type: AssumedRole and the ARN of the role. The human-readable role name is in sessionContext.sessionIssuer.userName. That's the field to surface in your audit report — not the full ARN.
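The mapping in the table above can be expressed as one jq filter. The sketch below assumes jq is installed; decode_actor and the label prefixes (user:, role:, and so on) are hypothetical names chosen for the example, not anything CloudTrail defines.

```bash
# Sketch: turn a CloudTrail record's userIdentity into a short actor label.
# Reads one or more CloudTrail records (JSON) on stdin, prints one label per record.
decode_actor() {
  jq -r '
    .userIdentity as $id
    | if   $id.type == "IAMUser"     then "user:" + $id.userName
      elif $id.type == "AssumedRole" then "role:" + $id.sessionContext.sessionIssuer.userName
      elif $id.type == "Root"        then "ROOT:" + $id.accountId
      elif $id.type == "AWSService"  then "service:" + $id.invokedBy
      elif $id.type == "AWSAccount"  then "account:" + $id.accountId
      # Anything else: show the raw type and principalId so nothing is silently dropped
      else ($id.type // "unknown") + ":" + ($id.principalId // "?")
      end
  '
}

# Usage (requires AWS credentials):
#   aws cloudtrail lookup-events \
#     --lookup-attributes AttributeKey=EventName,AttributeValue=StopTask \
#     --query 'Events[*].CloudTrailEvent' --output text | decode_actor
```

Keeping the fallback branch explicit means an unexpected identity type still shows up in the report instead of producing an empty field.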

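To distinguish console vs CLI vs Terraform, use the userAgent field. The value is set by the client, so treat it as a hint rather than proof; a pipeline that shells out to the AWS CLI reports aws-cli, so read it together with the role name.

userAgent value | What called the API
console.amazonaws.com | AWS console (someone clicked)
aws-cli/2.* | AWS CLI (manual or script)
Terraform/1.* terraform-provider-aws/* | Terraform apply
github-actions/* | GitHub Actions CI/CD
ECS Console | ECS service console actions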
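The userAgent table translates naturally into a shell case statement. This is a rough sketch: classify_user_agent is a hypothetical helper, the patterns mirror the table rows above, and real userAgent strings vary by tool version, so expect to tune them.

```bash
# Sketch: bucket a CloudTrail userAgent string into a coarse caller category.
classify_user_agent() {
  case "$1" in
    console.amazonaws.com*|"ECS Console"*) echo "console" ;;
    # Checked before aws-cli because Terraform strings can carry extra SDK tokens
    *[Tt]erraform*)                        echo "terraform" ;;
    aws-cli/*)                             echo "aws-cli" ;;
    github-actions/*)                      echo "github-actions" ;;
    *)                                     echo "other" ;;
  esac
}

# Usage:
#   classify_user_agent "aws-cli/2.15.30 Python/3.11.8"
```

The catch-all branch matters: SDK callers such as Boto3 land in "other" rather than being mislabelled as a human.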
Key insight
If userIdentity.type is Root, stop everything else and investigate. Root credentials should never be used for routine ECS operations. A Root call in CloudTrail means either someone is using the root account directly (a security failure) or credentials were compromised.

Alerting in real time: EventBridge rule for critical ECS changes

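EventBridge can trigger a notification shortly after a StopTask or UpdateService call, often before you notice the incident. A Terraform rule and target set it up; the only other piece is an SNS topic to receive the alert. AWS's EventBridge documentation states that events with detail-type "AWS API Call via CloudTrail" require a trail with logging enabled, so confirm one exists in the region.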

Searching CloudTrail after an incident is reactive. EventBridge makes it proactive: you define a rule that matches specific CloudTrail events, and EventBridge triggers an SNS notification, Lambda, or Slack webhook immediately when the event occurs. For teams running 10+ ECS environments, catching a DeleteService before the on-call rotation starts saves significant incident response time.

Terraform: EventBridge rule for critical ECS events
hcl
resource "aws_cloudwatch_event_rule" "ecs_critical" {
  name        = "ecs-critical-changes"
  description = "Alert on destructive or suspicious ECS API calls"

  event_pattern = jsonencode({
    source      = ["aws.ecs"]
    detail-type = ["AWS API Call via CloudTrail"]
    detail = {
      eventSource = ["ecs.amazonaws.com"]
      eventName   = [
        "StopTask",
        "DeleteService",
        "DeleteCluster",
        "UpdateService"
      ]
    }
  })
}

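resource "aws_cloudwatch_event_target" "ecs_critical_sns" {
  rule      = aws_cloudwatch_event_rule.ecs_critical.name
  target_id = "SendToSNS"
  # Assumes an aws_sns_topic.alerts resource defined elsewhere, with a topic
  # policy that lets events.amazonaws.com publish to it.
  arn       = aws_sns_topic.alerts.arn

  input_transformer {
    input_paths = {
      event   = "$.detail.eventName"
      # userIdentity.arn is present for IAM users and assumed roles alike;
      # sessionIssuer.userName only exists for AssumedRole callers.
      who     = "$.detail.userIdentity.arn"
      time    = "$.time"
      # Empty for StopTask and DeleteCluster, which carry task/cluster instead.
      service = "$.detail.requestParameters.service"
    }
    # The template must itself be a valid JSON string, hence the escaped quotes.
    input_template = "\"ECS alert: <event> on <service> by <who> at <time>\""
  }
}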

For UpdateService, add a second rule specifically for scale-to-zero: filter where requestParameters.desiredCount = 0. That's the most common accidental incident — someone running a cleanup script that hits the wrong environment.
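A scale-to-zero rule could look like the sketch below, written with the AWS CLI to match the earlier commands. The rule name ecs-scale-to-zero and both function names are assumptions for illustration; the pattern relies on EventBridge matching the numeric value 0 exactly.

```bash
# Sketch: event pattern that matches only UpdateService calls setting desiredCount to 0.
scale_to_zero_pattern() {
  cat <<'JSON'
{
  "source": ["aws.ecs"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["ecs.amazonaws.com"],
    "eventName": ["UpdateService"],
    "requestParameters": {
      "desiredCount": [0]
    }
  }
}
JSON
}

# Creates (or updates) the rule; attach a target separately, as with the main rule.
create_scale_to_zero_rule() {
  aws events put-rule \
    --name "${1:-ecs-scale-to-zero}" \
    --event-pattern "$(scale_to_zero_pattern)"
}

# Usage (requires AWS credentials):
#   create_scale_to_zero_rule ecs-scale-to-zero
```

Keeping this as a separate rule lets you route it to a louder channel than routine UpdateService notifications.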

The Oct 2025 addition: ECS CloudTrail data events

Since October 2025, ECS supports CloudTrail data events for ContainerInstance agent API activity (ecs:Poll, ecs:StartTelemetrySession). These aren't in Event History — they require a CloudTrail trail or CloudTrail Lake.

AWS management events (UpdateService, StopTask, etc.) are what most teams need for incident response. The October 2025 addition is different: ECS now supports CloudTrail data events for ContainerInstance agent API calls — the low-level polling activity between the ECS agent and the control plane.

Aspect | Management events | Data events (Oct 2025)
What they capture | UpdateService, StopTask, RunTask, etc. | ecs:Poll, ecs:StartTelemetrySession, ecs:PutSystemLogEvents
Cost | Free (Event History) | Additional CloudTrail charges
In Event History? | Yes, 90 days | No, trail or Lake required
Who needs them | Everyone, for incident response | EC2 launch type, compliance auditing
Resource type | n/a | AWS::ECS::ContainerInstance

For most ECS Fargate teams, data events aren't needed for incident response: management events cover UpdateService and StopTask, which is where incidents come from. Data events matter if you run EC2 launch type and need to audit ContainerInstance registration activity, or if compliance requires a full record of agent-to-control-plane communication. Enable them only if you have a specific requirement, because at scale ContainerInstance polling events generate significant volume and cost. Details are in the ECS CloudTrail logging docs.
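If you do need them, enabling data events on an existing trail might look like the following sketch. It uses CloudTrail advanced event selectors with the AWS::ECS::ContainerInstance resource type named above; the function names and the selector's Name value are made up for the example, and you should check the exact selector fields against the current ECS CloudTrail docs before relying on it.

```bash
# Sketch: advanced event selector for ECS container instance data events.
ecs_data_event_selectors() {
  cat <<'JSON'
[
  {
    "Name": "ECS container instance data events",
    "FieldSelectors": [
      { "Field": "eventCategory", "Equals": ["Data"] },
      { "Field": "resources.type", "Equals": ["AWS::ECS::ContainerInstance"] }
    ]
  }
]
JSON
}

# Caution: put-event-selectors replaces the trail's existing selectors, so add a
# management-events selector to the JSON if the trail should keep logging those.
enable_ecs_data_events() {
  local trail_name="$1"   # name of an existing trail (hypothetical: my-audit-trail)
  aws cloudtrail put-event-selectors \
    --trail-name "$trail_name" \
    --advanced-event-selectors "$(ecs_data_event_selectors)"
}

# Usage (requires AWS credentials):
#   enable_ecs_data_events my-audit-trail
```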

Download the skill file — let the AI agent do the audit

The skill file instructs an AI agent to pull all critical ECS CloudTrail events from the last 24 hours across every cluster in your account and produce a structured "who did what" report. Read-only — no changes applied.

ECS CloudTrail Audit
Agent scans all clusters, pulls critical ECS events (UpdateService, StopTask, RunTask, RegisterTaskDefinition, DeleteService), decodes who called each one — human, CI/CD, or AWS service — and flags Root account activity and suspicious patterns.
Read-only · Runs locally · Last 90 days (free)
Drop into Claude Code, OpenCode, or Codex — the agent executes the steps

The agent lists all clusters, runs lookup-events for each critical event type, decodes the userIdentity, and produces a structured output: "Service X was updated at HH:MM by role deploy-prod via GitHub Actions from IP 140.82.114.3." It also flags Root account activity, unexpected source IPs, and scale-to-zero incidents. For teams where "who did this?" is a recurring post-incident question, this is the 2-minute version of the 20-minute manual process.

"To identify the user who initiates a StopTask API call, view StopTask in AWS CloudTrail for userIdentity information."

AWS Knowledge Center: Troubleshoot running task count changes in ECS

FAQ

If you read this, you might also want to know

Can I search CloudTrail events older than 90 days?

Not with lookup-events — it only covers the last 90 days. For older events, you need a CloudTrail trail delivering to S3. Query the S3 bucket with Athena using the cloudtrail_logs partition table, or use CloudTrail Lake if you enabled it. Both options incur additional costs: S3 storage + Athena query costs, or CloudTrail Lake ingestion charges.
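For the Athena route, a query might be assembled like this. It is a sketch under assumptions: the table is called cloudtrail_logs as in the answer above, its columns follow the layout AWS documents for CloudTrail tables (eventtime, eventname, eventsource, useridentity, sourceipaddress, useragent), and the function names, database, and output location are placeholders.

```bash
# Sketch: build an Athena query for one ECS event name since a given UTC timestamp.
# Arguments are interpolated directly, so pass trusted values only.
build_ecs_history_query() {
  local event_name="$1" since="$2"
  cat <<SQL
SELECT eventtime, eventname, useridentity.arn AS caller, sourceipaddress, useragent
FROM cloudtrail_logs
WHERE eventsource = 'ecs.amazonaws.com'
  AND eventname = '${event_name}'
  AND eventtime >= '${since}'
ORDER BY eventtime
SQL
}

# Submits the query; $3 is the Athena database, $4 an S3 location for results.
run_ecs_history_query() {
  aws athena start-query-execution \
    --query-string "$(build_ecs_history_query "$1" "$2")" \
    --query-execution-context "Database=$3" \
    --result-configuration "OutputLocation=$4"
}

# Usage (requires AWS credentials; database and bucket names are placeholders):
#   run_ecs_history_query StopTask 2026-01-01T00:00:00Z default s3://my-athena-results/
```

If the table is partitioned by date, adding the partition columns to the WHERE clause keeps the scan, and the Athena bill, small.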

How do I tell if a change was made by Terraform vs a human?

Check the userAgent field in the CloudTrail event. Terraform calls show 'Terraform/1.x.x (+https://www.terraform.io) terraform-provider-aws/5.x.x'. A human via CLI shows 'aws-cli/2.x'. The AWS console shows 'console.amazonaws.com'. This works even when both Terraform and a human share the same IAM role — the userAgent tells them apart.

What if the ECS event was triggered by autoscaling — does it show in CloudTrail?

Yes — Application Auto Scaling calling UpdateService appears in CloudTrail with userIdentity.type = AWSService and invokedBy = application-autoscaling.amazonaws.com. You can distinguish autoscaling actions from human actions by filtering on invokedBy. This is important when investigating 'who scaled my service' — it might be auto scaling doing its job, not a person.
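A quick way to separate the two is to label each UpdateService record by its invokedBy value. The sketch below assumes jq is available; split_scaling_actor and the autoscaling/caller labels are names invented for the example.

```bash
# Sketch: print "<actor>\t<time>\t<desiredCount>" for each UpdateService record on stdin,
# where actor is "autoscaling" for Application Auto Scaling and "caller" otherwise.
split_scaling_actor() {
  jq -r '
    (if .userIdentity.invokedBy == "application-autoscaling.amazonaws.com"
     then "autoscaling" else "caller" end)
    + "\t" + .eventTime
    + "\t" + (.requestParameters.desiredCount | tostring)
  '
}

# Usage (requires AWS credentials):
#   aws cloudtrail lookup-events \
#     --lookup-attributes AttributeKey=EventName,AttributeValue=UpdateService \
#     --query 'Events[*].CloudTrailEvent' --output text | split_scaling_actor
```

Records that do not set desiredCount (for example, a deployment that only changes the task definition) print "null" in the last column.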

Can I set up a CloudTrail alert that fires before the on-call gets paged?

Yes. The EventBridge rule described above fires within seconds of CloudTrail delivering the event, which is typically 1-2 minutes after the API call. EventBridge → SNS → PagerDuty (or directly to your alerting platform) gives you a notification before the monitoring system catches the downstream effects. For DeleteService or scale-to-zero, this is the difference between 1-minute and 5-minute detection.

