How Do You Find and Kill Orphaned ECS Environments Before They Drain Your Budget?
Every team with 10+ ECS environments has at least one nobody uses anymore. The Fargate tasks stopped when the feature shipped — or didn't. But the ALB kept running. The NAT Gateway kept running. Six months later you're looking at a $400 line item on the bill and nobody can explain it.
- A stopped ECS environment (desired=0) still costs $48–65/mo in ALB + NAT Gateway overhead.
- Fargate is honest: it bills $0 when desired=0. ALB and NAT Gateway don't know and don't care.
- 3 CLI commands surface every orphan in your account in under 5 minutes.
- Kill order matters: tasks → service → listener rules → target group → ALB → NAT Gateway → log groups.
- 5 forgotten environments at $65/mo each = ~$3,900/year in pure waste, no compute running.
Find every ECS service at desired=0 across all clusters in the current AWS account:
# List all clusters, then print every service whose desired count is 0
aws ecs list-clusters --query 'clusterArns[]' --output text | tr '\t' '\n' \
  | while read cluster; do
      echo "=== $cluster ==="
      aws ecs list-services --cluster "$cluster" \
        --query 'serviceArns[]' --output text | tr '\t' '\n' \
        | xargs -r -P4 -I{} aws ecs describe-services \
            --cluster "$cluster" --services {} \
            --query 'services[?desiredCount==`0`].[serviceName,desiredCount,runningCount]' \
            --output table
    done

Requires: AWS CLI v2, credentials with ecs:ListClusters, ecs:ListServices, ecs:DescribeServices
What makes an ECS environment orphaned
An ECS environment is orphaned when its desired count hits 0 but the supporting infrastructure — ALB, NAT Gateway, log groups — keeps running and billing.
Three patterns cause this. The first is the feature branch that shipped (or got cancelled): someone set desiredCount=0 to "pause" the environment, meant to delete it later, and never did. The ECS console shows 0/0 tasks — looks fine, no alarms fire, nobody notices.
The second is the deprecated microservice. The team migrated to a new service, pointed traffic at it, and left the old one running at zero. It still has an ALB. It still has a NAT Gateway routing its (nonexistent) outbound traffic. The Terraform state still references it.
The third pattern is specific to EC2-backed ECS clusters: an instance fails to register with the cluster — misconfigured IAM role, broken ECS agent, VPC networking issue — and sits in the Auto Scaling group in a healthy state while ECS has no idea it exists. AWS's own documentation describes it: "the instance will just sit there, idling along doing nothing in an unregistered orphaned state."
All three share the same symptom: the ECS console looks clean. No errors. No alerts. Just a steady, invisible charge on the monthly bill.
The real cost of a dead environment
One orphaned Fargate environment with zero running tasks costs $48–65/month: ALB $16.43 + NAT Gateway $32.40 + CloudWatch log storage. No compute, but the infrastructure meter runs.
Fargate compute is $0.00: Fargate bills nothing when desiredCount is 0, because no tasks are running. ALB and NAT Gateway aren't connected to ECS service state. They bill by the hour, unconditionally. As far as their base hourly charges are concerned, an environment at zero is indistinguishable from an environment at 100 tasks.
The ALB base rate is $0.0225/hr (verified June 2026), which comes to $16.43 over a 730-hour month whether or not a single request passes through it. NAT Gateway is $0.045/hr (verified June 2026), which comes to $32.40 over a 720-hour month, per AZ. If your environment runs a NAT Gateway in each of two AZs, that's $64.80/month just in NAT Gateway overhead.
At 5 forgotten environments costing $65/month each, that's $3,900/year in pure infrastructure waste. No compute. No traffic. No one using it.
The number teams miss when auditing is this fixed per-environment overhead, which persists regardless of task count. An environment costs money from the moment you create the ALB and NAT Gateway, not from the moment tasks start running.
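The figures above can be reproduced with a few lines of arithmetic. This is a rough sketch that takes the article's monthly base charges ($16.43 for the ALB, $32.40 per NAT Gateway) as given and ignores log storage and data-processing charges. The function names `env_monthly_cost` and `annual_waste` are illustrative and not part of any AWS tooling.

```shell
# Monthly base cost of one idle environment, using the article's figures.
# $1 = number of NAT Gateways (assumed: one per AZ the environment spans).
env_monthly_cost() {
  awk -v nat="$1" 'BEGIN { printf "%.2f\n", 16.43 + nat * 32.40 }'
}

# Yearly cost of N forgotten environments at a given monthly cost each.
# $1 = number of environments, $2 = monthly cost per environment in USD.
annual_waste() {
  awk -v n="$1" -v monthly="$2" 'BEGIN { printf "%.0f\n", n * monthly * 12 }'
}

env_monthly_cost 1    # one ALB + one NAT Gateway
env_monthly_cost 2    # one ALB + two NAT Gateways (two AZs)
annual_waste 5 65     # five environments at the $65/month upper bound
```

This prints 48.83, 81.23, and 3900. Note that an environment with a NAT Gateway in each of two AZs lands above the $48–65 range, which describes the single-NAT case.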
How to find orphaned environments
Three AWS CLI commands surface every ECS service with zero desired count and their associated infrastructure across all clusters — no third-party tools, no console clicking.
Command 1 — find all zero-desired services. The script in the ready-to-use block above lists every service at desiredCount=0. Run it in each region you use. Filter by cluster name to narrow the scope.
Command 2 — find ALBs with no healthy targets. A stopped environment's target group shows 0 healthy targets. This is the fastest way to cross-reference which ALBs are attached to dead environments:
aws elbv2 describe-target-groups --query \
  'TargetGroups[*].[TargetGroupName,TargetGroupArn]' --output text \
  | while read name arn; do
      health=$(aws elbv2 describe-target-health --target-group-arn "$arn" \
        --query 'TargetHealthDescriptions[?TargetHealth.State==`healthy`] | length(@)')
      echo "$health healthy $name"
    done | sort -n

Target groups with 0 healthy targets are candidates for deletion, but check Command 3 before acting.
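Command 3: find log groups with no recent writes. CloudWatch log groups that haven't received a write in 30+ days are orphaned log storage. Stored logs bill at about $0.03/GB/month in us-east-1 (the $0.50/GB figure is the ingestion rate), so each group is cheap on its own, but they accumulate silently: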
# Find log groups with no writes in the last 30 days
# (date -d is GNU date syntax)
aws cloudwatch get-metric-statistics \
  --namespace AWS/Logs \
  --metric-name IncomingLogEvents \
  --dimensions Name=LogGroupName,Value=/ecs/your-service \
  --start-time $(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 2592000 \
  --statistics Sum \
  --query 'Datapoints[0].Sum'

A return value of null or 0.0 means the log group is dead. Automate this check across all /ecs/* log groups to build a full orphan list.
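One way to automate that check across every log group under /ecs/ is sketched below. It assumes GNU date (as the command above does), the /ecs/ name prefix, and a default region and profile. The helper names `is_dead_sum` and `audit_ecs_log_groups` are made up for this example. The script only prints candidates and deletes nothing.

```shell
# Succeeds (exit 0) when a Sum from get-metric-statistics means "no writes".
# With --output text a missing datapoint is printed as None; with JSON, null.
is_dead_sum() {
  case "$1" in
    ""|None|null|0|0.0) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints every log group under /ecs/ with no IncomingLogEvents in 30 days.
audit_ecs_log_groups() {
  start=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)   # GNU date syntax
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  aws logs describe-log-groups --log-group-name-prefix /ecs/ \
    --query 'logGroups[].logGroupName' --output text | tr '\t' '\n' \
  | while read -r group; do
      [ -n "$group" ] || continue
      sum=$(aws cloudwatch get-metric-statistics \
        --namespace AWS/Logs --metric-name IncomingLogEvents \
        --dimensions Name=LogGroupName,Value="$group" \
        --start-time "$start" --end-time "$end" \
        --period 2592000 --statistics Sum \
        --query 'Datapoints[0].Sum' --output text)
      if is_dead_sum "$sum"; then
        echo "DEAD $group"
      fi
    done
}

# Usage: audit_ecs_log_groups > orphaned-log-groups.txt
```

Keeping the "is this value dead?" decision in its own function makes it easy to adjust if you switch output formats or decide that a handful of stray events should still count as dead.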
What to check before you delete
Before deleting any environment, verify three things: no scheduled job points at it, no CI/CD pipeline references the cluster name, and the ALB isn't shared between multiple services.
EventBridge scheduled rules. Nightly jobs, weekly reports, scheduled ECS tasks — all reference a cluster and service by name. Check for rules targeting your environment before deleting:
aws events list-rules --query 'Rules[*].[Name,ScheduleExpression,State]' --output table
# Then for each relevant rule:
aws events list-targets-by-rule --rule <rule-name> --query 'Targets[*].EcsParameters'

Terraform state. If the environment was created with Terraform, its state file still references the service, cluster, ALB, and target group. Deleting those resources by hand leaves the state out of sync: the next terraform plan will detect the missing resources and propose recreating them. Either run terraform destroy -target per resource instead of deleting manually, or remove the state entries afterwards with terraform state rm.
Shared ALBs. Some teams route multiple environments through a single ALB using listener rules and host-based routing. Check whether your ALB has multiple listener rules before deleting it:
# --output text puts multiple listener ARNs on one tab-separated line,
# so split them before handing each one to xargs
aws elbv2 describe-listeners --load-balancer-arn <alb-arn> \
  --query 'Listeners[*].ListenerArn' --output text | tr '\t' '\n' \
  | xargs -I{} aws elbv2 describe-rules --listener-arn {} \
      --query 'Rules[*].[Priority,Conditions[0].Values[0]]' --output table

If only one rule exists (the default forward rule), the ALB is dedicated to this environment and safe to delete. Multiple rules mean other services depend on it: remove only the rules and target groups belonging to the orphaned service, and leave the ALB intact.
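Also check the CloudTrail audit log to see who last touched the environment, and when. CloudTrail's event history only covers the last 90 days, so older activity has to come from a trail delivered to S3. An environment last modified 8 months ago by a developer who left the company is safe to kill. One touched last week by a CI/CD pipeline is not.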
Kill order: the right sequence
Delete in this order: set desiredCount=0 → drain tasks → delete ECS service → delete ALB listener rule → delete target group → delete ALB → delete NAT Gateway → delete log groups. Wrong order causes dependency errors and leaves billing running.
aws ecs update-service --cluster <cluster> --service <service> --desired-count 0

# Lower drain timeout first to avoid waiting 5 minutes:
aws elbv2 modify-target-group-attributes \
  --target-group-arn <tg-arn> \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30

aws ecs delete-service --cluster <cluster> --service <service> --force

# Remove listener rules first, then target group, then ALB
aws elbv2 delete-rule --rule-arn <rule-arn>
aws elbv2 delete-target-group --target-group-arn <tg-arn>
# Disable deletion protection if set:
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn <alb-arn> \
  --attributes Key=deletion_protection.enabled,Value=false
aws elbv2 delete-load-balancer --load-balancer-arn <alb-arn>

aws ec2 delete-nat-gateway --nat-gateway-id <ngw-id>
# Wait for deletion, then release the Elastic IP:
aws ec2 wait nat-gateway-deleted --nat-gateway-ids <ngw-id>
aws ec2 release-address --allocation-id <eip-alloc-id>

# List and delete all log groups for this service:
aws logs describe-log-groups --log-group-name-prefix /ecs/<service-name> \
  --query 'logGroups[*].logGroupName' --output text \
  | tr '\t' '\n' \
  | xargs -I{} aws logs delete-log-group --log-group-name {}

Deleting a target group while a listener rule still references it fails with a dependency error. Always delete rules before target groups. If you see ResourceInUse, run the listener rules describe command above to find what's still attached. The default rule can't be removed with delete-rule, so if the target group is a listener's default action, delete the listener (or the dedicated ALB itself) before the target group.
How to prevent orphans from accumulating
Tag every environment at creation with owner, created-by, and ttl. A weekly Lambda that flags services where TTL has passed costs $0 to run and surfaces every stale environment before it accumulates 6 months of charges.
Tagging convention. Apply these tags to every ECS service, ALB, target group, and NAT Gateway at creation time. Without consistent tags, the audit script above has no way to determine ownership or expected lifetime:
# Terraform example: tag every resource at creation
locals {
  env_tags = {
    owner      = "platform-team"
    created-by = "terraform"
    env-type   = "staging"     # feature | staging | prod
    ttl        = "2026-09-01"  # ISO date: when this env expires
    service    = "payments-v2"
  }
}

resource "aws_ecs_service" "this" {
  # ...
  tags = local.env_tags
}

resource "aws_lb" "this" {
  # ...
  tags = local.env_tags
}

Weekly janitor Lambda. An EventBridge rule triggers a Lambda every Monday. The Lambda lists all ECS services, checks the ttl tag against today's date, and posts a Slack message for every service that's past its TTL or has been at desiredCount=0 for more than 7 days. No auto-deletion, just surfacing. The team decides what to kill.
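The TTL comparison at the heart of that janitor can be sketched in a few lines. The article doesn't show the Lambda's code, so this is an illustrative shell version of the date check only. `ttl_expired` is a hypothetical helper, and it assumes the ttl tag is a well-formed YYYY-MM-DD date as in the Terraform example.

```shell
# Succeeds (exit 0) when the ttl date is strictly before today's date.
# $1 = ttl tag value (YYYY-MM-DD), $2 = today (defaults to the current UTC date).
ttl_expired() {
  ttl=$(printf '%s' "$1" | tr -d '-')
  today=$(printf '%s' "${2:-$(date -u +%Y-%m-%d)}" | tr -d '-')
  # With the dashes removed, ISO dates compare correctly as integers.
  [ "$ttl" -lt "$today" ]
}

if ttl_expired "2026-09-01" "2026-09-15"; then
  echo "past TTL: flag for review"
fi
```

An environment is treated as live through the end of its TTL day and flagged from the following day. A real janitor would read the tag from each service and post to Slack instead of echoing.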
Fortem does this automatically: the dashboard shows per-environment cost, flags services that have been at zero desired count for more than N days, and lets you kill them from the UI without touching the AWS console. For teams managing 20+ environments, the manual audit above gets expensive in engineer time quickly.
If you read this, you might also want to know
What if my orphaned environment is in a different AWS account?
Run the same CLI commands with --profile <account-profile>. If you use AWS Organizations, the easiest cross-account audit is AWS Config aggregator — it surfaces resources tagged with ttl across all member accounts without logging into each one.
Does deleting an ECS service also delete the underlying ECR images?
No. ECR images are independent of ECS services. Deleting the service leaves all images in ECR intact. Images cost $0.10/GB/month to store — a separate cleanup. Use aws ecr describe-images --repository-name <repo> to list images and aws ecr batch-delete-image to remove old ones.
How do I know if an ALB is shared between multiple ECS environments?
Check listener rules: aws elbv2 describe-rules --listener-arn <arn>. More than one non-default rule means multiple services share the ALB. Count the target groups attached — one per environment. Delete only the rules and target group for the orphaned service, leave the ALB.
Stop guessing which environments are costing you money
Fortem shows per-environment cost and flags orphaned environments automatically. Book 20 minutes — we'll run the audit on your fleet live.