Why won't my ECS Fargate service scale in?

Three common reasons. (1) Scale-in is conservative by design — the managed alarm typically needs ~15 consecutive minutes below the threshold before removing tasks, versus ~3 minutes to scale out. (2) Application Auto Scaling turns off scale-in entirely while an ECS deployment is in progress. (3) Target tracking treats insufficient metric data as 'do not scale in', so a service with gaps in its metric never scales down. If you need faster or asymmetric scale-in, disable scale-in on the target tracking policy and add a custom step scaling policy.

What's the difference between target tracking and step scaling for ECS?

Target tracking keeps a metric at a target value (like CPU at 50%) and AWS creates and manages the CloudWatch alarms for you — it's the easiest mode and the right default for steady load. Step scaling defines explicit alarm thresholds and how many tasks to add or remove at each breach level, so it reacts faster to sudden spikes. They can coexist: target tracking for steady state, a step policy for bursts.

Can ECS Fargate autoscale on ALB request count?

Yes — ALBRequestCountPerTarget is one of the three predefined target tracking metrics, alongside ECSServiceAverageCPUUtilization and ECSServiceAverageMemoryUtilization. It's often the best signal for request-driven services because it scales on actual load, not on a proxy like CPU. One caveat: ALBRequestCountPerTarget is not supported for the blue/green deployment type.

What cooldown should I use for ECS autoscaling?

Sensible defaults: a short scale-out cooldown (~60 seconds) to stay responsive, and a longer scale-in cooldown (~300 seconds) to prevent thrashing — tasks being added and removed repeatedly. Pair that with a CPU target around 50%, not 80%: too high a target leaves no headroom for new tasks to warm up before the metric spikes again.

Does ECS autoscaling work during a deployment?

Partly. Application Auto Scaling turns off scale-in processes while an ECS deployment is in progress, but scale-out continues unless you suspend it. So a service can still add tasks under load mid-deploy, but won't remove them until the deployment finishes. This does not apply to services using an external deployment controller.

Guide

Matt S

Platform engineer at Fortem · June 25, 2026 · 9 min read


ECS Fargate Autoscaling: Target Tracking, Step, and Why It Doesn't Scale When You Expect

You set a CPU target, autoscaling “works” — until a traffic spike it reacts to too slowly, or a service that quiets down won't scale back in. ECS dynamic scaling follows rules most tutorials skip. This guide covers the three policy types, the settings that matter, and the five reasons it doesn't scale when you expect — each backed by the AWS docs, not by “set the target to 50% and hope.”

TL;DR

· Target tracking is the right default: pick one metric (CPU, memory, or ALB requests per target), set one target, and AWS creates and manages the alarms.
· It scales out fast and in slow on purpose: the managed alarms need ~3 minutes above target to add tasks and ~15 minutes below to remove them.
· Five things break it: slow scale-in by design, scale-in turned off during deployments, ALB request count not supported on blue/green, gaps in metric data that block scale-in, and hand-edited managed alarms.
· When target tracking is too slow for bursts, add a step scaling policy for the spike and keep target tracking for steady state. The two coexist.

Quick answer

For most ECS Fargate services, use target tracking on CPU at 50% (or ALBRequestCountPerTarget for request-driven apps), with a ~60s scale-out and ~300s scale-in cooldown. AWS creates and manages the CloudWatch alarms, so don't edit them. Scale-out happens after ~3 minutes above target; scale-in after ~15 minutes below, and it's turned off entirely during a deployment. If that's too slow for sudden spikes, add a step scaling policy on a steeper alarm and keep target tracking for steady state.

Ready to use — copy this today

Target tracking on CPU at 50% for one Fargate service, with sensible cooldowns. Register a scalable target, attach the policy, set min/max. Drop it into your Terraform and the service scales itself.

hcl

# Register the ECS service as a scalable target
resource "aws_appautoscaling_target" "svc" {
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster}/${var.service}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2     # floor — never below this
  max_capacity       = 20    # ceiling — caps your worst-case cost
}

# Target tracking on average CPU at 50%
resource "aws_appautoscaling_policy" "cpu" {
  name               = "${var.service}-cpu-target"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.svc.service_namespace
  resource_id        = aws_appautoscaling_target.svc.resource_id
  scalable_dimension = aws_appautoscaling_target.svc.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 50.0
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    scale_out_cooldown = 60    # add tasks quickly
    scale_in_cooldown  = 300   # remove tasks slowly — avoids thrashing
  }
}

For request-driven services, swap the metric for ALBRequestCountPerTarget with a resource_label that identifies your ALB and target group. Don't hand-edit the CloudWatch alarms this creates; AWS manages them.
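As a sketch of that swap, the policy below tracks requests per target instead of CPU. The resource names aws_lb.main and aws_lb_target_group.app and the target value of 1000 are assumptions for illustration; substitute your own load balancer, target group, and a value sized from your traffic.

```hcl
# Target tracking on ALB requests per target.
# aws_lb.main and aws_lb_target_group.app are hypothetical resource names.
resource "aws_appautoscaling_policy" "requests" {
  name               = "${var.service}-alb-requests-target"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.svc.service_namespace
  resource_id        = aws_appautoscaling_target.svc.resource_id
  scalable_dimension = aws_appautoscaling_target.svc.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 1000 # assumed requests per target; size from your own traffic
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # Expected format: app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>
      resource_label = "${aws_lb.main.arn_suffix}/${aws_lb_target_group.app.arn_suffix}"
    }
    scale_out_cooldown = 60
    scale_in_cooldown  = 300
  }
}
```

The two arn_suffix attributes joined with a slash produce the label format shown in the comment, so the label never has to be written by hand.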

The three ways ECS scales (and which to use)

ECS has three modes: target tracking (hold a metric at a target), step scaling (tiered per-alarm adjustments), and scheduled (calendar). For load, target tracking is the default; step handles bursts.

All three run on AWS Application Auto Scaling, which adjusts your service's desired task count. They answer different questions. Target tracking asks “keep this metric here.” Step scaling asks “when this alarm breaks by this much, add this many tasks.” Scheduled asks “at this time, set capacity to this.”

Policy	What it does	When to use
Target tracking	Keep a metric (CPU, memory, ALB requests) at one target value	Steady load — the default
Step scaling	Tiered task adjustments per alarm-breach size	Sudden spikes, custom thresholds
Scheduled	Set capacity by date/time (cron)	Known calendar patterns — covered separately

This guide is about scaling to load — target tracking and step. Scheduled scaling is a different job: turning environments off on a calendar to cut idle spend. If that's what you're after, the full mechanics of scheduled scaling to stop environments off-hours live in their own guide — we won't repeat them here.

Target tracking — the default, and its three metrics

Target tracking holds one of three predefined metrics at a target — CPU, memory, or ALB requests per target — and AWS creates and manages the CloudWatch alarms. You set the number; it does the rest.

It works like a thermostat. You pick a number; the auto scaler adds or removes tasks to keep the metric near it. The three metrics fit different services:

CPU: ECSServiceAverageCPUUtilization is the safe default for compute-bound services. It works everywhere, but it's a proxy: CPU can sit low while the service is still slow on I/O.
MEM: ECSServiceAverageMemoryUtilization is for memory-bound workloads. It's risky as a sole metric: many apps hold memory flat and never trigger a scale-in.
ALB: ALBRequestCountPerTarget is the best signal for request-driven APIs. It scales on actual load, not a proxy. Caveat below.

The big convenience: target tracking removes the need to define alarms by hand. AWS builds two — a high alarm to scale out and a low alarm to scale in — and tunes them as load shifts.

Key insight

Do not edit or delete the CloudWatch alarms that target tracking creates. Service Auto Scaling owns them — it adjusts them as your load changes and deletes them when you delete the policy. Hand-editing them looks fine until the next adjustment silently reverts your change — and then scaling misbehaves with no obvious cause.

Why it doesn't scale when you expect (5 failure modes)

Five reasons: scale-in is blocked during deployments, the scale-out/scale-in timing asymmetry, ALB request count isn't supported on blue/green, insufficient data never scales in, and editing the managed alarms breaks it.

Most autoscaling problems aren't bugs — they're documented behavior that surprises you at the wrong moment. Here's the catalog, with the symptom you'll see, the cause, and the fix.

1. My service won't scale in

Cause — Scale-in is conservative by design. The managed low alarm typically needs ~15 consecutive minutes below the threshold before removing tasks, while scale-out fires after ~3 minutes above. So a service that quiets down still runs extra tasks for a quarter of an hour.

Fix — Accept it for steady services, or — if you need faster, asymmetric scale-in — disable scale-in on the target tracking policy and add a custom step scaling policy with your own thresholds. Step scaling trades away some of target tracking's churn protection for control.
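A minimal sketch of that fix, assuming the scalable target from the earlier block. The first policy takes the place of the earlier CPU policy and only scales out; the step policy and its alarm handle scale-in. The 30% threshold, the 5-minute evaluation window, and the one-task step are illustrative assumptions, not recommended values.

```hcl
# Target tracking that only scales out (scale-in disabled).
resource "aws_appautoscaling_policy" "cpu_out_only" {
  name               = "${var.service}-cpu-target-out-only"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.svc.service_namespace
  resource_id        = aws_appautoscaling_target.svc.resource_id
  scalable_dimension = aws_appautoscaling_target.svc.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value     = 50.0
    disable_scale_in = true
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    scale_out_cooldown = 60
  }
}

# Custom scale-in: remove one task each time the low alarm fires.
resource "aws_appautoscaling_policy" "cpu_scale_in" {
  name               = "${var.service}-cpu-step-in"
  policy_type        = "StepScaling"
  service_namespace  = aws_appautoscaling_target.svc.service_namespace
  resource_id        = aws_appautoscaling_target.svc.resource_id
  scalable_dimension = aws_appautoscaling_target.svc.scalable_dimension

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 300
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_upper_bound = 0 # any value below the alarm threshold
      scaling_adjustment          = -1
    }
  }
}

# You own this alarm: CPU below 30% for 5 consecutive minutes (assumed values).
resource "aws_cloudwatch_metric_alarm" "cpu_low" {
  alarm_name          = "${var.service}-cpu-low"
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "LessThanThreshold"
  threshold           = 30
  alarm_actions       = [aws_appautoscaling_policy.cpu_scale_in.arn]

  dimensions = {
    ClusterName = var.cluster
    ServiceName = var.service
  }
}
```

Because the alarm here is yours, its threshold and evaluation window are the knobs for how quickly the service scales in; the min_capacity on the scalable target still acts as the floor.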

2. Nothing scaled in during my deployment

Cause — Application Auto Scaling turns off scale-in while an ECS deployment is in progress. Scale-out still happens (unless suspended), but tasks added under load mid-deploy won't be removed until the deployment finishes.

Fix — Expected behavior — let the deployment finish, scaling resumes after. If you also want to suspend scale-out during deploys, set DynamicScalingOutSuspended on the scalable target, then clear it when the deploy completes.
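In Terraform, suspending scale-out might look like the following, assuming an AWS provider version that exposes the suspended_state block on aws_appautoscaling_target. This is the same scalable target as in the earlier block with one block added, not a second resource. The variable deploy_in_progress is hypothetical; a pipeline would set it to true before the deploy and back to false afterward.

```hcl
# Hypothetical toggle, flipped by your deploy pipeline.
variable "deploy_in_progress" {
  type    = bool
  default = false
}

resource "aws_appautoscaling_target" "svc" {
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster}/${var.service}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 20

  # Maps to DynamicScalingOutSuspended on the scalable target.
  suspended_state {
    dynamic_scaling_out_suspended = var.deploy_in_progress
  }
}
```

Remember to clear the flag after the deploy; a target left suspended will not add tasks under load.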

3. ALBRequestCountPerTarget scaling does nothing on blue/green

Cause — ALBRequestCountPerTarget is not supported for the blue/green deployment type. The policy exists but never drives scaling.

Fix — Use CPU or memory target tracking on blue/green services, or scale on request count only on rolling-update services. Don't mix the unsupported metric with blue/green and assume it works.

4. A service with gaps in its metrics never scales in

Cause — Target tracking does not scale in on insufficient data — it refuses to read missing datapoints as 'low utilization', to protect availability. A service with gaps in its metric stream stays at its current task count.

Fix — Make sure the metric reports continuously (a healthy service emits CPU/memory every minute). For request count, ensure the ALB target group is receiving traffic the policy can read.

5. Scaling went weird after someone 'fixed' an alarm

Cause — Someone hand-edited the CloudWatch alarm target tracking manages. The next automatic adjustment reverts or conflicts with the change, and scaling behaves unpredictably.

Fix — Never touch the managed alarms. Change behavior through the policy (target value, cooldowns) instead. If you need custom alarm logic, use step scaling, where you own the alarms outright.

Cooldowns and thrashing — the settings that matter

Scale-out cooldown ~60s keeps you responsive; scale-in ~300s prevents thrashing. Too short a scale-in cooldown thrashes tasks; a CPU target too high (80%) leaves no headroom to warm up.

The cooldown is how long Service Auto Scaling waits for a scaling action to take effect before doing more. The two directions want different values, for different reasons.

Setting	Starting value	Why
Metric + target	CPU at 50% (or ALB requests/target)	Leave headroom for new tasks to warm up
Scale-out cooldown	~60 sec	Stay responsive under rising load
Scale-in cooldown	~300 sec	Prevent thrashing on dips
Min / max tasks	Set both deliberately	Max caps cost; min holds a floor

Why asymmetric: scaling out should be quick — under rising load you want capacity now, so a short ~60s cooldown is fine. Scaling in should be slow — pull tasks too eagerly and a brief dip removes capacity you need 90 seconds later, so the service adds it back, then removes it again. That cycle is thrashing, and a ~300s scale-in cooldown is what stops it.

On the target value: 50% is a sane default, not 80%. A high target means tasks only get added once the service is already near saturation, and new Fargate tasks take 30–90 seconds to start and warm up. By the time they're ready, the spike has already hurt latency. Lower target, more headroom, smoother scaling.

When target tracking is too slow: add step scaling

Target tracking waits for ~3 minutes of datapoints above target before it adds tasks, which is too slow for sudden spikes. Add a step scaling policy on a steeper alarm to jump capacity fast and keep target tracking for steady state. The two coexist.

Target tracking is smooth but deliberate. For a service that goes from quiet to flooded in seconds (a flash sale, a batch kickoff, an SQS backlog), waiting ~3 minutes for the managed alarm means you're already dropping requests before it reacts. Step scaling fixes that: you define explicit thresholds (“CPU over 70% → add 4 tasks; over 90% → add 8”) and it jumps capacity as soon as your alarm fires.

You don't have to choose. A service can run both: target tracking for the steady baseline and a step policy for the spike. When you have multiple policies, Service Auto Scaling prioritizes availability — it scales out if any policy says to, and scales in only if all of them agree. So the aggressive step policy can add capacity fast without the cautious target policy ever fighting it.
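The "over 70% add 4, over 90% add 8" example could be sketched as a step policy that sits alongside the target tracking policy on the same scalable target. The one-minute evaluation window and the resource names are assumptions for illustration; step intervals are expressed relative to the alarm threshold of 70.

```hcl
# Burst policy: runs next to target tracking on the same scalable target.
resource "aws_appautoscaling_policy" "cpu_burst" {
  name               = "${var.service}-cpu-burst"
  policy_type        = "StepScaling"
  service_namespace  = aws_appautoscaling_target.svc.service_namespace
  resource_id        = aws_appautoscaling_target.svc.resource_id
  scalable_dimension = aws_appautoscaling_target.svc.scalable_dimension

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 60
    metric_aggregation_type = "Average"

    # Threshold + 0 to threshold + 20, i.e. CPU from 70% up to 90%: add 4 tasks.
    step_adjustment {
      metric_interval_lower_bound = 0
      metric_interval_upper_bound = 20
      scaling_adjustment          = 4
    }

    # Threshold + 20 and above, i.e. CPU at 90% or more: add 8 tasks.
    step_adjustment {
      metric_interval_lower_bound = 20
      scaling_adjustment          = 8
    }
  }
}

# You own this alarm: a single one-minute datapoint at or above 70% fires it (assumed window).
resource "aws_cloudwatch_metric_alarm" "cpu_burst" {
  alarm_name          = "${var.service}-cpu-burst"
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 70
  alarm_actions       = [aws_appautoscaling_policy.cpu_burst.arn]

  dimensions = {
    ClusterName = var.cluster
    ServiceName = var.service
  }
}
```

This policy defines no scale-in steps, so removing the extra tasks after the burst is left to target tracking and its slower low alarm.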

Key insight

The tradeoff with step scaling: you own the alarms, which means you also own the churn. Target tracking has built-in protections against rapid up-down cycling; step scaling does not. Use step for the burst, keep target tracking carrying the steady state, and you get fast reaction without hand-managing thrash control.

What this looks like across a fleet

One service's autoscaling is a Terraform block. At 10+ services across environments, you maintain scalable targets, policies, and per-service tuning — a surface that grows with every environment.

Autoscaling on one service is easy. The problem is multiplication. Each service needs its own scalable target, its own metric choice, its own cooldowns, and its own min/max — and the right values differ by service and by environment. A dev environment shouldn't scale to 20 tasks; production shouldn't cap at 4. Keeping that tuned by hand across a fleet is the work nobody budgets for.
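One way to keep that multiplication manageable in plain Terraform is a per-environment map with for_each, sketched below. The service names (api, worker) and their values are hypothetical, and var.cluster is assumed to be declared as in the earlier block; each environment would supply its own map through its tfvars.

```hcl
# Per-service scaling settings; override per environment.
variable "scaling" {
  type = map(object({
    min_capacity = number
    max_capacity = number
    cpu_target   = number
  }))
  default = {
    api    = { min_capacity = 2, max_capacity = 20, cpu_target = 50 }
    worker = { min_capacity = 1, max_capacity = 4, cpu_target = 50 }
  }
}

# One scalable target per service in the map.
resource "aws_appautoscaling_target" "fleet" {
  for_each = var.scaling

  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster}/${each.key}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = each.value.min_capacity
  max_capacity       = each.value.max_capacity
}

# One CPU target tracking policy per scalable target.
resource "aws_appautoscaling_policy" "fleet_cpu" {
  for_each = aws_appautoscaling_target.fleet

  name               = "${each.key}-cpu-target"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = each.value.service_namespace
  resource_id        = each.value.resource_id
  scalable_dimension = each.value.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = var.scaling[each.key].cpu_target
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    scale_out_cooldown = 60
    scale_in_cooldown  = 300
  }
}
```

This removes the copy-paste, but the values in the map still have to be chosen and revisited per service and per environment.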

It compounds with the costs you're already carrying. Autoscaling controls compute, but every environment also pays fixed overhead (ALB, NAT Gateway, CloudWatch) that autoscaling can't touch. Scaling well is only part of running a fleet economically.

Fortem doesn't replace autoscaling — your policies keep doing their job. It gives you one place to see and tune scaling, scheduling, and cost across every ECS environment, so per-service drift doesn't pile up as your fleet grows.

If you read this, you might also want to know

Should I scale on CPU or memory?

CPU is the safer default — most services are compute-bound, and memory often holds flat (so it never triggers scale-in). Use memory only when you know the service is memory-bound, and even then pair it with a CPU or request-count policy so the service can still scale down. For request-driven APIs, ALBRequestCountPerTarget beats both — it scales on real load, not a proxy.

Can ECS Fargate scale to zero?

Yes, with a target tracking policy and min_capacity = 0. When capacity is 0 and the metric shows demand, Service Auto Scaling waits for one datapoint, scales out by the minimum amount, then resumes normal scaling from the actual running count. It's useful for spiky non-prod or batch services — but cold-start latency on the first request after zero is the tradeoff.

Does autoscaling fight my manual desired-count changes?

Yes. As long as an active scaling policy and alarm exist on the service, Service Auto Scaling can override a desired count you set by hand. If you need to pin capacity temporarily — say, during an incident — suspend scaling on the scalable target rather than fighting it with manual updates.
