Guide
Matt S
Platform engineer at Fortem · 9 min read

What Is ECS Service Connect and Should You Use It?

TL;DR
  • ECS Service Connect injects an Envoy proxy into every task automatically. You don't add the sidecar to your task definition (you only name the port mapping); ECS manages it.
  • The feature is free. The cost is the Envoy tax: AWS recommends budgeting +0.25 vCPU and +64 MiB per task on Fargate.
  • Native blue/green (launched July 2025) works with Service Connect. The older CodeDeploy-based blue/green controller still does not.
  • Under 5 services or stuck on CodeDeploy: use Cloud Map. 10+ services on native deploys: Service Connect. External traffic: ALB.

Service Connect launched in 2022. For more than two years, blue/green users were blocked by a CodeDeploy incompatibility that sent most teams back to plain Cloud Map. That blocker was lifted on July 17, 2025, when ECS gained native blue/green deployments (CodeDeploy itself is still incompatible). Here's what Service Connect actually does, what it costs on Fargate, and the three situations where you should still skip it.

Ready to use — Terraform Service Connect configuration
hcl
resource "aws_ecs_service" "api" {
  name            = "api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  service_connect_configuration {
    enabled   = true
    namespace = "arn:aws:servicediscovery:us-east-1:123456789:namespace/ns-xxx"

    service {
      port_name = "http"  # must match portMappings[].name in task definition

      client_alias {
        port     = 8080
        dns_name = "api"  # other services call: http://api:8080
      }
    }
  }

  network_configuration { ... }
}

# In task definition portMappings, name the port:
# { "name": "http", "containerPort": 8080, "appProtocol": "http2" }

# Account for the Envoy sidecar in your task CPU/memory:
# If your app needs 512 CPU / 1024 MiB, app + sidecar comes to 768 CPU / 1088 MiB.
# Fargate only accepts fixed task sizes, so round up to 1024 CPU / 2048 MiB.

What ECS Service Connect actually does

ECS injects an Envoy proxy sidecar into every task automatically. Apps call other services by short name (http://checkout:8080); the proxy routes, load-balances, and health-checks in seconds — not minutes.

The AWS docs are explicit about how the sidecar works: "This container isn't present in the task definition and you can't configure it. Amazon ECS manages the container configuration in the service." You don't add anything to your task definition (except naming the port mapping). ECS handles the rest at deploy time.
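As a rough sketch of that one required change, the task definition below names its port mapping so that a service-level `port_name = "http"` has something to match. The family, container name, image, and sizes are placeholders chosen for illustration (assumptions, not values from a real deployment). An execution role, logging, and the rest of a production definition are left out.

```hcl
# Hypothetical task definition: family, container name, and image are placeholders.
resource "aws_ecs_task_definition" "api" {
  family                   = "api"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024 # task-level size, with headroom for the proxy
  memory                   = 2048

  container_definitions = jsonencode([
    {
      name      = "api"
      image     = "example.com/api:latest" # placeholder image
      essential = true
      portMappings = [
        {
          name          = "http" # referenced by port_name in the ECS service
          containerPort = 8080
          protocol      = "tcp"
          appProtocol   = "http2"
        }
      ]
    }
  ])
}
```

Note that no proxy container appears here: the only Service Connect-specific detail is the `name` (and optional `appProtocol`) on the port mapping.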

Short-name routing. Services call each other by a name you define in the client_alias.dns_name field — http://checkout:8080, not an ALB DNS string or VPC Route 53 entry. The proxy resolves the endpoint from the Cloud Map namespace and load-balances across all healthy task IPs in round-robin.

Passive health checks. The proxy marks individual tasks unhealthy and stops routing to them within seconds of detecting failures, with no dedicated health check endpoint required on the target service. This is the primary reliability advantage over plain Cloud Map DNS, where stale records can persist for the full DNS TTL (typically 15–60 seconds) after a task stops.

Automatic CloudWatch metrics. Request count, HTTP 4xx/5xx rates, and latency (p50, p99) are emitted per service pair without any instrumentation in your app. If your appProtocol is tcp, you get proxy activity but no per-call telemetry — only HTTP/1.1, HTTP/2, and gRPC get the full metric set.

Namespace scope. Service Connect manages its own namespace — it does not write to VPC DNS or Route 53. Short names are only resolvable from inside tasks enrolled in the same namespace. A Lambda function, an EC2 instance, or an ECS service not enrolled in that namespace cannot resolve those names.
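One way to picture enrollment: a service that only calls others still has to join the namespace, but it does not have to publish an endpoint of its own. The sketch below assumes a hypothetical `worker` service, an `aws_ecs_cluster.main` and `aws_ecs_task_definition.worker` defined elsewhere, and two illustrative variables. Leaving out the `service` block makes the service a client only.

```hcl
variable "namespace_arn" {
  type = string # hypothetical: ARN of the Cloud Map namespace
}

variable "private_subnet_ids" {
  type = list(string) # hypothetical: subnets for the tasks
}

# Client-only enrollment: this service can resolve short names in the
# namespace but does not register an endpoint itself.
resource "aws_ecs_service" "worker" {
  name            = "worker"
  cluster         = aws_ecs_cluster.main.id               # assumed to exist elsewhere
  task_definition = aws_ecs_task_definition.worker.arn    # assumed to exist elsewhere
  desired_count   = 2
  launch_type     = "FARGATE"

  service_connect_configuration {
    enabled   = true
    namespace = var.namespace_arn
    # no service block: nothing is published, names are only consumed
  }

  network_configuration {
    subnets = var.private_subnet_ids
  }
}
```

Without the `service_connect_configuration` block, the same tasks would get no proxy and could not resolve `http://api:8080`.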

Service Connect vs Cloud Map vs internal ALB — which is which

Cloud Map is DNS-based discovery — cheap, simple, slow to fail over. Service Connect adds a proxy for sub-10s failover and per-call metrics. Internal ALBs are for external ingress or non-ECS callers. Most fleets need all three.

The full picture of ECS service-to-service options — including when Cloud Map still wins — is in the ECS service discovery decision guide. This article focuses specifically on Service Connect: what it costs, what it can't do, and when to use it.

| Dimension | Service Connect | Cloud Map | Internal ALB |
|---|---|---|---|
| Failover speed | Seconds (proxy detects) | DNS TTL (15–60s stale) | Seconds (connection drain) |
| Feature cost | $0 | $0.10/resource/month | $0.0225/ALB-hour (~$16/mo) |
| Automatic retries | 2 retries on failure | None (app-level only) | None (app-level only) |
| Per-call metrics | Built-in (HTTP/gRPC) | None | ALB access logs |
| Non-ECS callers | No | Yes | Yes |
| Protocol telemetry | HTTP/1.1, HTTP/2, gRPC | n/a | HTTP only (L7 log) |

When to use an internal ALB alongside Service Connect. Lambda functions, EC2 instances, on-prem services via Direct Connect — anything not enrolled in the Service Connect namespace needs a stable HTTPS endpoint. That's the ALB's job. You can run Service Connect for ECS-to-ECS traffic and an internal ALB for inbound calls from outside ECS simultaneously on the same service.

When Cloud Map still wins over Service Connect. If you need cross-account service discovery without AWS RAM, Cloud Map is the cleaner path. And if you have non-ECS callers anywhere in your stack, Cloud Map DNS is the only registry they can reach without a separate ALB. Scale is not one of those reasons: above 1,000 tasks per service, Cloud Map hits a Route 53 quota that Service Connect does not.

What Service Connect costs — the feature is free, the sidecar isn't

Service Connect has no feature charge. The cost is the Envoy sidecar: AWS recommends +0.25 vCPU and +64 MiB per task. On a 0.25 vCPU task, that doubles the CPU line item.

AWS's official recommendation is to reserve 256 CPU units (0.25 vCPU) and 64 MiB for the sidecar per task. Bump that to 512 CPU units if tasks in the service handle more than 500 requests per second at peak, and to 128 MiB if the namespace has more than 100 Service Connect services or more than 2,000 tasks in total. These are minimums, not guarantees: monitor actual sidecar container metrics in CloudWatch after enabling.

Fleet math for a typical 10-service setup, 3 tasks per service, running 24/7 in us-east-1:

| Resource | Calculation | $/month |
|---|---|---|
| Sidecar CPU | 30 tasks × 0.25 vCPU × $0.04048/hr × 730 hrs | ~$221 |
| Sidecar memory | 30 tasks × 0.064 GB × $0.004445/GB-hr × 730 hrs | ~$6 |
| Service Connect feature | | $0 |
| Total Envoy tax (10 services × 3 tasks) | | ~$227/mo |

The ratio matters more than the absolute number. On a 2 vCPU task, the sidecar adds ~12% to the CPU line — negligible. On a 0.25 vCPU task, the sidecar doubles it. If your services are right-sized to small task sizes, run the math before enabling fleet-wide. The full breakdown of Fargate task pricing is in the AWS Fargate pricing breakdown.
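To rerun the fleet math with your own numbers, the calculation can be written as Terraform locals. This is a minimal sketch that needs no provider. The rates are the us-east-1 figures quoted in the table above and may not match current pricing, and all names are illustrative.

```hcl
# Fleet math from the table above, expressed as Terraform locals.
locals {
  services          = 10
  tasks_per_service = 3
  hours_per_month   = 730

  vcpu_hour_usd = 0.04048  # per vCPU-hour, as quoted in the article
  gb_hour_usd   = 0.004445 # per GB-hour, as quoted in the article

  sidecar_vcpu = 0.25
  sidecar_gb   = 0.064 # 64 MiB, rounded the same way as the table

  tasks = local.services * local.tasks_per_service

  sidecar_cpu_usd    = local.tasks * local.sidecar_vcpu * local.vcpu_hour_usd * local.hours_per_month
  sidecar_memory_usd = local.tasks * local.sidecar_gb * local.gb_hour_usd * local.hours_per_month

  # Sidecar CPU as a percentage of the app's own CPU, for a few task sizes.
  cpu_overhead_pct = {
    for app_vcpu in [0.25, 1, 2] : tostring(app_vcpu) => local.sidecar_vcpu / app_vcpu * 100
  }
}

output "envoy_tax" {
  value = {
    cpu_usd_per_month    = local.sidecar_cpu_usd                            # about 221.63
    memory_usd_per_month = local.sidecar_memory_usd                         # about 6.23
    total_usd_per_month  = local.sidecar_cpu_usd + local.sidecar_memory_usd # about 227.86
    cpu_overhead_pct     = local.cpu_overhead_pct                           # 100, 25, and 12.5
  }
}
```

Changing `services`, `tasks_per_service`, or the list of task sizes shows quickly whether the overhead is noise or a doubling for your fleet.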

Key insight

The Envoy sidecar is the cost. Run the math on your actual task sizes before rolling Service Connect out fleet-wide. A 0.25 vCPU task doubles its CPU cost. A 1 vCPU task adds 25%. The break-even depends on how much your team values built-in retries and per-call CloudWatch metrics versus paying for those resources.

Cloud Map fees when using Service Connect. ECS creates the Cloud Map namespace and registers service instances automatically when you enable Service Connect. Standard Cloud Map fees still apply — $0.10/registered resource/month for each service instance ECS registers. On a 10-service fleet with 3 tasks each, that is 30 registered resources × $0.10 = $3/month on top of the Envoy sidecar compute. It is a small line item but not zero.

mTLS adds cost. Service Connect supports mutual TLS via AWS Private CA. Certificates rotate every 5 days — roughly 6 rotations per service per month. Factor in Private CA per-certificate pricing if you plan to enable mTLS. Without it, traffic between tasks is unencrypted at the proxy level (VPC provides network isolation, but not encryption in transit).
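For orientation, enabling TLS in Terraform is a `tls` block inside the Service Connect `service` block. The sketch below is illustrative only: the `reports` service, the variables, and the resources it references are hypothetical, and the argument names should be checked against the AWS provider documentation for the version you run.

```hcl
variable "namespace_arn" {
  type = string # hypothetical: Cloud Map namespace ARN
}

variable "private_ca_arn" {
  type = string # hypothetical: AWS Private CA that issues the certificates
}

variable "tls_role_arn" {
  type = string # hypothetical: IAM role ECS assumes to manage certificates
}

resource "aws_ecs_service" "reports" {
  name            = "reports"
  cluster         = aws_ecs_cluster.main.id             # assumed to exist elsewhere
  task_definition = aws_ecs_task_definition.reports.arn # assumed to exist elsewhere
  desired_count   = 2

  service_connect_configuration {
    enabled   = true
    namespace = var.namespace_arn

    service {
      port_name = "http"

      client_alias {
        port     = 8080
        dns_name = "reports"
      }

      tls {
        issuer_cert_authority {
          aws_pca_authority_arn = var.private_ca_arn
        }
        role_arn = var.tls_role_arn
      }
    }
  }
}
```

Network configuration and launch type are omitted to keep the focus on the `tls` block.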

The July 2025 blue/green unblock — and the CodeDeploy trap that remains

ECS native blue/green (July 17, 2025) supports Service Connect. The old CODE_DEPLOY deployment controller still does not — and throws a hard error at deploy time if you combine them.

"DeploymentController#type CODE_DEPLOY is not supported by ECS Service Connect. I ended up going with Cloud Map-based Service Discovery — which initially felt like the 'old way' of doing things."

Dev.to, AWS Builders, 2024 (CodeDeploy limitation — still accurate; blue/green via native controller now unblocked)

The dev.to article above drove a lot of teams away from Service Connect. Its conclusion — "use Cloud Map instead" — was correct at the time. The author hit a real, hard error. But the central reason for that conclusion changed on July 17, 2025, when AWS launched built-in blue/green deployments for ECS without a CodeDeploy dependency. Service Connect is explicitly supported in the new native blue/green controller.

What changed. ECS has two deployment controllers that matter here: ECS (the default) and CODE_DEPLOY (legacy). The ECS controller now offers two strategies: ROLLING and, since July 2025, native BLUE_GREEN. Service Connect works with both strategies on the ECS controller. It still fails with CODE_DEPLOY.

How to check your controller. In Terraform, look for deployment_controller { type = "CODE_DEPLOY" } in your aws_ecs_service resources. In the AWS console: ECS → Clusters → your cluster → Services → select a service → Configuration tab → Deployment type. If it says "Blue/green deployment (powered by AWS CodeDeploy)", you're on the legacy controller.

The migration path. Switch from CODE_DEPLOY to ECS native blue/green before enabling Service Connect. The blue/green deployment guide covers that migration in detail — including the differences in traffic routing, test header behavior, and rollback mechanics.
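In Terraform terms, the target state looks roughly like the sketch below: the ECS controller with a blue/green strategy, next to a Service Connect block. It assumes a recent AWS provider release that exposes `strategy` and `bake_time_in_minutes` under `deployment_configuration`; verify both against the provider documentation for your version. The variables and referenced resources are hypothetical.

```hcl
variable "namespace_arn" {
  type = string # hypothetical: Cloud Map namespace ARN
}

variable "private_subnet_ids" {
  type = list(string) # hypothetical
}

resource "aws_ecs_service" "api" {
  name            = "api"
  cluster         = aws_ecs_cluster.main.id         # assumed to exist elsewhere
  task_definition = aws_ecs_task_definition.api.arn # assumed to exist elsewhere
  desired_count   = 3
  launch_type     = "FARGATE"

  deployment_controller {
    type = "ECS" # the native controller, not "CODE_DEPLOY"
  }

  deployment_configuration {
    strategy             = "BLUE_GREEN" # assumption: needs a recent provider version
    bake_time_in_minutes = 5            # illustrative value
  }

  service_connect_configuration {
    enabled   = true
    namespace = var.namespace_arn

    service {
      port_name = "http"

      client_alias {
        port     = 8080
        dns_name = "api"
      }
    }
  }

  network_configuration {
    subnets = var.private_subnet_ids
  }
}
```

Changing the deployment controller on an existing service may force Terraform to replace the service, so review the plan output before applying.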

Key insight

If you're on CodeDeploy and want Service Connect, you have two options: migrate to the ECS native blue/green controller (recommended), or stay on Cloud Map for service discovery until the migration is done. Enabling Service Connect on a CodeDeploy service fails at deploy time with a hard error — not a warning. Budget the controller migration as a prerequisite.

Gotchas nobody warns you about

Envoy sidecar memory can grow unbounded with gRPC traffic. appProtocol is immutable after service creation. HTTP/1.0 is not supported. Windows containers and standalone tasks don't work with Service Connect.

gRPC memory growth. There's a documented re:Post thread from August 2024 where teams noticed the Service Connect proxy container's memory growing without bound after redeploys with gRPC workloads. The app container's memory released normally — only the sidecar kept climbing until gRPC traffic eventually stalled. AWS's workaround: add a task-level memory limit (not just container-level) and over-provision sidecar memory. Also, pin a recent ecs-service-connect-agent version — CVE-2024-34364 (Envoy mirror-response unbounded buffer allocation) can manifest as a memory leak in older agent releases. Monitor sidecar container memory in CloudWatch if you're serving gRPC.

Immutable appProtocol. You set appProtocol in your port mapping configuration — http, http2, grpc, or tcp. You cannot change it after the service is created. If you start with http and later migrate to gRPC, you must delete and recreate the ECS service. Plan the protocol decision before your first deploy — this is the single most operationally painful gotcha in Service Connect.

HTTP/1.0 not supported. The Envoy proxy drops HTTP/1.0 traffic. Most modern clients speak HTTP/1.1 or higher, but legacy internal tools occasionally use 1.0. Verify your client HTTP version before enabling Service Connect on any service that receives internal API calls from older tooling.

Namespace cleanup is manual. ECS does not delete the Cloud Map namespace when you delete a cluster. The namespace (and its resource registrations) stays behind, and you pay $0.10/resource/month for every orphaned registration. Add Cloud Map namespace cleanup to your cluster teardown runbook.
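One way to keep the namespace from being orphaned is to declare it in the same Terraform configuration as the cluster, so that a destroy covers both. A minimal sketch, with hypothetical names:

```hcl
# Namespace managed alongside the cluster, so teardown removes it too.
resource "aws_service_discovery_http_namespace" "internal" {
  name        = "internal" # hypothetical namespace name
  description = "Service Connect namespace for the main cluster"
}

resource "aws_ecs_cluster" "main" {
  name = "main" # hypothetical cluster name

  # Services in this cluster default to the namespace above.
  service_connect_defaults {
    namespace = aws_service_discovery_http_namespace.internal.arn
  }
}
```

Services that reference `aws_service_discovery_http_namespace.internal.arn` directly give Terraform the dependency ordering it needs to remove services before the namespace.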

What Service Connect doesn't support at all: Windows containers, standalone task invocations (only ECS services — not RunTask), and cross-Region routing. Services in different AWS Regions cannot communicate via Service Connect. Use an ALB or API Gateway with a custom domain for cross-Region traffic.

Mixed enrollment. You can run some services on Service Connect and some on Cloud Map in the same cluster. The constraint is that non-enrolled services cannot resolve Service Connect short names. Migrate callers and callees together in the same deployment window, or keep a Cloud Map registration running in parallel during the cutover.

What you actually get — retries, passive health checks, metrics

Service Connect configures the proxy for 2 retries, passive outlier detection (eject after 5 failures in 30s), a 15s default per-request timeout, and per-call CloudWatch metrics. Retries and outlier detection are fixed; the timeouts are the only settings you can change per service.

Retries. The proxy automatically retries failed requests twice, routing each retry to a different task (not re-sending to the failing host). This is transparent to the calling app — it sends one request, the proxy handles the retry if the first task fails to respond. Retry count is 2; you cannot configure it higher or lower.

Passive outlier detection. The proxy tracks failure rates per task. After 5 consecutive failures in a 30-second window, the task is ejected from the load-balancing pool for 30–300 seconds depending on how many consecutive ejections have occurred. This is "passive" — no active health check probes — and it fires within seconds of detecting a problem.

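Timeout. The default per-request timeout is 15 seconds. It is configurable per service through the Service Connect timeout settings (perRequestTimeoutSeconds and idleTimeoutSeconds in the ECS API), and the per-request timeout applies only to HTTP, HTTP/2, and gRPC ports. If your service has requests that legitimately run longer (batch processing, report generation), the 15-second default will cause failures unless you raise it. Routing those services through Cloud Map or an ALB with a custom timeout remains an alternative.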
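The ECS API accepts per-service timeout settings for Service Connect, and the Terraform AWS provider exposes them as a `timeout` block inside `service`. The sketch below raises the per-request timeout for a hypothetical long-running `reports` service. The values, variable, and referenced resources are illustrative assumptions.

```hcl
variable "namespace_arn" {
  type = string # hypothetical: Cloud Map namespace ARN
}

resource "aws_ecs_service" "reports" {
  name            = "reports"
  cluster         = aws_ecs_cluster.main.id             # assumed to exist elsewhere
  task_definition = aws_ecs_task_definition.reports.arn # assumed to exist elsewhere
  desired_count   = 2

  service_connect_configuration {
    enabled   = true
    namespace = var.namespace_arn

    service {
      port_name = "http"

      client_alias {
        port     = 8080
        dns_name = "reports"
      }

      timeout {
        per_request_timeout_seconds = 120 # default is 15; illustrative value
        idle_timeout_seconds        = 300 # illustrative value
      }
    }
  }
}
```

Network configuration and launch type are omitted for brevity. Callers pick up the timeout from the service being called, so the block goes on `reports`, not on each client.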

CloudWatch metrics. Without any instrumentation, you get per-service-pair: RequestCount, HTTPCode_Target_4XX, HTTPCode_Target_5XX, and TargetResponseTime (p50, p99). With appProtocol = tcp, the proxy is active but emits only byte-level metrics — no per-call telemetry.

Not a full service mesh. Retry count and outlier detection parameters are AWS-managed fixed values; timeouts are the only adjustable setting. Unlike raw Envoy or the now-deprecated App Mesh, you cannot configure per-route retry policies or custom circuit-breaker thresholds. App Mesh (the configurable option) reaches end-of-life on September 30, 2026. If you need per-route circuit breaker configuration, the current path is a self-managed Envoy deployment; there's no AWS-native equivalent for ECS once App Mesh retires.

Should you use it? A decision framework by fleet size

Under 5 microservices or on the CodeDeploy controller: use Cloud Map. 10+ services on rolling or native blue/green: Service Connect. Need external routing or L7 features: ALB, often alongside Service Connect.

Walk through these questions in order:

| If your situation is… | Use this |
|---|---|
| On CODE_DEPLOY deployment controller | Cloud Map (migrate the controller first if you want SC) |
| Non-ECS callers (Lambda, EC2) need to reach the service | Internal ALB (SC can't help here) |
| Fewer than 5 ECS services calling each other | Cloud Map: simpler, cheaper |
| 10+ ECS services, rolling or native blue/green | Service Connect: right default |
| Need mTLS between services | Service Connect + AWS Private CA |
| Heavy gRPC, Fargate | Service Connect with +128 MiB sidecar memory, pinned agent version |
| Tasks need >15s per request | Raise SC's per-request timeout (default 15s), or use Cloud Map or an ALB |
| Windows containers | Cloud Map (SC doesn't support Windows) |

"In cases where you don't need traffic insights and the service is a minor supporting service, Cloud Map Service Discovery can be a simpler solution."

AWS re:Post community, verified June 2026

Service Connect is the right default for new ECS-to-ECS traffic on rolling or native blue/green. But "right default" doesn't mean "apply to everything without checking." The three concrete situations where you should stay on Cloud Map: you're on the CodeDeploy controller and don't have time to migrate it, you have non-ECS callers that can't enroll in the namespace, or your tasks are sized at 0.25 vCPU and the doubled CPU cost isn't justified by the built-in retries.

For teams already using Cloud Map — the bar for migration is not "Service Connect is available." The bar is "we have a concrete reason to switch": you want per-call CloudWatch metrics without adding instrumentation, you need built-in retries without client-side retry logic, or you're enabling mTLS and want the native Private CA integration. If none of those apply, an existing Cloud Map setup that works is not worth disrupting.

If you read this, you might also want to know

Can I mix Service Connect and Cloud Map service discovery in the same cluster?

Yes — services in the same cluster can use different discovery mechanisms. The constraint is that a service not enrolled in the Service Connect namespace cannot resolve SC short names. Services on Cloud Map use Route 53 DNS and are reachable by any VPC resource regardless of namespace enrollment. Migrate callers and callees together to avoid resolution failures during cutover.

How do I migrate existing services from Cloud Map to Service Connect?

Add the service_connect_configuration block to your ECS service Terraform resource and redeploy. ECS creates a new namespace or enrolls in an existing one. Remove the service_registries block after the service is healthy on Service Connect. One caveat: Cloud Map and Service Connect cannot be active on the same ECS service simultaneously — the cutover is per-service, so migrate callee first, then callers in the same deploy window.

Does Service Connect work with ECS tasks on EC2 launch type?

Yes. Service Connect works with both Fargate and EC2 launch types. The Envoy sidecar overhead (0.25 vCPU, 64 MiB) applies either way, but on EC2 you're paying for the instance regardless — the marginal sidecar cost is lower than on Fargate where every CPU unit and MiB is billed directly.

What happens to Service Connect if the Cloud Map namespace is deleted?

Service Connect stops working. The Envoy proxy cannot resolve service names without the Cloud Map namespace. Services lose the ability to route to each other via short names. ECS does not auto-recreate the namespace. Treat the Cloud Map namespace as a dependency of your ECS cluster — include it in disaster recovery runbooks and don't delete it without disabling Service Connect first.
