AWS ECR: How Container Registry Works for ECS Fargate Teams
Every ECS Fargate deploy pulls an image from ECR — and ECR is the part nobody owns until it breaks. A task in a private subnet throws ResourceInitializationError, or five years of untagged images quietly push the bill to $400/month. This is ECR from the ECS operator's seat: how pulls actually work, the IAM the execution role needs, what it costs at fleet scale, and the lifecycle, scanning, and replication settings that matter at 10+ environments — with the AWS-verified pricing nobody else itemizes.
- ECR is AWS's managed container registry, the default image store for ECS and EKS. Registry → repository → image, with IAM-based access and a short-lived auth token per pull.
- The #1 ECR failure on Fargate is a private-subnet task that can't pull: it needs either a NAT gateway or three ECR VPC endpoints, plus AmazonECSTaskExecutionRolePolicy on the execution role.
- ECR storage is $0.10/GB-month; same-region pulls to Fargate are free. The hidden bill is old images: one team went from $400/mo to ~$15/mo with a 30-day lifecycle policy.
- At fleet scale three settings matter: lifecycle policies (cost), scan-on-push (security), and cross-account replication (multi-account image distribution).
- For ECR-heavy fleets in private subnets, VPC interface endpoints are often cheaper than routing every pull through a NAT gateway.
Push an image, then a lifecycle policy that keeps the bill flat, then the exact networking + IAM a private-subnet Fargate task needs to pull:
# 1. Authenticate Docker to your private ECR registry, then push
aws ecr get-login-password --region us-east-1 \
| docker login --username AWS --password-stdin \
123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag my-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
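docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
// 2. Lifecycle policy: kill the hidden storage bill.
// Rule 1: drop untagged images after 1 day. Rules 2-4: drop dev, staging, and PR images after 30 days.
// One rule per prefix: when a tagPrefixList holds several prefixes, an image must match all of them.
// JSON has no comment syntax, so strip these comment lines before applying the policy.
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images after 1 day",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 1
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Expire dev images after 30 days",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["dev"],
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 3,
      "description": "Expire staging images after 30 days",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["staging"],
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 4,
      "description": "Expire PR images after 30 days",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["pr-"],
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}
# 3. The exact set a PRIVATE-subnet Fargate task needs to pull from ECR.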
# Two interface endpoints (ecr.api, ecr.dkr) + an S3 gateway endpoint
# (ECR stores image layers in S3). No NAT gateway required.
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.endpoints.id] # allow TCP 443 from tasks
  private_dns_enabled = true
}
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true
}
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}
# The task EXECUTION role needs the ECR pull actions (AmazonECSTaskExecutionRolePolicy covers these):
# ecr:GetAuthorizationToken, ecr:BatchGetImage, ecr:GetDownloadUrlForLayer
What ECR actually is (for an ECS team)
ECR is AWS's managed container registry — the default image store for ECS and EKS. ECS pulls images at task launch using the execution role's IAM and a short-lived auth token.
AWS calls ECR "an extension of both services" — meaning if you run ECS Fargate, you already run ECR whether you think about it or not. Every task definition references an image, and that image almost always lives in a private ECR repository in your account. ECR is the boring dependency in the deploy path: invisible when it works, a production incident when it doesn't.
The generic "what is a container registry" explanation — it's a managed Docker registry, you push and pull over HTTPS, IAM controls access — is true and covered everywhere. The rest of this guide is the part that isn't: how ECR behaves from the seat of someone operating an ECS Fargate fleet, where the failures and the costs actually live.
Registry vs repository vs image — the model
One registry per account per region holds many repositories; each repository holds the tagged versions of one image, addressed as account.dkr.ecr.region.amazonaws.com/repo:tag.
The three nouns trip people up because AWS reuses "registry." Your registry is the per-account, per-region namespace. A repository holds one logical image — my-app — with all its tags and versions. An image is one immutable build, addressed by tag (:latest, :v2.3.1) or by digest.
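To make the registry / repository / image split concrete, here is a minimal sketch that takes a private ECR image reference apart. The `EcrImageRef` type and `parse_ecr_image` function are hypothetical names for illustration, and the pattern assumes the standard `amazonaws.com` domain (it does not handle other partitions).

```python
import re
from typing import NamedTuple, Optional


class EcrImageRef(NamedTuple):
    account_id: str            # identifies the registry (together with region)
    region: str
    repository: str            # one logical image, e.g. "my-app"
    tag: Optional[str]         # mutable pointer such as "latest" or "v2.3.1"
    digest: Optional[str]      # immutable content address, "sha256:..."


# Assumption: standard commercial-partition hostnames only.
_ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[a-z0-9._/-]+?)"
    r"(?:@(?P<digest>sha256:[0-9a-f]{64})|:(?P<tag>[A-Za-z0-9_.-]+))?$"
)


def parse_ecr_image(uri: str) -> EcrImageRef:
    """Split account.dkr.ecr.region.amazonaws.com/repo[:tag|@digest] into parts."""
    match = _ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not a private ECR image reference: {uri!r}")
    return EcrImageRef(
        match["account"], match["region"], match["repo"], match["tag"], match["digest"]
    )


ref = parse_ecr_image("123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v2.3.1")
print(ref.account_id, ref.region, ref.repository, ref.tag)
```

The account ID and region together identify the registry; everything after the first slash is the repository, and the tag or digest picks one image inside it.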
Repositories are private by default. Public repos exist (ECR Public, for distributing images to anyone), but an ECS fleet pulls from private repos in its own account — which is why every pull needs both a network path and IAM, the two things the next sections fix.
How an ECS Fargate task pulls an image
At launch, the Fargate agent uses the task EXECUTION role to pull the image before your container starts. If that pull can't reach ECR or lacks IAM, the task dies with ResourceInitializationError.
The single most useful distinction in ECS: the execution role and the task role are different identities. The execution role is what the Fargate agent uses to set the task up — pull the image from ECR, fetch secrets, write logs — all before your code runs. The task role is what your application uses at runtime to reach S3, DynamoDB, and other services. ECR pulls are an execution-role concern; putting ECR permissions on the task role is a common dead end.
The pull is the very first thing that happens. That's why an ECR problem shows up as a task that never starts, not an app error, and why it's worth knowing the execution role's exact permissions; the role is set in the task definition (executionRoleArn) alongside its other fields.
Why private-subnet tasks fail to pull (the #1 ECR error)
A private-subnet Fargate task has no path to ECR by default. It needs a NAT gateway or three VPC endpoints (ecr.api, ecr.dkr, S3 gateway) — missing them is the #1 ResourceInitializationError cause.
This is the failure that fills AWS re:Post: a task in a public subnet pulls fine, you move it to a private subnet for security, and deploys start dying with ResourceInitializationError: unable to pull image. The image didn't change; the network path disappeared. ECR lives on the public AWS network, and a private subnet has no route to it without help.
Two ways to give it a path. A NAT gateway routes the task's outbound traffic to the internet, where it reaches ECR: simple, one resource. Or three VPC endpoints: an interface endpoint for ecr.api (the API), an interface endpoint for ecr.dkr (the Docker registry), and a gateway endpoint for S3, because ECR stores the actual image layers in S3 and the pull fails with the same ResourceInitializationError if the task can't reach S3 too. The endpoint security group must allow TCP 443 from the task's security group.
The forgotten third endpoint is S3. Teams add the two ECR interface endpoints, see pulls still fail, and assume ECR is broken. ECR hands the agent a pre-signed S3 URL for the layers — no S3 endpoint, no layers, ResourceInitializationError. All three or none.
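The "all three or none" rule is easy to check mechanically. The sketch below is a hypothetical helper (the name `missing_pull_endpoints` is made up for illustration) that takes the service names of the endpoints already attached to a VPC, for example the `ServiceName` values returned by `aws ec2 describe-vpc-endpoints`, and reports which of the three are absent. It does not check route tables or security groups.

```python
def missing_pull_endpoints(region, endpoint_service_names):
    """Return the ECR-pull endpoints a VPC is still missing, sorted by name."""
    required = {
        f"com.amazonaws.{region}.{suffix}"
        for suffix in ("ecr.api", "ecr.dkr", "s3")
    }
    return sorted(required - set(endpoint_service_names))


# The classic mistake: both ECR interface endpoints, no S3 gateway endpoint.
present = [
    "com.amazonaws.us-east-1.ecr.api",
    "com.amazonaws.us-east-1.ecr.dkr",
]
print(missing_pull_endpoints("us-east-1", present))
```

An empty list means the endpoint set is complete; it says nothing about whether the endpoint security group allows TCP 443 from the tasks.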
The execution-role IAM ECR needs
The task execution role needs AmazonECSTaskExecutionRolePolicy — or the equivalent three ecr: pull actions. Without it the pull is denied even with correct networking.
Three actions do the whole pull. ecr:GetAuthorizationToken gets the short-lived token Docker uses to log in. ecr:BatchGetImage fetches the image manifest. ecr:GetDownloadUrlForLayer gets the pre-signed S3 URLs for each layer. AWS's managed AmazonECSTaskExecutionRolePolicy bundles all three, along with ecr:BatchCheckLayerAvailability and the CloudWatch Logs permissions a task needs. Attach it to the execution role and the IAM half is done.
The diagnostic rule of thumb: if the pull fails the same way from every subnet, it's IAM; if it fails only from private subnets, it's networking. The two failures look identical in the task event log, so check both; most wasted hours come from fixing the wrong half.
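To rule out the IAM half quickly, a custom execution-role policy can be checked against the three pull actions named above. This is a rough sketch, not an IAM evaluator: the function name is hypothetical, and it only looks at `Allow` statements and action wildcards, ignoring `Deny`, `Resource`, and `Condition`.

```python
from fnmatch import fnmatchcase

REQUIRED_PULL_ACTIONS = (
    "ecr:GetAuthorizationToken",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer",
)


def _as_list(value):
    """IAM allows a single item or a list for Statement and Action."""
    if value is None:
        return []
    return [value] if isinstance(value, (str, dict)) else list(value)


def missing_pull_actions(policy_document):
    """Return the required pull actions that no Allow statement covers."""
    allowed = []
    for statement in _as_list(policy_document.get("Statement")):
        if statement.get("Effect") == "Allow":
            allowed.extend(_as_list(statement.get("Action")))
    # IAM action names are case-insensitive and may use * and ? wildcards.
    return [
        action
        for action in REQUIRED_PULL_ACTIONS
        if not any(fnmatchcase(action.lower(), pattern.lower()) for pattern in allowed)
    ]


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ecr:GetAuthorizationToken", "ecr:Batch*"],
            "Resource": "*",
        }
    ],
}
print(missing_pull_actions(policy))
```

An empty result only shows the actions are allowed somewhere in the document; an explicit deny or a narrow `Resource` can still block the pull.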
NAT gateway vs VPC endpoints — the cost decision
A NAT gateway is simplest but bills $0.045/hr plus $0.045/GB. For a fleet pulling constantly across many private subnets, the three VPC endpoints (two ECR interface endpoints plus the S3 gateway endpoint) are often cheaper and keep pulls private.
The naive choice is a NAT gateway — it's one resource and it fixes the pull. But a NAT gateway is a per-environment fixed cost (roughly $32/month each before data), and image pulls are data-heavy: every task launch drags layers through it at $0.045/GB. A fleet that scales up and down all day, pulling on every launch, can run a surprising NAT data bill that's really just ECR traffic.
VPC interface endpoints have their own hourly cost, but pull traffic over them stays on the AWS network and avoids the NAT per-GB charge. For an ECR-heavy fleet — many environments, frequent launches — the endpoints usually win, and they remove image pulls as a reason your tasks ever touch the internet. This is the same fixed-vs-usage tradeoff that runs through the real per-environment cost of Fargate including NAT.
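The fixed-vs-usage tradeoff can be put into numbers with a back-of-the-envelope model. The NAT rates below are the ones quoted above. The interface-endpoint rates ($0.01 per endpoint per AZ per hour and $0.01/GB) are an assumption not taken from this article, so check the current PrivateLink pricing for your region. The model is also deliberately pessimistic for endpoints: it bills every pulled GB at the interface-endpoint rate, even though layer bytes travel through the S3 gateway endpoint, which has no charge.

```python
HOURS_PER_MONTH = 730

NAT_HOURLY = 0.045        # from the article
NAT_PER_GB = 0.045        # from the article
ENDPOINT_HOURLY = 0.01    # ASSUMPTION: per interface endpoint, per AZ
ENDPOINT_PER_GB = 0.01    # ASSUMPTION: interface endpoint data processing


def nat_monthly_cost(pull_gb, gateways=1):
    """Monthly cost of routing pull traffic through NAT gateway(s)."""
    return gateways * NAT_HOURLY * HOURS_PER_MONTH + NAT_PER_GB * pull_gb


def endpoints_monthly_cost(pull_gb, azs=2, interface_endpoints=2):
    """Monthly cost of ecr.api + ecr.dkr interface endpoints across AZs."""
    fixed = interface_endpoints * azs * ENDPOINT_HOURLY * HOURS_PER_MONTH
    return fixed + ENDPOINT_PER_GB * pull_gb


def break_even_gb(azs=2, gateways=1):
    """Monthly pull volume above which endpoints are cheaper than NAT."""
    fixed_gap = endpoints_monthly_cost(0, azs=azs) - nat_monthly_cost(0, gateways=gateways)
    return max(0.0, fixed_gap / (NAT_PER_GB - ENDPOINT_PER_GB))


for gb in (50, 500, 5000):
    print(gb, round(nat_monthly_cost(gb), 2), round(endpoints_monthly_cost(gb), 2))
```

Under these assumptions, endpoints across two AZs already cost less than a single NAT gateway before any data moves, and across three AZs the break-even sits at roughly 313 GB of pulls per month. The comparison only holds if the NAT gateway can actually be removed; if tasks still need it for other traffic, the endpoints are an added fixed cost that pays off only through the lower per-GB rate.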
What ECR actually costs
ECR storage is $0.10/GB-month; same-region pulls to Fargate are free; cross-region transfer is $0.09/GB. New accounts get 500 MB free for a year. The real bill is accumulated old images.
Verified against the AWS ECR pricing page (July 2026). Note what's NOT here: pulling to your Fargate tasks in the same region costs nothing. So the bill isn't your deploys; it's storage that only ever grows. Which is the next section.
The hidden ECR bill — old images, and the fix
Untagged and stale images pile up invisibly. One team paid $400/month for five years of old images; a lifecycle policy (untagged after 1 day, non-prod after 30 days) dropped it to ~$15/month.
Every CI run pushes a new image. Every push of :latest orphans the previous one as an untagged image. None of it is ever deleted unless you say so, and nobody opens the ECR console to look. So storage compounds quietly: a few GB a month becomes hundreds of GB over a few years, and the only signal is a bill line that's slowly climbed.
Lifecycle policies fix it for free. You write rules (by tag, age, or count) and ECR deletes the rest automatically. The pair in the ready-to-use block above covers most teams: expire untagged images after 1 day (the orphaned :latest layers nobody references), and expire dev/staging/PR images after 30 days (so a paused project stops billing). That single pair is the whole $400-to-$15 swing in the documented case above. Test rules in the console's dry-run preview before applying: a too-aggressive prod rule that deletes an image a service still references is its own incident.
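Before touching the console preview, it can help to reason about the two expiry behaviours offline. The sketch below is a simplified local model, not ECR's evaluation engine: the `Image` type and `images_to_expire` function are hypothetical, it treats an image as non-prod if any of its tags starts with `dev`, `staging`, or `pr-`, and it ignores rule priorities and count-based rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

NON_PROD_PREFIXES = ("dev", "staging", "pr-")


@dataclass(frozen=True)
class Image:
    digest: str
    tags: tuple            # empty tuple means untagged
    pushed_at: datetime


def images_to_expire(images, now):
    """Return digests the two rules would expire: untagged >1 day, non-prod >30 days."""
    expired = []
    for image in images:
        age = now - image.pushed_at
        if not image.tags:
            if age > timedelta(days=1):
                expired.append(image.digest)
        elif any(tag.startswith(NON_PROD_PREFIXES) for tag in image.tags):
            if age > timedelta(days=30):
                expired.append(image.digest)
    return expired


now = datetime(2026, 7, 1, tzinfo=timezone.utc)
inventory = [
    Image("sha256:aaa", (), now - timedelta(days=2)),              # old untagged
    Image("sha256:bbb", (), now - timedelta(hours=12)),            # fresh untagged
    Image("sha256:ccc", ("pr-42",), now - timedelta(days=45)),     # stale PR build
    Image("sha256:ddd", ("staging-3",), now - timedelta(days=10)),
    Image("sha256:eee", ("v2.3.1",), now - timedelta(days=400)),   # release, kept
]
print(images_to_expire(inventory, now))
```

The release image survives no matter how old it is because neither rule selects it. A model like this is useful for sanity-checking intent, and the console preview remains the authority on what ECR will actually delete.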
Image scanning — catching vulnerabilities on push
ECR scan-on-push checks each new image for known CVEs as a per-repository setting. It's the cheapest first line of container vulnerability detection — one toggle per repo, free on the basic tier.
Turn on scan-on-push and every image gets checked against a CVE database the moment it lands, with results you can read in the console or via the API. Basic scanning is free and historically used the open-source Clair CVE database (AWS now describes it as native technology over the same CVE data); enhanced scanning (powered by Amazon Inspector) goes deeper into OS and language packages and bills per image. For a team heading into a SOC 2 audit, scan-on-push is the cheapest box to tick — it turns "do you scan container images" from a project into a per-repo toggle.
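Scan results become more useful once something acts on them. A minimal sketch of a deploy gate follows, assuming a response shaped like the output of `aws ecr describe-image-scan-findings`, where per-severity totals sit under `imageScanFindings.findingSeverityCounts`. The function name and the choice of blocking severities are illustrative, not a recommendation.

```python
def blocking_findings(scan_response, blocked=("CRITICAL", "HIGH")):
    """Return {severity: count} for severities that should stop a deploy."""
    counts = scan_response.get("imageScanFindings", {}).get("findingSeverityCounts", {})
    return {sev: counts[sev] for sev in blocked if counts.get(sev, 0) > 0}


# Hypothetical response fragment, trimmed to the fields used here.
response = {
    "imageScanStatus": {"status": "COMPLETE"},
    "imageScanFindings": {"findingSeverityCounts": {"HIGH": 2, "MEDIUM": 7}},
}
blockers = blocking_findings(response)
if blockers:
    print("deploy blocked:", blockers)
```

A real gate would also need to wait for the scan status to reach a finished state before reading the counts.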
Distributing images across accounts
Multi-account ECS fleets need the same image in every account. Two clean paths: ECR cross-account replication (push once, pull locally) or a shared repository policy scoped to named accounts.
Once prod, staging, and dev live in separate AWS accounts, one image built in a CI account has to reach all of them. Replication copies the image into a local repo in each destination account, so every pull is in-account and in-region (and free). A shared repository keeps one copy and grants pull access via a repository policy naming the specific accounts. The shortcut to avoid is granting access to your whole Organization with aws:PrincipalOrgID — it works, but it opens the repo to every account in the Org, which is a finding an auditor circles in red. The full cross-account image-distribution mechanics live in the multi-account operating model.
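For the shared-repository path, the policy that names specific accounts can be generated rather than hand-edited. This is a sketch under assumptions: the helper name is hypothetical, the account IDs are placeholders, and it grants pull to each account's root principal, which then relies on IAM inside that account to decide which roles may pull. `ecr:GetAuthorizationToken` is left out on purpose, since it is granted on the caller's own identity rather than on a repository.

```python
import json
import re

# Repository-level actions a cross-account puller needs.
PULL_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer",
]


def build_pull_policy(account_ids):
    """Build a repository policy granting pull to the named accounts only."""
    for account_id in account_ids:
        if not re.fullmatch(r"\d{12}", account_id):
            raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPullFromNamedAccounts",
                "Effect": "Allow",
                "Principal": {
                    "AWS": [f"arn:aws:iam::{a}:root" for a in account_ids]
                },
                "Action": PULL_ACTIONS,
            }
        ],
    }


# Placeholder account IDs for staging and prod.
policy = build_pull_policy(["111122223333", "444455556666"])
print(json.dumps(policy, indent=2))
```

The printed JSON could then be saved to a file and attached with `aws ecr set-repository-policy --repository-name my-app --policy-text file://policy.json`. Listing accounts explicitly keeps the grant auditable, at the cost of one policy update whenever an account is added.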
Lifecycle, immutability, pull-through cache — fleet settings
Three registry settings matter at scale: lifecycle policies (cost), tag immutability (no silent :latest overwrites), and pull-through cache (mirror public images to dodge rate limits).
Tag immutability stops a second push of :v1.2.0 from silently replacing the first, so a tag always points at the exact bytes you deployed, which matters for rollbacks and audits. Pull-through cache mirrors an upstream public registry (Docker Hub, the ECR Public gallery) into your private registry on first pull, then keeps it fresh; it dodges Docker Hub rate limits that randomly fail deploys and keeps base-image pulls in-account. And lifecycle policies, from the cost section, are the third: the one setting whose absence shows up on the bill.
If you read this, you might also want to know
Do I need a NAT gateway if I use ECR VPC endpoints?
Not for ECR pulls: the ecr.api + ecr.dkr + S3 endpoints give a private-subnet task everything it needs to pull. You still need a NAT gateway or further endpoints for anything else the task touches: the awslogs driver needs a CloudWatch Logs endpoint, and the application may reach other internet services at runtime. Many teams drop NAT to endpoints-only and cut both cost and internet exposure.
What's the difference between the task role and the execution role for ECR?
The execution role is used by the Fargate agent to set the task up — pull the image from ECR, fetch secrets, write logs — before your container runs. The task role is used by your application code at runtime. ECR pulls are always an execution-role permission; putting ecr:* on the task role does nothing for the pull.
Does scan-on-push cost extra?
Basic scanning (Clair-based, OS packages) is free. Enhanced scanning via Amazon Inspector — which adds OS and programming-language package CVEs and continuous re-scanning — bills per image scanned. Most teams start with free basic scanning and upgrade specific repos to enhanced when compliance requires it.
Can two AWS accounts share one ECR repository?
Yes — attach a repository policy that grants the ECR pull actions to the specific account IDs that need it, and they pull cross-account. The cleaner pattern at scale is cross-account replication (each account pulls from its own local copy, free and in-region). Avoid granting access to the whole Organization via aws:PrincipalOrgID.