How Should You Set Up ECS Logging? (awslogs, FireLens, or Neither)
- ECS gives you three logging options: awslogs (CloudWatch only), FireLens (any destination, sidecar required), or none.
- Default CloudWatch log retention is Never Expire, so storage at $0.03/GB/month accumulates silently. Set it to 30 days on every log group.
- awslogs is synchronous by default. If CloudWatch is slow, your container blocks. Add mode: non-blocking to every task definition.
- Name log groups /ecs/{environment}/{service}, not /ecs/{service}. Flat names are painful to migrate away from once you have 30 environments.
- Container Insights costs $0.07/metric/month. It is worth it in prod; skip it for dev environments.
# Find all log groups with no retention policy and set 30 days
aws logs describe-log-groups \
  --query 'logGroups[?retentionInDays==`null`].[logGroupName]' \
  --output text | \
  xargs -I {} aws logs put-retention-policy \
    --log-group-name {} \
    --retention-in-days 30
Run once, then enforce via Terraform going forward (see aws_cloudwatch_log_group below).
The three ECS logging options (and when each breaks)
ECS supports awslogs (CloudWatch only, 3 params), FireLens (any destination, sidecar required), or none. Most teams start with awslogs and hit its limits somewhere between 5 and 10 services.
The three options are not equally suited to every team. The right driver depends on where you want logs to go, whether you can tolerate blocking on delivery, and whether you run Windows containers. Here's the full comparison:
| Driver | Setup | Destination | Backpressure risk | Windows | Extra cost |
|---|---|---|---|---|---|
| awslogs | 3–4 params | CloudWatch only | Yes (blocking) | Yes | $0 extra |
| FireLens | Sidecar + config | CloudWatch + any | No (buffered) | No | ~$0.005/task/mo |
| None | Zero | Nowhere | No | Yes | $0 (blind) |
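The key decisions are: (1) do you need to send logs to anything other than CloudWatch? (2) do you need guaranteed delivery, so that dropping logs under pressure is unacceptable? (3) do you run Windows containers? If you answered no to (1) and (2), awslogs with non-blocking mode is the right setup. If either answer is yes, FireLens is the path, unless you run Windows containers, which FireLens does not support; in that case awslogs is the only driver that delivers logs.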
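The decision can be written down as a few lines of code. The following is a minimal sketch, not an official selection rule: the function name `choose_log_driver` and its three boolean parameters are hypothetical, and the logic simply encodes the comparison table above (FireLens for extra destinations or guaranteed delivery, awslogs otherwise, and no FireLens on Windows).

```python
def choose_log_driver(needs_non_cloudwatch_destination: bool,
                      needs_guaranteed_delivery: bool,
                      runs_windows_containers: bool) -> str:
    """Suggest an ECS log driver from three yes/no requirements."""
    wants_firelens = needs_non_cloudwatch_destination or needs_guaranteed_delivery
    if wants_firelens and runs_windows_containers:
        # The comparison table lists FireLens as unavailable for Windows containers,
        # so awslogs is the only driver left that actually delivers logs.
        return "awslogs (non-blocking); FireLens is not available on Windows"
    if wants_firelens:
        return "FireLens"
    return "awslogs (non-blocking)"


# CloudWatch-only, Linux, dropped logs tolerable under pressure
print(choose_log_driver(False, False, False))
# Needs a second destination such as S3 or Datadog
print(choose_log_driver(True, False, False))
```

Keeping the rule in one place like this makes it easy to review when a requirement changes, for example when a compliance team later rules out dropped logs.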
"None" is not zero-cost. A service with no logging driver is invisible during incidents. At 10+ environments, the time you spend reconstructing what happened from ALB access logs and CloudTrail is more expensive than the CloudWatch bill would have been.
Setting up awslogs: the 4 parameters you actually need
awslogs requires awslogs-group, awslogs-region, and awslogs-stream-prefix. Add awslogs-create-group: true, or the task fails to start if the log group doesn't exist. The task execution role needs logs:CreateLogStream and logs:PutLogEvents, plus logs:CreateLogGroup if you rely on awslogs-create-group.
Missing IAM permissions cause silent log loss: the task starts, ECS shows it as healthy, but nothing appears in CloudWatch. There is no error in the task events, so you find out during an incident. Below is the correct task definition snippet and the Terraform to create the log group with an enforced retention policy.
# Terraform: ECS task definition with correct awslogs config
resource "aws_cloudwatch_log_group" "service" {
  name              = "/ecs/${var.environment}/${var.service_name}"
  retention_in_days = 30
  tags = {
    Environment = var.environment
    Service     = var.service_name
  }
}
resource "aws_ecs_task_definition" "service" {
  # ... other fields ...
  container_definitions = jsonencode([
    {
      name  = var.service_name
      image = var.container_image
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.service.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "ecs"
          # Non-blocking mode: prevents container blocking on CloudWatch hiccup
          "mode"            = "non-blocking"
          "max-buffer-size" = "25m"
        }
      }
    }
  ])
}
Note the log group naming: /ecs/{environment}/{service}, not /ecs/{service}. This matters at scale, covered in the naming section below.
"logs:CreateLogStream and logs:PutLogEvents permission on the IAM role that you launch your container instances with."
— AWS ECS documentation, verified June 2026
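To make the permission requirement concrete, here is a hedged sketch that builds an IAM policy document for the task execution role. The helper name `awslogs_execution_policy` is hypothetical, the account ID `123456789012` is a placeholder, and scoping the policy to a single log group is an assumption about how tightly you want to restrict it.

```python
import json


def awslogs_execution_policy(region: str, account_id: str, log_group: str,
                             allow_create_group: bool = False) -> dict:
    """Build an IAM policy document that lets the awslogs driver write to one log group."""
    actions = ["logs:CreateLogStream", "logs:PutLogEvents"]
    if allow_create_group:
        # Only needed when the task definition sets awslogs-create-group to true.
        actions.append("logs:CreateLogGroup")
    group_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                # The ":*" form covers the log streams inside the group.
                "Resource": [group_arn, f"{group_arn}:*"],
            }
        ],
    }


policy = awslogs_execution_policy("us-east-1", "123456789012", "/ecs/prod/api")
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the execution role by whatever tooling you already use; generating it from the same environment and service names as the log group keeps the two from drifting apart.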
The "Never Expire" retention tax
Every log group created by ECS defaults to "Never Expire" retention. At $0.03/GB/month, a 10-service fleet accumulates $500+/month in pure storage costs by month 12 if you never set a policy.
This is the most common silent cost in ECS fleets. New log groups — created by awslogs-create-group: true, the console, or CloudFormation — all default to Never Expire. AWS sets this default because it's the safe option for them: you can never lose data you didn't intend to delete. For your bill, it means every GB ever ingested stays charged until you explicitly delete the group.
The math compounds fast. A fleet of 10 services producing 5 GB/day per service ingests 1,500 GB/month. After 12 months with no retention policy, you're storing ~18,000 GB. At $0.03/GB/month, that's $540/month just for storage — on top of the $750/month ingestion cost.
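The arithmetic above can be reproduced with a short sketch. It uses only the figures from this article (10 services, 5 GB/day each, $0.50/GB ingestion, $0.03/GB/month storage); the 30-day month and the assumption that stored volume equals ingested volume are simplifications, and the function names are illustrative.

```python
from typing import Optional

SERVICES = 10
GB_PER_SERVICE_PER_DAY = 5
DAYS_PER_MONTH = 30                  # simplification: every month has 30 days
INGEST_PRICE_PER_GB = 0.50           # from this article
STORAGE_PRICE_PER_GB_MONTH = 0.03    # from this article


def monthly_ingest_gb() -> int:
    """GB ingested by the whole fleet in one month."""
    return SERVICES * GB_PER_SERVICE_PER_DAY * DAYS_PER_MONTH


def stored_gb(month: int, retention_months: Optional[int] = None) -> int:
    """GB held at the end of `month` (1-based). None models Never Expire."""
    kept = month if retention_months is None else min(month, retention_months)
    return monthly_ingest_gb() * kept


def storage_cost(month: int, retention_months: Optional[int] = None) -> float:
    """Monthly storage charge in dollars for the data held at the end of `month`."""
    return round(stored_gb(month, retention_months) * STORAGE_PRICE_PER_GB_MONTH, 2)


ingestion = monthly_ingest_gb() * INGEST_PRICE_PER_GB
print(f"ingestion per month:           ${ingestion:.2f}")
print(f"storage, month 12, no policy:  ${storage_cost(12):.2f}")
print(f"storage, month 12, 30-day cap: ${storage_cost(12, retention_months=1):.2f}")
```

Under these assumptions the script gives $750 for ingestion, $540 for storage at month 12 with no policy, and $45 for storage once a 30-day policy caps the stored volume at one month of data.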
We covered the full CloudWatch cost breakdown in more detail in how to control CloudWatch costs on ECS — including log-level filtering and Logs Insights query optimization. For most teams, retention alone cuts the bill by 30–40%.
30 days covers 95% of incident investigations. 90 days covers compliance requirements for most regulated environments. "Never Expire" covers your AWS bill growing in perpetuity.
The awslogs backpressure problem
awslogs is synchronous by default — if CloudWatch is slow or throttling, your container blocks waiting for the log write. Fix: set mode to non-blocking and max-buffer-size to 25m in logConfiguration.
This is the production gotcha nobody documents until they've seen it. Under normal conditions, CloudWatch responds fast enough that you never notice the synchronous behavior. But when CloudWatch is throttling your account, when a region has degraded performance, or when you're logging at high throughput — your application containers pause waiting for log writes to complete. Requests time out. Health checks fail. ECS replaces the task.
AWS confirmed this in their container logging backpressure blog post: "an application can become blocked using the default awslogs driver." The fix is two lines in your task definition — add mode: non-blocking and max-buffer-size: 25m:
{
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/prod/api",
      "awslogs-region": "us-east-1",
      "awslogs-stream-prefix": "ecs",
      "mode": "non-blocking",
      "max-buffer-size": "25m"
    }
  }
}
The trade-off with non-blocking mode: if the buffer fills (25 MB as configured here), newer logs are dropped rather than blocking the container. This is the correct trade-off for production: a container that drops some logs under extreme pressure is better than one that stops serving requests. If you need guaranteed delivery, FireLens with filesystem buffering is the right answer.
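The drop-when-full behaviour is easier to reason about with a toy model. The class below is an illustrative simulation only, not the log driver's actual implementation: `NonBlockingLogBuffer` and its methods are hypothetical names, and the 100-byte limit stands in for the real max-buffer-size.

```python
from collections import deque


class NonBlockingLogBuffer:
    """Toy bounded buffer: writes never block, and lines that do not fit are dropped."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.queue = deque()
        self.dropped = 0

    def write(self, line: bytes) -> bool:
        """Accept the line if it fits; otherwise count it as dropped and return False."""
        if self.used_bytes + len(line) > self.max_bytes:
            self.dropped += 1
            return False
        self.queue.append(line)
        self.used_bytes += len(line)
        return True

    def drain(self, max_lines: int) -> int:
        """Simulate the backend accepting up to `max_lines` buffered lines."""
        sent = 0
        while self.queue and sent < max_lines:
            self.used_bytes -= len(self.queue.popleft())
            sent += 1
        return sent


# The backend is stalled: the application keeps writing 10-byte lines.
buffer = NonBlockingLogBuffer(max_bytes=100)
accepted = sum(buffer.write(b"x" * 10) for _ in range(15))
print(f"accepted={accepted} dropped={buffer.dropped}")

# Once the backend recovers and drains some lines, new writes succeed again.
buffer.drain(max_lines=4)
print(buffer.write(b"y" * 10))
```

The point of the model is that `write` always returns immediately. A blocking design would instead wait inside `write` until space is free, which is the stall the application would feel.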
When to switch to FireLens
Switch to FireLens when you need multi-destination routing, log filtering at source, or guaranteed non-blocking delivery. Don't switch if CloudWatch is your only destination — awslogs is simpler and cheaper.
FireLens runs a Fluent Bit sidecar container alongside your application. Your application's stdout goes to the Fluent Bit container (via the awsfirelens log driver), and Fluent Bit routes it to one or more destinations using its configuration file. The sidecar approach adds ~10 MB of memory and a small amount of CPU, but it gives you the option of filesystem buffering: when configured for it, Fluent Bit buffers logs to disk before delivery, so CloudWatch issues never block your application.
Three specific cases where FireLens is the right call:
- Multi-destination routing: you want CloudWatch for compliance + S3 for long-term storage + Datadog for real-time analysis. awslogs sends to CloudWatch only. FireLens sends to all three simultaneously.
- Filter before ingestion: you have DEBUG logs that are useful locally but cost money in CloudWatch. Fluent Bit can drop DEBUG-level records before they're ingested, cutting your CloudWatch bill without changing application code.
- Guaranteed delivery: you're in a regulated environment where dropped logs are a compliance issue. Fluent Bit's filesystem buffer holds logs through CloudWatch throttling, and delivery is retried automatically.
Two important FireLens constraints: it does not support Windows containers on ECS, and it listens on port 24224 — block inbound traffic on that port in your task's security group or anyone on the same VPC can push logs to your sidecar.
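For orientation, here is a rough sketch of what the two container definitions look like in a FireLens task, built as plain Python data and printed as JSON. The helper name, the container name `log_router`, the `stable` image tag, and the `ecs/` stream prefix are assumptions for illustration. The sketch routes to CloudWatch only; multi-destination routing needs a custom Fluent Bit configuration that is not shown here.

```python
import json


def firelens_container_definitions(service: str, image: str,
                                   region: str, log_group: str) -> list:
    """Return a log-router sidecar plus an app container that logs through it."""
    log_router = {
        "name": "log_router",  # assumed name; any container name works
        "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
        "essential": True,
        # Marks this container as the FireLens log router using Fluent Bit.
        "firelensConfiguration": {"type": "fluentbit"},
    }
    app = {
        "name": service,
        "image": image,
        "essential": True,
        "logConfiguration": {
            "logDriver": "awsfirelens",
            # Options are passed to the Fluent Bit output plugin named in "Name".
            "options": {
                "Name": "cloudwatch_logs",
                "region": region,
                "log_group_name": log_group,
                "log_stream_prefix": "ecs/",
                "auto_create_group": "false",
            },
        },
    }
    return [log_router, app]


definitions = firelens_container_definitions(
    "api", "example/api:latest", "us-east-1", "/ecs/prod/api"
)
print(json.dumps(definitions, indent=2))
```

The output can be pasted into the containerDefinitions array of a task definition or passed to Terraform's jsonencode equivalent; the application container itself needs no code changes because it still writes to stdout.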
Log group naming at fleet scale
Name log groups /ecs/{environment}/{service} — not /ecs/{service}. Flat naming makes Logs Insights queries unworkable at 10+ environments and makes per-environment retention policies impossible to set.
This is a convention decision that feels unimportant at 3 services and becomes a serious operational problem at 30. With flat naming (/ecs/api, /ecs/worker), you can't query "all prod logs" in a Logs Insights query without listing every group by name. You can't set a shorter retention on dev environments without affecting prod. You can't see cost by environment in the CloudWatch console.
With hierarchical naming (/ecs/prod/api, /ecs/staging/api), Logs Insights can query all prod groups with logGroupNamePrefix /ecs/prod. You can set 7-day retention on all dev groups using describe-log-groups --log-group-name-prefix /ecs/dev. Naming is part of the broader ECS Fargate best practices for fleet-scale operations — the same principle applies to CloudWatch metric namespaces, IAM role names, and ECS cluster names.
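A small helper can keep the convention consistent across tooling. This is a minimal sketch under the naming scheme described here; the function names and the per-environment retention values in `RETENTION_DAYS` are illustrative assumptions, not part of any AWS API.

```python
from collections import defaultdict

# Hypothetical per-environment retention, in days; unknown environments get the default.
RETENTION_DAYS = {"prod": 30, "staging": 14, "dev": 7}
DEFAULT_RETENTION_DAYS = 30


def log_group_name(environment: str, service: str) -> str:
    """Build a hierarchical log group name: /ecs/{environment}/{service}."""
    return f"/ecs/{environment}/{service}"


def parse_log_group(name: str) -> tuple:
    """Split a hierarchical name into (environment, service); reject flat names."""
    parts = name.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "ecs":
        raise ValueError(f"not a /ecs/{{environment}}/{{service}} name: {name}")
    return parts[1], parts[2]


def group_by_environment(names: list) -> dict:
    """Map each environment to the services that log under it."""
    grouped = defaultdict(list)
    for name in names:
        environment, service = parse_log_group(name)
        grouped[environment].append(service)
    return dict(grouped)


def retention_for(name: str) -> int:
    """Pick the retention in days from the environment segment of the name."""
    environment, _ = parse_log_group(name)
    return RETENTION_DAYS.get(environment, DEFAULT_RETENTION_DAYS)


names = [log_group_name("prod", "api"), log_group_name("prod", "worker"),
         log_group_name("dev", "api")]
print(group_by_environment(names))
print(retention_for("/ecs/dev/api"))
```

A flat name such as /ecs/api raises a ValueError in `parse_log_group`, which makes the helper usable as a lint check in CI before a new service ships.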
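# Query all prod logs at once (only practical with hierarchical naming).
# start-query takes explicit log group names, so resolve the prefix first.
# Note: date -d is GNU date syntax.
aws logs start-query \
  --log-group-names $(aws logs describe-log-groups \
      --log-group-name-prefix "/ecs/prod" \
      --query 'logGroups[].logGroupName' \
      --output text) \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string 'filter @message like /ERROR/ | stats count(*) by @logStream'
# Set 7-day retention on all dev log groups.
# The [logGroupName] form makes --output text print one name per line for xargs.
aws logs describe-log-groups \
  --log-group-name-prefix "/ecs/dev" \
  --query 'logGroups[].[logGroupName]' \
  --output text | \
  xargs -I {} aws logs put-retention-policy \
    --log-group-name {} \
    --retention-in-days 7
What ECS logging actually costs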
Ingestion costs $0.50/GB. Storage costs $0.03/GB/month. Logs Insights queries cost $0.12/GB scanned. A 10-service fleet producing 5 GB/day per service pays $750/month in ingestion before touching storage or queries.
The full CloudWatch pricing breakdown (verified June 2026):
| Line item | Unit | Price | Note |
|---|---|---|---|
| Log ingestion | per GB | $0.50 | First 5 GB free/month |
| Log storage | per GB/month | $0.03 | Never Expire = runs forever |
| Logs Insights queries | per GB scanned | $0.12 | Per GB, not per query |
| Container Insights Enhanced | per metric/month | $0.07 | ~30–50 metrics per 10 services |
Running the math on two real scenarios:
The ingestion cost ($0.50/GB) is largely unavoidable if you're sending logs to CloudWatch. The storage cost is 100% avoidable with a retention policy. The Insights cost scales with how much data you scan per query — shorter retention windows mean cheaper queries.
"When a new CloudWatch Log Group is created, its data retention policy is automatically set to 'Never Expire.' While this ensures logs are always available, it also results in unnecessary storage costs over time."
— Towards AWS, verified June 2026
If you read this, you might also want to know
What happens to logs already in CloudWatch if I switch from awslogs to FireLens?
Nothing — existing logs stay in CloudWatch. FireLens only affects where new logs go. The switch requires updating the task definition and redeploying (new task revision). Old log streams stay queryable under the same log groups.
Can I send ECS logs to Datadog without FireLens?
Not directly. The awslogs driver sends to CloudWatch only. You can set up a Lambda function to forward CloudWatch logs to Datadog, but FireLens is the cleaner path — send directly from the container to Datadog without going through CloudWatch at all.
How do I query logs across multiple ECS environments in Logs Insights?
Use the log group name prefix filter in Logs Insights — logGroupNamePrefix /ecs/prod queries all groups under that prefix. This only works with hierarchical naming (/ecs/{environment}/{service}). Flat naming requires selecting each group individually.
Should I use structured (JSON) logging or plaintext in ECS?
JSON. CloudWatch Logs Insights can parse and filter JSON fields natively, which reduces the GB scanned per query. Plaintext requires grep-style filters that scan every byte. The switch to JSON in your application doesn't change ECS or CloudWatch config — it's an application-level change.
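As an illustration of that application-level change, here is a minimal sketch of a JSON formatter using only Python's standard logging module. The field names (`timestamp`, `level`, `logger`, `message`) are an assumed schema, not something ECS or CloudWatch requires.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes JSON lines to stdout, where the ECS log driver reads them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger()
logger.info("order created")
logger.error("payment failed")
```

With output in this shape, a Logs Insights query can filter on the discovered field directly, for example `filter level = "ERROR"`, instead of pattern-matching the raw message.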