AWS Fargate vs Lambda: When Does Lambda Stop Being Cheaper?
Lambda is not categorically cheaper than Fargate, and Fargate is not categorically cheaper than Lambda. There is a crossover point, and it is set mostly by how long each invocation runs — not by how much traffic you get. Most comparisons stop at a feature table. This one gives you the breakeven in dollars, the hidden costs that move it, and what the new Lambda MicroVMs (launched June 22, 2026) change — and what they don't.
- Lambda wins on short, spiky, event-driven work. Fargate wins on long-running, steady services. The line is duration × frequency, not raw traffic.
- Real breakeven: a 200ms API crosses at ~6–8M invocations/mo; a 2s background job crosses at ~1M/mo; at 5s+ duration Lambda almost never wins at scale.
- Per-request charges are only ~20–40% of a serverless bill. API Gateway ($3.50/M), CloudWatch Logs ($50–150/mo), NAT, and provisioned concurrency are the rest.
- June 22, 2026: Lambda MicroVMs lifted the runtime limit from 15 min to 8 hours (16 vCPU / 32 GB, Firecracker), but they target isolated sandboxes for AI and untrusted code, not always-on web services.
Lambda is cheaper for short, spiky workloads; Fargate is cheaper for long-running, steady ones. The crossover is set by execution duration: a 200ms API endpoint stays cheaper on Lambda up to roughly 6–8M invocations/month (including API Gateway and CloudWatch), while a 2s background job crosses to Fargate at about 1M/month. At 5s+ average duration, Lambda almost never wins at scale. Fargate compute runs $0.04048/vCPU-hr + $0.004445/GB-hr; Fargate Spot is ~68% cheaper. A monthly Lambda bill above ~$1,000 is a strong signal that at least one workload belongs on Fargate.
The two cost formulas, side by side. Swap your own numbers in and you get the monthly figure for each service — then compare. Lambda bills GB-seconds plus a per-request fee; Fargate bills allocated vCPU and memory for the hours the task runs.
# ---- Lambda monthly cost ----
memory_gb = 0.5            # function memory in MB / 1024
duration_s = 0.2           # average execution time in seconds
invocations = 5_000_000    # requests per month
#
gb_seconds = memory_gb * duration_s * invocations
compute = gb_seconds * 0.00001667       # $/GB-second
requests = invocations * 0.00000020     # $0.20 per 1M requests
api_gateway = invocations * 0.0000035   # $3.50 per 1M (set to 0 if not used)
lambda_total = compute + requests + api_gateway
# ---- Fargate monthly cost (one always-on task) ----
vcpu = 0.5
mem_gb = 1
hours = 730                # 24/7; use ~217 for business hours only
fargate_total = (vcpu * 0.04048 + mem_gb * 0.004445) * hours
# Fargate Spot: multiply the compute rates by ~0.319 (≈68% off)
# ---- Worked example: 0.5 GB, 0.2s, 5M invocations ----
# Lambda : 500,000 GB-s -> $8.34 compute + $1.00 requests + $17.50 API GW = ~$26.84/mo
# Fargate : (0.5*0.04048 + 1*0.004445) * 730 = ~$18.02/mo (24/7)
# At 5M invocations of a 200ms API, one always-on Fargate task already wins.
Rates: Lambda $0.00001667/GB-s + $0.20/1M requests; API Gateway $3.50/1M; Fargate $0.04048/vCPU-hr + $0.004445/GB-hr (Linux/x86, us-east-1, verified June 2026). Real systems need more than one Fargate task for redundancy, so adjust hours and task count for your setup.
Where Lambda wins, where Fargate wins
Lambda bills per millisecond of execution and fits spiky, event-driven work; Fargate bills for allocated vCPU and memory and wins on long-running, steady services. The split is duration, not app type.
The framing “serverless is cheaper” hides the mechanism underneath. Lambda charges for the time your code runs, rounded to the millisecond, times the memory you assigned. When code runs rarely and briefly, you pay almost nothing between invocations. When it runs constantly, you pay for each of those milliseconds — and there are 2.6 billion of them in a month.
Fargate is the inverse. You pay for a task's vCPU and memory for as long as it exists, whether it serves one request or ten thousand per second. Idle time is wasted money; saturated time is a bargain.
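To put numbers on the two billing models, the sketch below prices one Lambda execution environment that is busy for some fraction of a 30-day month against one Fargate task that exists all month. It uses the rates quoted in this article. The 512 MB function and the 0.5 vCPU / 1 GB task are illustrative assumptions, and request fees, API Gateway, and logging are deliberately left out.

```python
# Rates quoted in this article (Linux/x86, us-east-1).
LAMBDA_GB_SECOND = 0.00001667
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445

SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 s, roughly 2.6 billion ms


def lambda_busy_month(memory_gb: float, busy_fraction: float) -> float:
    """Compute-only cost of one Lambda environment busy for a fraction of the month."""
    return memory_gb * SECONDS_PER_MONTH * busy_fraction * LAMBDA_GB_SECOND


def fargate_month(vcpu: float, mem_gb: float) -> float:
    """Cost of one Fargate task that exists all month, whether busy or idle."""
    hours = SECONDS_PER_MONTH / 3600
    return (vcpu * FARGATE_VCPU_HOUR + mem_gb * FARGATE_GB_HOUR) * hours


# Assumed sizes: 512 MB function, 0.5 vCPU / 1 GB task.
for busy in (0.01, 0.25, 1.0):
    print(f"busy {busy:>4.0%}: Lambda ${lambda_busy_month(0.5, busy):6.2f}"
          f"  vs Fargate ${fargate_month(0.5, 1.0):6.2f}")
```

The Fargate figure is the same on every row, which is the point: its cost ignores utilization, while Lambda's scales with it.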
Lambda wins:
- S3-triggered file processing
- Webhook and HTTP handlers with bursty traffic
- Scheduled (cron) jobs that run briefly
- Queue and stream consumers with variable load
- Anything that needs to scale from zero instantly

Fargate wins:
- Long-lived microservices and APIs
- Services holding connections (WebSocket, gRPC)
- Batch and ETL jobs past the 15-minute mark
- Steady traffic where utilization stays high
- Workloads needing precise CPU/memory control
A “serverless” service that runs 24/7 under steady load is paying Lambda's premium for elasticity it never uses. Elasticity is only free when your traffic is spiky. If your invocation graph is a flat line, you are buying the wrong abstraction.
The real cost breakeven
Breakeven is set by function duration: a 200ms API crosses around 6–8M invocations a month, a 2s background job around 1M. The longer the function runs, the sooner Fargate wins.
Invocation count is the number most teams reach for, but it is the wrong axis on its own. Duration multiplies it. A 200ms function and a 2s function at the same invocation count have a 10× difference in GB-seconds — and GB-seconds are what Lambda bills. That is why the breakeven for a longer function lands at a fraction of the invocations.
| Workload | Lambda config | Breakeven, compute only | Breakeven incl. API GW & logs |
|---|---|---|---|
| API endpoint | 512 MB · 200 ms | ~10M / mo | ~6–8M / mo |
| Background processor | 1024 MB · 2 s | ~1.5M / mo | ~1M / mo |
| Data pipeline | 2048 MB · 500 ms | ~5M / mo | ~4M / mo |
Put a concrete profile through it. At 5 million invocations of a 200ms function with 512 MB, Lambda's compute plus an API Gateway in front already exceeds the cost of a single always-on Fargate task doing the same work. In a fuller accounting the Fargate side is stacked: its own compute, plus the slice of the shared NAT Gateway it actually uses.
That comparison uses one Fargate task for clarity. Production needs at least two for redundancy, plus the rest of the fixed overhead an ECS environment carries: the ALB and CloudWatch alongside that NAT share. Fold it in and the crossover shifts, but the direction holds. The longer and busier the workload, the more Fargate pulls ahead.
These thresholds come from a 2026 third-party analysis of production workloads, not from AWS. Treat them as a starting estimate and confirm with your own numbers using the formulas above — your memory size, duration, and whether you front Lambda with API Gateway all move the line.
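One way to confirm with your own numbers is to solve the two formulas for the invocation count at which they meet. The sketch below does that with the rates quoted in this article. The function names, the `tasks` and `fixed_overhead` parameters, and the example overhead figure are illustrative assumptions. It also assumes the Fargate tasks you size can actually serve the load at the breakeven volume, which you would need to check separately.

```python
LAMBDA_GB_SECOND = 0.00001667      # $/GB-second
LAMBDA_REQUEST = 0.20 / 1e6        # $ per request
API_GATEWAY_REQUEST = 3.50 / 1e6   # $ per request, if API Gateway is in front
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445


def lambda_cost_per_invocation(memory_gb, duration_s, api_gateway=True):
    """Marginal cost of one invocation: compute + request fee (+ API Gateway)."""
    cost = memory_gb * duration_s * LAMBDA_GB_SECOND + LAMBDA_REQUEST
    if api_gateway:
        cost += API_GATEWAY_REQUEST
    return cost


def fargate_monthly(vcpu, mem_gb, tasks=1, hours=730, fixed_overhead=0.0):
    """Monthly cost of `tasks` identical tasks plus any fixed overhead (ALB, NAT share, ...)."""
    per_task = (vcpu * FARGATE_VCPU_HOUR + mem_gb * FARGATE_GB_HOUR) * hours
    return tasks * per_task + fixed_overhead


def breakeven_invocations(memory_gb, duration_s, fargate_cost, api_gateway=True):
    """Monthly invocations at which the Lambda bill equals the given Fargate bill."""
    return fargate_cost / lambda_cost_per_invocation(memory_gb, duration_s, api_gateway)


one_task = fargate_monthly(0.5, 1.0)
# Hypothetical production setup: two tasks plus ~$22/month of fixed overhead.
production = fargate_monthly(0.5, 1.0, tasks=2, fixed_overhead=22.0)

for label, cost in (("one task", one_task), ("production", production)):
    n = breakeven_invocations(0.5, 0.2, cost)
    print(f"{label:<10} ${cost:6.2f}/mo -> breakeven at {n / 1e6:.1f}M invocations")
```

The result is sensitive to what you count as Fargate overhead and whether API Gateway is in the path, so it will not necessarily reproduce the table above, which rests on the third-party analysis's own assumptions.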
The hidden costs that move the line
Per-request charges are only 20–40% of a serverless bill. API Gateway ($3.50/M, often more than the Lambda itself), CloudWatch Logs ($50–150/mo at 1M+/day), NAT, and provisioned concurrency make up the rest.
The Lambda line item on your bill is the part most teams model. The rest hides in adjacent services that the function can't run without. A fair comparison has to count them, because Fargate either avoids them or pays them differently.
API Gateway. Most HTTP Lambdas sit behind API Gateway. At high request volume, that per-request fee ($3.50 per million) routinely exceeds the Lambda compute cost itself. A Fargate service behind an Application Load Balancer pays about $22/month instead, a figure that moves little with request count.
CloudWatch Logs. Every invocation writes log lines, at minimum the START, END, and REPORT entries. At a million-plus invocations a day, ingestion alone runs $50–150/month. Both platforms log, but Lambda's per-invocation entries multiply the line count fast.
Provisioned concurrency. The standard fix for cold starts keeps warm instances on standby and charges for them around the clock, which is the always-on cost model you chose Lambda to avoid.
Memory-coupled CPU. Lambda scales CPU with the memory setting. A CPU-bound function forces you to over-provision memory you don't need to get more cores. Fargate lets you set vCPU and memory independently.
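A rough way to see how small the function line item can be is to add the adjacent services up. The sketch below is a minimal model, not a billing tool. The Lambda and API Gateway rates are the ones quoted in this article. The log volume per invocation (2 KB) and the log ingestion rate ($0.50/GB) are assumptions that are not from the article and should be replaced with your own figures. NAT and provisioned concurrency are plain monthly inputs.

```python
def serverless_bill(invocations, memory_gb, duration_s,
                    log_kb_per_invocation=2.0,   # ASSUMPTION: varies widely per function
                    log_ingest_per_gb=0.50,      # ASSUMPTION: check current CloudWatch pricing
                    nat_monthly=0.0,
                    provisioned_monthly=0.0):
    """Return (line items in dollars, share of the total that is the Lambda function itself)."""
    parts = {
        "lambda": invocations * (memory_gb * duration_s * 0.00001667 + 0.20 / 1e6),
        "api_gateway": invocations * 3.50 / 1e6,
        # KB -> GB using decimal units, then priced per GB ingested.
        "cloudwatch_logs": invocations * log_kb_per_invocation / 1e6 * log_ingest_per_gb,
        "nat": nat_monthly,
        "provisioned_concurrency": provisioned_monthly,
    }
    total = sum(parts.values())
    share = parts["lambda"] / total if total else 0.0
    return parts, share


# Hypothetical profile: 1M invocations/day, 512 MB, 200 ms.
parts, lambda_share = serverless_bill(30_000_000, 0.5, 0.2)
for name, dollars in parts.items():
    print(f"{name:<24} ${dollars:8.2f}")
print(f"Lambda line item share: {lambda_share:.0%}")
```

With these assumed inputs API Gateway is the largest line, and the function itself is a minority of the total. Different log volumes or a non-zero NAT share will move that ratio.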
If your monthly Lambda-related bill clears ~$1,000, moving the heaviest function group to Fargate is likely your highest-ROI infrastructure task this quarter. The savings rarely come from the function line alone — they come from dropping the API Gateway and CloudWatch surcharges that ride along with it.
The twist: Lambda MicroVMs move the boundary (June 2026)
On June 22, 2026 AWS shipped Lambda MicroVMs: up to 8 hours, 16 vCPU, 32 GB, Firecracker isolation. It removes the 15-minute limit — but it targets isolated sandboxes for AI and untrusted code, not always-on web services.
For years, the 15-minute timeout was the cleanest reason to leave Lambda: if a job ran longer, you moved it to Fargate or Batch. MicroVMs change that specific fact. Each session runs in its own dedicated MicroVM — Firecracker virtualization, no shared kernel, no shared resources with other sessions — and can hold state across user interactions for up to eight hours.
“Each session runs in its own dedicated MicroVM with no shared kernel and no shared resources between users, so untrusted code supplied by one user is contained to their execution environment.”
— AWS News Blog: Lambda MicroVMs, June 2026
The intended use cases tell you who this is for: AI coding assistants, interactive code environments, data analytics platforms, vulnerability scanners, and game servers that run user-supplied scripts. The common thread is running code you don't trust in a hard isolation boundary, with full lifecycle control over each session.
The part most takes will get wrong: MicroVMs are not Lambda becoming Fargate. A MicroVM is a lifecycle-managed session you launch, use, and tear down — not an always-on listener answering a steady stream of HTTP requests. For a long-lived web service or API, the right tool is still Fargate. What MicroVMs displace is the pattern where teams spun up a Fargate task as a sandbox to run untrusted or AI-generated code — that niche now has a purpose-built home.
If your isolation need is bigger than a single sandbox, such as a full copy of a service with its dependencies, that is still a container problem. It is closer to cloning a full environment than to running a MicroVM session.
On cost: AWS prices MicroVMs across three dimensions: compute (per-second, on baseline and peak usage), snapshot operations and storage, and data transfer. AWS has not published a flat per-second rate, so there is no clean number to drop into the breakeven math yet. MicroVMs are available in US East (N. Virginia and Ohio), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland).
Cold start and latency reality
Lambda cold starts run from tens to hundreds of milliseconds, worse inside a VPC or with large packages. Provisioned concurrency removes them — but then you pay around the clock, which erodes Lambda's main cost edge.
A cold start is the time Lambda needs to spin up a new execution environment when no warm one is available. For a small function it's tens of milliseconds; in a VPC, or with a heavy runtime and large deployment package, it climbs into the hundreds. For a user-facing API that occasionally idles, that tail latency is the complaint that shows up in your dashboards.
Fargate has its own startup cost — 30 to 90 seconds to launch a task — but it pays that once. After that the task stays warm and per-request latency is steady, because there is no per-invocation environment to create.
The catch is what fixing Lambda's cold start does to the bill. Provisioned concurrency keeps instances warm and charges for them whether or not a request arrives. That is the always-on cost model — the one Lambda was supposed to let you skip. Once you're paying for warm capacity full time, you've recreated Fargate's economics without Fargate's resource control.
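The fixed floor that provisioned concurrency adds can be sketched in a few lines. The per-GB-second standby rate below is an assumption: it is not quoted in this article and should be checked against the current Lambda pricing page for your region and architecture. The function name and the example sizes are also illustrative.

```python
# ASSUMPTION: standby rate per GB-second of provisioned concurrency.
# Not from this article; verify against the Lambda pricing page before relying on it.
PROVISIONED_GB_SECOND = 0.0000041667

HOURS_PER_MONTH = 730


def provisioned_concurrency_monthly(instances: int, memory_gb: float,
                                    hours: float = HOURS_PER_MONTH) -> float:
    """Standby charge for keeping `instances` environments warm.

    Note what is missing from the signature: invocation count. The charge
    accrues whether or not a single request arrives, and invocation
    charges are billed on top of it.
    """
    return instances * memory_gb * hours * 3600 * PROVISIONED_GB_SECOND


# Hypothetical: 10 warm instances of a 512 MB function, around the clock.
floor = provisioned_concurrency_monthly(10, 0.5)
print(f"Fixed monthly floor before any traffic: ${floor:.2f}")
```

Scheduling provisioned concurrency only for business hours (a smaller `hours` value) lowers the floor, at the price of cold starts returning outside that window.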
The decision: a checklist
Pick Lambda for spiky or unpredictable traffic, short tasks under ~1s, and low volume. Pick Fargate for steady load, functions 5s+, fine CPU/memory control, or a Lambda bill above ~$300–1,000/mo.
| Dimension | AWS Lambda | AWS Fargate |
|---|---|---|
| Billing unit | Per ms of execution (GB-seconds) | Per second of allocated vCPU + GB |
| Best traffic shape | Spiky, unpredictable, event-driven | Steady, sustained, always-on |
| Short tasks (<1s) | Wins on cost | Overkill |
| Long tasks (5s+) | Loses fast at scale | Wins on cost |
| Runtime ceiling | 15 min (8 hr via MicroVMs) | Unbounded |
| Resource control | CPU tied to memory | vCPU + memory set independently |
| Cold start | Tens–hundreds of ms | 30–90s once, then steady |
| Scale to zero | Native, instant | Manual (scheduling / desiredCount 0) |
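Reduced to a few if/then rules:
- If traffic is spiky or unpredictable, tasks run under ~1s, or volume is low, pick Lambda.
- If load is steady, functions run 5s or longer, or you need fine CPU/memory control, pick Fargate.
- If the Lambda bill is above ~$300–1,000/mo, move the heaviest workload to Fargate.
- If you need an isolated sandbox for untrusted or AI-generated code, use Lambda MicroVMs.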
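The checklist can be written down as a small function. This is a sketch of the article's thresholds, not a definitive policy: the `Workload` fields, the order in which the rules are checked, and the use of $1,000 as the bill cutoff are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    avg_duration_s: float
    spiky_traffic: bool
    monthly_lambda_bill: float = 0.0
    needs_cpu_memory_control: bool = False
    runs_untrusted_code: bool = False


def recommend(w: Workload) -> str:
    """Apply the checklist thresholds in an assumed priority order."""
    if w.runs_untrusted_code:
        return "Lambda MicroVMs"          # isolated sandbox case
    if w.avg_duration_s >= 5 or w.needs_cpu_memory_control:
        return "Fargate"                  # long functions or independent vCPU/memory
    if w.monthly_lambda_bill >= 1000:
        return "Fargate"                  # bill signals a misplaced workload
    if not w.spiky_traffic:
        return "Fargate"                  # steady, sustained load
    if w.avg_duration_s < 1:
        return "Lambda"                   # short and spiky
    return "Run the cost formulas"        # spiky, 1-5 s: no clear rule


print(recommend(Workload(avg_duration_s=0.2, spiky_traffic=True)))
print(recommend(Workload(avg_duration_s=6.0, spiky_traffic=True)))
```

The final branch is deliberate: between roughly 1 and 5 seconds with spiky traffic, the thresholds do not settle the question and the breakeven math has to.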
Most teams don't pick one and stop. A typical setup keeps spiky glue on Lambda, runs the steady services on Fargate, and now has MicroVMs for the sandbox case. The mistake isn't mixing them — it's leaving a workload on the wrong one after its traffic shape has changed.
If you read this, you might also want to know
What if my workload is partly spiky and partly steady?
Split it. Run the steady core — the part that handles baseline traffic 24/7 — on Fargate, and keep the spiky overflow or event-driven glue on Lambda. The mistake is forcing one runtime onto a workload with two traffic shapes. A Fargate service for the baseline plus Lambda for bursts is usually cheaper than either alone scaled to cover both.
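The split can be compared against the two single-runtime options with the same rates used earlier in this article. Everything about the scenario below is made up for illustration: the task counts, the 20M total and 2M burst invocations, the ~$22/month load balancer, and the assumption that every Lambda request passes through API Gateway.

```python
FARGATE_TASK_MONTH = (0.5 * 0.04048 + 1.0 * 0.004445) * 730    # 0.5 vCPU / 1 GB, 24/7
LAMBDA_PER_CALL = 0.5 * 0.2 * 0.00001667 + 0.20 / 1e6 + 3.50 / 1e6  # 512 MB, 200 ms, API GW
ALB_MONTH = 22.0                                                # figure quoted in this article


def fargate_only(peak_tasks: int) -> float:
    """Enough tasks to cover peak load, running all month."""
    return peak_tasks * FARGATE_TASK_MONTH + ALB_MONTH


def lambda_only(total_invocations: int) -> float:
    """Every request served by Lambda behind API Gateway."""
    return total_invocations * LAMBDA_PER_CALL


def hybrid(baseline_tasks: int, burst_invocations: int) -> float:
    """Fargate sized for the baseline, Lambda absorbing only the bursts."""
    return (baseline_tasks * FARGATE_TASK_MONTH + ALB_MONTH
            + burst_invocations * LAMBDA_PER_CALL)


# Hypothetical shape: 20M requests/month, 2M of them in bursts;
# 2 tasks cover the baseline, 6 would be needed to cover peak.
print(f"Fargate only: ${fargate_only(6):7.2f}")
print(f"Lambda only : ${lambda_only(20_000_000):7.2f}")
print(f"Hybrid      : ${hybrid(2, 2_000_000):7.2f}")
```

With these invented inputs the hybrid comes out lowest, but a flatter or lower-volume traffic shape can flip the order, so the comparison is worth rerunning with measured numbers.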
Where does EC2 fit between Lambda and Fargate on cost?
EC2 sits below Fargate on raw compute price at high, steady utilization, because Reserved Instances and Savings Plans cut 30–50% — but you take on AMI patching, scaling, and capacity planning. The order at steady state is roughly EC2 < Fargate < Lambda on cost, and roughly Lambda < Fargate < EC2 on operational burden. Fargate is the middle: more expensive than tuned EC2, far less to operate.
Are MicroVMs cheaper than running a Fargate task as a sandbox?
There's no clean answer yet — AWS prices MicroVMs on per-second compute plus snapshot storage and data transfer, with no published flat rate. The likely advantage is operational, not only dollar: MicroVMs give per-session isolation and lifecycle control out of the box, where a Fargate sandbox makes you build task launch, teardown, and isolation yourself. Compare on total effort, not the compute line alone.