Matt S
Platform engineer at Fortem · 8 min read

What Does DevOps Automation Miss Beyond CI/CD?

Your CI/CD pipeline builds, tests, and deploys. Your on-call rotation handles incidents. What happens between deploy and incident? Who keeps track of the environments? Who keeps the AWS bill from spiraling? Who gives developers the power to do their jobs without pinging the platform team? This is the ops automation gap — and every team with 10+ environments eventually discovers it.

TL;DR
  • CI/CD automates deployment, not operations. Between deploy and incident, there's a gap no pipeline touches.
  • Five things every team eventually discovers they need: scheduling, self-service, cost tracking, cloning, and orphan detection.
  • A 2-person platform team spends 30–50% of their week on these five gaps: Slack messages, Excel sheets, one-off Terraform modules.
  • Building all five from scratch takes 16–40 weeks. Buying a platform fills them in one onboarding.

CI/CD is deployment automation. It's not operations.

CI/CD covers the deploy pipeline: build, test, deploy, rollback. When the pipeline is green, a new version is running. When it's red, on-call gets paged. This model works — every modern team has it.

What it doesn't cover: environment scheduling, developer self-service, cost tracking, cloning, waste detection. These aren't bugs in CI/CD. They're operations problems that live in a different category. One is a pipeline. The other is a control plane.

[Figure: The ops automation gap. CI/CD Pipeline → Deploy → ? → Operations]
Key insight

CI/CD automated the deploy button. Nobody automated what happens after — the day-to-day of managing environments at scale. The deploy pipeline is solved. Operations is still manual.

Gap 1: Environment scheduling

There are 168 hours in a week. Your team works 40–50 of them. Your dev and staging environments run all 168 — billing by the second through nights, weekends, and holidays. Nobody needs a dev environment at 3am on Sunday.
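The arithmetic behind that claim, as a minimal sketch. The function name is illustrative; the 40–50 working hours are the figures quoted above.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def idle_share(working_hours: float) -> float:
    """Fraction of the week an always-on environment runs with nobody using it."""
    return (HOURS_PER_WEEK - working_hours) / HOURS_PER_WEEK


for hours in (40, 50):
    print(f"{hours} working hours -> {idle_share(hours):.0%} of the week idle")
```

On those figures, an always-on dev or staging environment sits unused for roughly 70–76% of the hours it bills for.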

The DIY build: Lambda functions + EventBridge rules per environment. Each needs a separate cron, a per-timezone configuration, and an override mechanism for ad-hoc work. At 30 environments, that's 30 Lambda functions, each one a deployment artifact to maintain, monitor, and debug when it silently stops working.

Platform team cost: 2–4 weeks to build, ongoing maintenance as environments are added. The tools exist — EventBridge and Lambda are free or near-free — but the integration and maintenance burden scales linearly with fleet size.
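One way to keep that burden from growing with the fleet is to treat the schedule as data and evaluate it in a single place. The sketch below is an assumption about how such a check could look, not a description of any particular product: the `Schedule` fields, the default hours, and the `should_run` name are all hypothetical, and whatever calls the function (for example one scheduled job that loops over every environment) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Optional


@dataclass
class Schedule:
    tz: tzinfo                                  # the owning team's timezone
    start: time = time(8, 0)                    # local start of the working day
    stop: time = time(19, 0)                    # local end of the working day
    workdays: frozenset = frozenset(range(5))   # Monday=0 .. Friday=4
    override_until: Optional[datetime] = None   # keep running until this instant


def should_run(schedule: Schedule, now_utc: datetime) -> bool:
    """Return True if the environment should be up at the given UTC instant."""
    if schedule.override_until is not None and now_utc < schedule.override_until:
        return True  # ad-hoc work wins over the regular schedule
    local = now_utc.astimezone(schedule.tz)
    in_hours = schedule.start <= local.time() < schedule.stop
    return local.weekday() in schedule.workdays and in_hours


# Fixed UTC-5 offset for the example; zoneinfo.ZoneInfo would also handle DST.
new_york = timezone(timedelta(hours=-5))
sched = Schedule(tz=new_york)
monday_10am = datetime(2025, 1, 6, 15, 0, tzinfo=timezone.utc)
sunday_3am = datetime(2025, 1, 5, 8, 0, tzinfo=timezone.utc)
print(should_run(sched, monday_10am), should_run(sched, sunday_3am))
```

Adding an environment then means adding a `Schedule` record rather than another function and cron rule, and the override is a timestamp that expires on its own.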

Gap 2: Developer self-service

“Can you restart staging?” — the Slack message every platform engineer receives at least twice a week. Developers can deploy to production via CI. They can't restart staging without you. The AWS Console is too dangerous. The IAM policy for “just restart this one service” is too granular to hand out.

The DIY build: a web UI with RBAC per environment, backed by Amazon API Gateway and scoped IAM policies. You need role-per-env mappings, a way to audit who did what, and a mechanism to revoke access. The IAM part alone, managing policies per service per environment, is the reason most teams give up and hand out broad AWS Console access.

Platform team cost: 4–8 weeks to build a minimum viable self-service portal. 2–5 hours per week spent on restart and status requests that the portal would handle.
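The three requirements named above (role-per-env mappings, an audit trail, and revocation) can be sketched in a few lines. Everything here is hypothetical: the role names, the `AccessControl` class, and the in-memory storage stand in for whatever a real portal would back with IAM and a database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role definitions: which actions each role may perform.
ROLE_ACTIONS = {
    "viewer": {"status"},
    "operator": {"status", "restart"},
}


@dataclass
class AccessControl:
    grants: dict = field(default_factory=dict)      # (user, environment) -> role
    audit_log: list = field(default_factory=list)   # one entry per decision

    def grant(self, user: str, env: str, role: str) -> None:
        if role not in ROLE_ACTIONS:
            raise ValueError(f"unknown role: {role}")
        self.grants[(user, env)] = role

    def revoke(self, user: str, env: str) -> None:
        self.grants.pop((user, env), None)

    def authorize(self, user: str, env: str, action: str) -> bool:
        """Check the request and record the decision, allowed or not."""
        role = self.grants.get((user, env))
        allowed = role is not None and action in ROLE_ACTIONS[role]
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user, "env": env, "action": action, "allowed": allowed,
        })
        return allowed


acl = AccessControl()
acl.grant("dana", "staging", "operator")
print(acl.authorize("dana", "staging", "restart"))     # granted on staging
print(acl.authorize("dana", "production", "restart"))  # no grant on production
```

Denied requests are logged as well, so the audit trail shows what people tried to do, not only what they were allowed to do.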

Gap 3: Fleet visibility and cost tracking

AWS Cost Explorer shows aggregate spend. Out of the box, it doesn't show per-environment cost. You can't answer “how much does staging cost vs dev this month?” without tagging, a spreadsheet, and up to 24 hours of tag activation delay.

The DIY build: tag every resource with an Environment key. Activate cost allocation tags in Billing. Wait up to 24 hours for the tags to activate. Export a CSV. Build a spreadsheet. Repeat monthly. This works, for a while. As the fleet grows, the spreadsheet gets abandoned, and the CTO asks “why is AWS up 30% this quarter?” and nobody can answer without a day of Cost Explorer archaeology.
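The spreadsheet step amounts to a group-by on the Environment tag. A minimal sketch of that step follows; the column names (`Service`, `Environment`, `Cost`) and the sample rows are invented for illustration and will not match a real billing export without adjustment.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def cost_by_environment(csv_text: str, tag_column: str = "Environment",
                        cost_column: str = "Cost") -> dict:
    """Sum cost per value of the environment tag; untagged rows are grouped together."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        env = (row.get(tag_column) or "").strip() or "(untagged)"
        totals[env] += Decimal(row[cost_column])
    return dict(totals)


# Made-up sample data standing in for an exported CSV.
sample = """Service,Environment,Cost
EC2,staging,120.50
RDS,staging,80.00
EC2,dev,45.25
S3,,3.10
"""
for env, total in sorted(cost_by_environment(sample).items()):
    print(f"{env}: ${total}")
```

Keeping an explicit "(untagged)" bucket matters in practice: spend that never received an Environment tag is exactly the part a per-environment report would otherwise hide.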

Key insight

By the time you see the numbers in the spreadsheet, the money is spent. Real-time per-environment cost visibility — not aggregate billing data — is what lets platform teams react before the quarter ends.

Gap 4: Environment cloning

“QA needs an isolated copy of staging to test the compliance flow.” This request lands in the platform team's Slack every few weeks. The environment has 18 services, 4 databases, networking rules, and a dozen environment variables. Building a copy means: write a new Terraform module, override variables, pray you didn't miss a dependency.

The DIY build: a Terraform module that parameterizes every service, database, and network rule from a source environment. This is a 300–400 line module that needs to stay in sync with the source. Every time the source changes, the clone module drifts.
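The drift problem is easier to see in miniature. The sketch below uses plain Python dictionaries rather than Terraform, purely as an illustration: the environment shape, the `clone_environment` and `drift` helpers, and the service names are all hypothetical.

```python
import copy


def clone_environment(source: dict, name: str, overrides: dict) -> dict:
    """Deep-copy a source environment definition and apply per-clone overrides."""
    clone = copy.deepcopy(source)
    clone["name"] = name
    clone.setdefault("variables", {}).update(overrides)
    return clone


def drift(source: dict, clone: dict, ignore=("name", "variables")) -> list:
    """List top-level keys where the clone no longer matches the source."""
    keys = (set(source) | set(clone)) - set(ignore)
    return sorted(k for k in keys if source.get(k) != clone.get(k))


staging = {
    "name": "staging",
    "services": ["api", "worker"],
    "databases": ["orders"],
    "variables": {"LOG_LEVEL": "info"},
}
qa = clone_environment(staging, "qa-compliance", {"LOG_LEVEL": "debug"})
staging["services"].append("billing")  # the source changes after the clone was made
print(drift(staging, qa))  # keys where the clone has fallen behind
```

The clone is correct at the moment it is made and wrong as soon as the source moves, which is why a copy maintained by hand needs a recurring comparison against its source.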

Platform team cost: 4–8 hours per clone request. At 3 clones per month, that's 12–24 hours of engineering time. Plus the cost of the days QA waits for the environment to be ready.

Gap 5: Orphaned environment detection

Every team has them. An environment was spun up for a demo six months ago. A PR preview for a feature branch that was merged and abandoned. A hackathon project that shipped and was forgotten. Nobody deploys to these environments. Nobody owns them. They just bill — quietly, every month.

The DIY build: pull the last deployment timestamp per environment. Cross-reference with the team directory. Environments with no deploy in 30+ days and no active owner go on a review list. The platform team reviews, confirms abandonment, and deletes the infrastructure.
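That rule translates almost directly into code. A minimal sketch, assuming the deploy timestamps and the list of active owners have already been collected; the `Environment` record and the `review_list` function are illustrative names, and the output is a list for human review, not a delete list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Environment:
    name: str
    last_deploy: datetime
    owner: Optional[str]  # None when nobody is recorded as owner


def review_list(envs, active_owners, now, max_idle_days=30):
    """Names of environments with no deploy in max_idle_days or more and no active owner."""
    cutoff = now - timedelta(days=max_idle_days)
    return [
        e.name for e in envs
        if e.last_deploy <= cutoff and e.owner not in active_owners
    ]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
fleet = [
    Environment("staging", now - timedelta(days=2), "dana"),
    Environment("demo-acme", now - timedelta(days=180), "sam"),   # owner has left
    Environment("hackathon", now - timedelta(days=90), None),     # never had an owner
    Environment("perf-lab", now - timedelta(days=45), "dana"),    # idle but still owned
]
print(review_list(fleet, active_owners={"dana"}, now=now))
```

Both conditions have to hold: an idle environment with an active owner stays off the list, which keeps the review queue short enough for someone to work through.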

Key insight

In a fleet of 30+ environments, most teams find 2–5 orphaned environments when they look seriously. At $200–400/month each, that's $400–$2,000/month — or $4,800–$24,000/year — for compute serving zero requests. A one-time audit catches it. Without it, the environments keep billing forever.
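The range above is the low and high ends multiplied out, shown here as a quick check. The function name is illustrative; the inputs are the figures quoted in the paragraph.

```python
def annual_waste(orphans: int, monthly_cost: int) -> int:
    """Yearly spend on orphaned environments, in dollars."""
    return orphans * monthly_cost * 12


low = annual_waste(2, 200)    # 2 orphans at $200/month
high = annual_waste(5, 400)   # 5 orphans at $400/month
print(f"${low:,} to ${high:,} per year")
```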

Building vs buying the ops layer

Each of these five gaps is solvable with AWS-native tools — Lambda, EventBridge, IAM, Cost Explorer, Terraform. The problem isn't the tools. It's the cumulative build and maintenance cost across all five.

Gap           | DIY weeks | Fortem
Scheduling    | 2–4       | Built-in, per-timezone
Self-service  | 4–8       | RBAC by environment
Cost tracking | 2–3       | Live per-env cost
Cloning       | 3–6       | Clone in a few clicks
Orphans       | 1–2       | Last deploy + owner visible

All five: 16–40 weeks · $90–220k in labor

Labor estimate assumes a platform engineer at $180–220k/yr fully loaded (~$3,500–4,200/week; Glassdoor + Levels.fyi, 2026). Maintenance adds 20–30% annually as the fleet grows.

A fair objection: with AI coding tools — Claude, Copilot, Codex — the build time drops. A skilled engineer using current LLMs could ship these five tools in 6–16 weeks, not 16–40. That changes the cost equation. What it doesn't change: the maintenance burden, the integration surface (IAM, EventBridge, Cost Explorer APIs), and the fact that you're building internal tools while your competitors ship product. The gap isn't just about time. It's about focus.

Key insight

A platform team spending 30–50% of their week on these five gaps isn't building product. They're maintaining internal tools — the same tools every team builds, differently, from scratch. CI/CD automated the deploy button. Ops needs the same treatment.

We automated the ops layer.

Fortem fills all five gaps — scheduling, self-service, cost tracking, cloning, and orphan detection — in a 7-day onboarding. No Terraform changes. No internal tool maintenance.
