What is automation in DevOps?

Beyond CI/CD pipelines, DevOps automation covers environment scheduling, self-service operations, cost tracking, environment cloning, and orphaned resource detection. CI/CD automates deployment; operational automation manages what happens after deployment — the 95% of a service's lifecycle that CI/CD doesn't touch.

Is DevOps dead due to AI?

No, but it's evolving. AI automates diagnosis and routine fixes (e.g., reading CloudWatch logs to identify why a task crashed). The role shifts from manually debugging to reviewing AI proposals and building the platform that makes automation possible. Platform engineering is the evolution of DevOps — not its replacement.

What are the 7 C's of DevOps?

Continuous planning, development, integration, testing, deployment, monitoring, and operations. Most teams automate the first five (CI/CD). The last two, monitoring and ongoing operations, are where the real gap is: scheduling, self-service, cost visibility, and fleet management at scale.

Guide

Matt S

Platform engineer at Fortem·June 2, 2026·8 min read

devops-automation · platform-engineering-automation · beyond-cicd-devops

What Does DevOps Automation Miss Beyond CI/CD?

Q: What does DevOps automation miss beyond CI/CD?

Five gaps: (1) environment scheduling — dev/staging run 168 hrs/week, used 40; (2) developer self-service — restarting staging requires a ticket; (3) cost tracking per environment; (4) environment cloning without manual steps; (5) finding and killing orphaned environments nobody uses.

Your CI/CD pipeline builds, tests, and deploys. Your on-call rotation handles incidents. What happens between deploy and incident? Who keeps track of the environments? Who keeps the AWS bill from spiraling? Who gives developers the power to do their jobs without pinging the platform team? This is the ops automation gap — and every team with 10+ environments eventually discovers it.

TL;DR

·CI/CD automates deployment — not operations. Between deploy and incident, there's a gap no pipeline touches.
·Five things every team eventually discovers they need: scheduling, self-service, cost tracking, cloning, and orphan detection.
·A 2-person platform team spends 30–50% of their week on these five gaps — Slack messages, Excel sheets, one-off Terraform modules.
·Building all five from scratch takes 16–40 weeks. Buying a platform fills them in one onboarding.

CI/CD is deployment automation. It's not operations.

CI/CD automates build, test, and deploy — not the 95% of a service lifecycle that follows: scheduling, self-service, cost tracking, cloning, and orphan detection.

CI/CD covers the deploy pipeline: build, test, deploy, rollback. When the pipeline is green, a new version is running. When it's red, on-call gets paged. This model works — every modern team has it.

What it doesn't cover: environment scheduling, developer self-service, cost tracking, cloning, waste detection. These aren't bugs in CI/CD. They're operations problems that live in a different category. One is a pipeline. The other is a control plane.

[Figure: CI/CD Pipeline → Deploy → ? → Operations. Caption: The ops automation gap]

Key insight

CI/CD automated the deploy button. Nobody automated what happens after — the day-to-day of managing environments at scale. The deploy pipeline is solved. Operations is still manual.

CI/CD pipelines automate the build, test, and deployment phases of the software delivery lifecycle. Ongoing operational responsibilities — environment management, cost visibility, and access control — are outside the scope of a deployment pipeline.
— Paraphrased from AWS Well-Architected DevOps Guidance

Gap 1: Environment scheduling

Non-prod ECS environments run 168 hrs/week but are used ~55; scheduling them off nights and weekends cuts compute spend by 60–70%, the largest single cost lever available.

There are 168 hours in a week. Your team works 40–50 of them. Your dev and staging environments run all 168 — billing by the second through nights, weekends, and holidays. Nobody needs a dev environment at 3am on Sunday.
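The arithmetic behind those numbers can be sketched in a few lines. This is a minimal illustration, assuming compute cost scales linearly with running hours; the function name is hypothetical, and the hour figures are the usage estimates quoted in this article.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def idle_fraction(hours_used_per_week: float) -> float:
    """Share of the week an environment runs without anyone using it."""
    if not 0 <= hours_used_per_week <= HOURS_PER_WEEK:
        raise ValueError("hours_used_per_week must be between 0 and 168")
    return (HOURS_PER_WEEK - hours_used_per_week) / HOURS_PER_WEEK


for used in (40, 50, 55):
    print(f"{used} h/week in use -> {idle_fraction(used):.0%} of the week idle")
```

The idle share is an upper bound on savings under this simple model: anything that keeps billing while services are stopped, such as load balancers or databases, is not reduced by scheduling.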

The DIY build: Lambda functions + EventBridge rules per environment. Each needs a separate cron, a per-timezone configuration, and an override mechanism for ad-hoc work. At 30 environments, that's 30 Lambda functions, each one a deployment artifact to maintain, monitor, and debug when it silently stops working.

Platform team cost: 2–4 weeks to build, ongoing maintenance as environments are added. The tools exist — EventBridge and Lambda are free or near-free — but the integration and maintenance burden scales linearly with fleet size. The full mechanics of ECS environment scheduling show why this breaks past ten environments.
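The decision each scheduler has to make, per environment, looks roughly like the sketch below. It assumes a hypothetical `Schedule` record with working hours, a timezone, and an optional override; none of these names come from AWS or Fortem, and the code that would actually stop or start the ECS services is left out.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Optional


@dataclass
class Schedule:
    tz: tzinfo                                        # the environment's local timezone
    start: time = time(8, 0)                          # local start of the working day
    stop: time = time(19, 0)                          # local end of the working day
    workdays: frozenset = frozenset({0, 1, 2, 3, 4})  # Monday=0 .. Friday=4
    override_until: Optional[datetime] = None         # ad-hoc "keep it up" deadline


def should_be_running(schedule: Schedule, now_utc: datetime) -> bool:
    """Return True if the environment should be up at the given UTC instant."""
    if schedule.override_until is not None and now_utc < schedule.override_until:
        return True
    local = now_utc.astimezone(schedule.tz)
    in_hours = schedule.start <= local.time() < schedule.stop
    return local.weekday() in schedule.workdays and in_hours


# Fixed UTC+1 offset keeps the demo self-contained; zoneinfo.ZoneInfo
# would be the usual choice when daylight saving time matters.
utc_plus_1 = timezone(timedelta(hours=1))
sched = Schedule(tz=utc_plus_1)

sunday_3am = datetime(2026, 1, 4, 2, 0, tzinfo=timezone.utc)    # 03:00 local, Sunday
tuesday_10am = datetime(2026, 1, 6, 9, 0, tzinfo=timezone.utc)  # 10:00 local, Tuesday
with_override = Schedule(tz=utc_plus_1,
                         override_until=sunday_3am + timedelta(hours=4))

print(should_be_running(sched, tuesday_10am))
print(should_be_running(sched, sunday_3am))
print(should_be_running(with_override, sunday_3am))
```

Keeping the decision in one pure function, driven by a schedule record per environment, is one way to avoid a separate Lambda per environment: a single periodic job could loop over all schedules instead.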

Gap 2: Developer self-service

Developer self-service means scoped per-environment RBAC — restart staging, tail logs, flip a feature flag — without AWS Console access or a platform-team Slack ping.

“Can you restart staging?” — the Slack message every platform engineer receives at least twice a week. Developers can deploy to production via CI. They can't restart staging without you. The AWS Console is too dangerous. The IAM policy for “restart this one service” is too granular to hand out.

The DIY build: a web UI with RBAC per environment, backed by Amazon API Gateway and scoped IAM policies. You need role-per-env mappings, a way to audit who did what, and a mechanism to revoke access. The IAM part alone, managing policies per service per environment, is the reason most teams give up and hand out broad AWS Console access.
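The three requirements in that list (role-per-environment mappings, an audit trail, revocation) can be sketched as a small in-memory model. Everything here is hypothetical: the role names, the action names, and the `AccessControl` class are illustrative, and a real portal would persist grants and map each action to a scoped IAM permission.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical roles and action names, for illustration only.
ROLE_ACTIONS = {
    "developer": {"restart_service", "tail_logs"},
    "lead": {"restart_service", "tail_logs", "toggle_flag"},
}


@dataclass
class AccessControl:
    grants: dict = field(default_factory=dict)     # (user, env) -> role
    audit_log: list = field(default_factory=list)  # one entry per attempt

    def grant(self, user: str, env: str, role: str) -> None:
        if role not in ROLE_ACTIONS:
            raise ValueError(f"unknown role: {role}")
        self.grants[(user, env)] = role

    def revoke(self, user: str, env: str) -> None:
        self.grants.pop((user, env), None)

    def authorize(self, user: str, env: str, action: str) -> bool:
        """Check the action against the user's role in this environment and log the attempt."""
        role = self.grants.get((user, env))
        allowed = role is not None and action in ROLE_ACTIONS[role]
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user, "env": env, "action": action, "allowed": allowed,
        })
        return allowed


acl = AccessControl()
acl.grant("dana", "staging", "developer")
print(acl.authorize("dana", "staging", "restart_service"))     # True
print(acl.authorize("dana", "production", "restart_service"))  # False
acl.revoke("dana", "staging")
print(acl.authorize("dana", "staging", "restart_service"))     # False
print(len(acl.audit_log))                                      # 3
```

Denied attempts are logged alongside allowed ones, so the audit trail answers "who tried what" as well as "who did what".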

Platform team cost: 4–8 weeks to build a minimum viable self-service portal. Until it exists, 2–5 hours per week go to restart and status requests that the portal would handle. The pattern of developers blocked from restarting staging repeats on every team that hasn't solved scoped access.

Gap 3: Fleet visibility and cost tracking

AWS Cost Explorer shows one aggregate Fargate line item; per-environment cost attribution requires tagging every resource, waiting 24 hrs for propagation, and rebuilding monthly.

AWS Cost Explorer shows aggregate spend. It doesn't show per-environment cost. You can't answer “how much does staging cost vs dev this month?” without a spreadsheet and 24 hours of tag propagation delay.

The DIY build: tag every resource with an Environment key. Activate cost allocation tags in Billing. Wait 24 hours for tags to propagate. Export a CSV. Build a spreadsheet. Repeat monthly. This works for a while. As the fleet grows, the spreadsheet gets abandoned, and the CTO asks “why is AWS up 30% this quarter?” and nobody can answer without a day of Cost Explorer archaeology.
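The spreadsheet step amounts to grouping exported line items by the Environment tag. A minimal sketch follows, assuming a simplified CSV with hypothetical column names (`resource_id`, `environment_tag`, `cost_usd`) and made-up sample figures; a real billing export has a different, much wider layout.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Made-up sample rows in a simplified, hypothetical export format.
SAMPLE_EXPORT = """resource_id,environment_tag,cost_usd
svc-api,staging,41.20
svc-web,staging,18.80
svc-api,dev,12.50
nat-gw,,30.00
"""


def cost_per_environment(csv_text: str) -> dict:
    """Sum cost per Environment tag; rows without a tag land in 'untagged'."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        env = row["environment_tag"] or "untagged"
        totals[env] += Decimal(row["cost_usd"])
    return dict(totals)


for env, total in sorted(cost_per_environment(SAMPLE_EXPORT).items()):
    print(f"{env}: ${total}")
```

The "untagged" bucket is worth watching in a sketch like this: any resource that missed the tagging step shows up there instead of being attributed to its environment.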

Key insight

By the time you see the numbers in the spreadsheet, the money is spent. Real-time per-environment cost visibility — not aggregate billing data — is what lets platform teams react before the quarter ends.

Gap 4: Environment cloning

Cloning a production environment manually — ALB, RDS, 15 services, SSM params — takes 12 steps and 2–4 hours; a template-based clone reduces that to under 30 seconds.

“QA needs an isolated copy of staging to test the compliance flow.” This request lands in the platform team's Slack every few weeks. The environment has 18 services, 4 databases, networking rules, and a dozen environment variables. Building a copy means: write a new Terraform module, override variables, pray you didn't miss a dependency.

The DIY build: Terraform module that parameterizes every service, database, and network rule from a source environment. This is a 300–400 line module that needs to stay in sync with the source. Every time the source changes, the clone module drifts.
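The parameterize-and-override idea, and the drift that follows, can be illustrated without Terraform. The sketch below uses plain dictionaries as a stand-in for an environment definition; the structure, service names, and helper functions are all hypothetical.

```python
import copy
from typing import Optional


def clone_environment(source: dict, name: str, overrides: Optional[dict] = None) -> dict:
    """Copy a source environment spec and apply per-service overrides."""
    clone = copy.deepcopy(source)
    clone["name"] = name
    for service, settings in (overrides or {}).items():
        if service not in clone["services"]:
            raise KeyError(f"override targets unknown service: {service}")
        clone["services"][service].update(settings)
    return clone


def drifted_services(source: dict, clone: dict) -> list:
    """Services that exist in only one of the two specs."""
    return sorted(set(source["services"]) ^ set(clone["services"]))


staging = {
    "name": "staging",
    "services": {
        "api": {"image": "api:1.4", "desired_count": 2},
        "worker": {"image": "worker:1.4", "desired_count": 1},
    },
}

qa = clone_environment(staging, "qa-compliance", {"api": {"desired_count": 1}})
print(qa["services"]["api"]["desired_count"])       # 1
print(staging["services"]["api"]["desired_count"])  # 2, source is untouched

# Later the source gains a service and the clone silently falls behind.
staging["services"]["billing"] = {"image": "billing:0.1", "desired_count": 1}
print(drifted_services(staging, qa))                # ['billing']
```

The deep copy keeps the clone independent of its source, which is also why it drifts: nothing propagates later changes unless something compares the two.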

Platform team cost: 4–8 hours per clone request. At 3 clones per month, that's 12–24 hours of engineering time. Plus the cost of the days QA waits for the environment to be ready.

Gap 5: Orphaned environment detection

At 15 ECS environments, 2–3 are typically orphaned — no recent deploys, no active owner — each billing $200–400/mo for compute that serves zero production traffic.

Every team has them. An environment was spun up for a demo six months ago. A PR preview for a feature branch that was merged and abandoned. A hackathon project that shipped and was forgotten. Nobody deploys to these environments. Nobody owns them. They bill — quietly, every month.

The DIY build: pull the last deployment timestamp per environment. Cross-reference with the team directory. Environments with no deploy in 30+ days and no active owner go on a review list. The platform team reviews, confirms abandonment, and deletes the infrastructure.
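That review-list rule is simple enough to sketch directly. The example assumes the deploy timestamps and the list of active owners have already been collected; the `Environment` record, the function name, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Environment:
    name: str
    last_deploy: datetime
    owner: Optional[str]  # None when nobody is recorded as owner


def orphan_candidates(envs, active_owners, now, max_idle=timedelta(days=30)):
    """Names of environments with no deploy in max_idle and no active owner."""
    return [
        env.name
        for env in envs
        if now - env.last_deploy >= max_idle and env.owner not in active_owners
    ]


now = datetime(2026, 6, 2, tzinfo=timezone.utc)
fleet = [
    Environment("staging", now - timedelta(days=1), "dana"),
    Environment("demo-q4", now - timedelta(days=180), "alex"),   # owner has left
    Environment("hackathon", now - timedelta(days=90), None),    # never had an owner
    Environment("perf-test", now - timedelta(days=45), "dana"),  # idle, but owned
]

print(orphan_candidates(fleet, active_owners={"dana"}, now=now))
# ['demo-q4', 'hackathon']
```

The function only produces a review list. As described above, a person still confirms abandonment before anything is deleted.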

Key insight

In a fleet of 30+ environments, most teams find 2–5 orphaned environments when they look seriously. At $200–400/month each, that's $400–$2,000/month — or $4,800–$24,000/year — for compute serving zero requests. A one-time audit catches it. Without it, the environments keep billing forever.

Building vs buying the ops layer

Building all five ops gaps in-house with AWS-native tools takes 16–40 engineer-weeks and adds 20–30% annual maintenance overhead as the fleet grows beyond 10 environments.

Gap	DIY weeks	Fortem
Scheduling	2–4	Built-in, per-timezone
Self-service	4–8	RBAC by environment
Cost tracking	2–3	Live per-env cost
Cloning	3–6	Clone in a few clicks
Orphans	1–2	Last deploy + owner visible

16–40 weeks · $90–220k in labor

Platform engineer at $180–220k/yr loaded (~$3,500–4,200/week, Glassdoor + Levels.fyi, 2026). Maintenance adds 20–30% annually as the fleet grows.

A fair objection: with AI coding tools — Claude, Copilot, Codex — the build time drops. A skilled engineer using current LLMs could ship these five tools in 6–16 weeks, not 16–40. That changes the cost equation. What it doesn't change: the maintenance burden, the integration surface (IAM, EventBridge, Cost Explorer APIs), and the fact that you're building internal tools while your competitors ship product. This is exactly the operations gap that opens up at fleet scale. The gap isn't only about time. It's about focus.

Key insight

A platform team spending 30–50% of their week on these five gaps isn't building product. They're maintaining internal tools — the same tools every team builds, differently, from scratch. CI/CD automated the deploy button. Ops needs the same treatment.

Fortem ships all five ops gaps — scheduling, self-service, cost tracking, cloning, and orphan detection — as a single control plane for ECS Fargate fleets. Seven-day onboarding, no Terraform changes required.

