Platform Engineering Is Not DevOps Rebranded: Building an IDP With Team Topologies in Mind
TL;DR:
- Platform engineering is product thinking applied to the developer experience, not a rename of DevOps.
- Team Topologies gives you the org shape: stream-aligned teams consume a thin platform built by a platform team.
- Start with golden paths and a service catalog, not a 40-service IDP shopping list.
Every CTO I’ve spoken to in the last six months has asked the same thing: should we hire a platform team, and what exactly do they build? The honest answer is that most companies already have a platform team — it’s just scattered across SRE, DevOps, and the two senior engineers everyone Slacks when the staging cluster falls over. Pulling that work into a single product-minded team is the actual transition.
The shift matters because “DevOps” as practiced in 2018 was largely a request-queue model: developers asked, ops delivered. That collapsed under microservices. There’s no version of that model where a six-person ops team supports 80 service teams. You need leverage, and the only durable form of leverage in infrastructure is a self-service platform.
I’ve been building these for a while. Here’s how I think about the architecture, the org topology, and the primitives that need to ship in the first 90 days.
What Team Topologies actually prescribes
Matthew Skelton and Manuel Pais’s Team Topologies book is mostly cited for its four team types, but the load-bearing idea is cognitive load. A stream-aligned team building a checkout service should not also be diagnosing pod evictions or arguing about which version of Kustomize to standardize on. Their cognitive budget belongs to the domain.
Three team types matter for platform work:
- Stream-aligned teams: own a product domain end-to-end. They are the platform’s customers.
- Platform teams: build and operate the IDP. Their product is internal.
- Enabling teams: specialists who work alongside stream teams for a time-boxed engagement to transfer specific skills (security, performance, ML ops).
The interaction modes matter as much as the team types. The platform team should mostly interact with stream teams via X-as-a-Service: a catalog, a CLI, a Backstage portal. When a stream team has to schedule a meeting with the platform team to provision a database, you’ve reverted to ops-as-ticketing.
Golden paths beat platform feature checklists
The temptation when you charter a platform team is to ship a 30-item roadmap: Kubernetes, GitOps, secrets management, observability, service mesh, cost dashboards, internal PKI, and so on. Don’t. The first deliverable should be a golden path — one fully paved, end-to-end happy-path workflow for the most common service shape at your company.
For most teams I work with, that’s “build a Go or Node HTTP service, deploy it to staging on every merge, promote to prod with a click.” Everything on the path is opinionated, versioned, and documented. Everything off the path is allowed but unsupported.
Here’s the shape of a typical golden path repo template, rendered through Backstage’s software templates:
```yaml
# template.yaml: Backstage 1.14 software template
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-http-service
  title: Go HTTP Service (golden path)
  tags: [go, recommended]
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service details
      required: [name, owner]
      properties:
        name:
          type: string
          pattern: '^[a-z][a-z0-9-]{2,30}$'
        owner:
          type: string
          ui:field: OwnerPicker
          ui:options:
            allowedKinds: [Group]
  steps:
    - id: fetch
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: github.com?owner=acme&repo=${{ parameters.name }}
        defaultBranch: main
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
```
The skeleton ships a Dockerfile, a Helm chart, a .github/workflows/ci.yml, an ArgoCD Application manifest, and a catalog-info.yaml. A developer goes from “new service” to “running in staging” in under 15 minutes. That’s the win condition.
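A minimal sketch of the skeleton’s .github/workflows/ci.yml, assuming GitHub Actions and GitHub Container Registry (ghcr.io); the registry, action versions, and Go version are illustrative assumptions, not part of the template above. It runs tests on every pull request and, on merges to main, pushes an image tagged with the commit SHA for the GitOps side to pick up. One catch: fetch:template renders ${{ }} expressions, so a workflow file like this usually has to be copied without rendering (for example via fetch:template’s copyWithoutRender input; check the option name for your Backstage version).

```yaml
# .github/workflows/ci.yml (sketch: registry, action versions, and Go version are assumptions)
name: ci

on:
  pull_request:
  push:
    branches: [main]

permissions:
  contents: read
  packages: write # lets GITHUB_TOKEN push to ghcr.io

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v4
        with:
          go-version: "1.20"
      # Tests gate every pull request and every merge to main.
      - run: go test ./...
      # Only pushes to main log in and publish an image.
      - if: github.event_name == 'push'
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - if: github.event_name == 'push'
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          # Assumes a lowercase owner/repo, which the template's name pattern enforces for the repo.
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
```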
The thin-slice IDP: what to actually build
A platform doesn’t need to be feature-complete to be useful. It needs to be coherent. I aim for these primitives in the first quarter:
1. A service catalog
Backstage 1.14 is the obvious choice in 2023. The catalog is where ownership lives. Every component, API, and resource gets a catalog-info.yaml that the platform reads. If you don’t know who owns a service when it pages at 3am, nothing else matters.
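A minimal catalog-info.yaml for a service might look like the sketch below; the component name, owning group, system, and GitHub slug are hypothetical. The owner field is the part that answers the 3am question, and it should reference a Group entity that already exists in the catalog.

```yaml
# catalog-info.yaml (sketch: names, system, and repo slug are hypothetical)
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout
  description: Checkout HTTP service
  annotations:
    github.com/project-slug: acme/checkout # links the entity to its GitHub repo
spec:
  type: service
  lifecycle: production
  owner: group:default/checkout-team # must resolve to a Group entity in the catalog
  system: payments
```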
2. A GitOps deployment substrate
ArgoCD 2.7 or Flux 2.0 — both are fine. Pick one. The contract for stream teams is: write Kubernetes manifests (or a Helm values file), open a PR to the deployment repo, merge, watch it roll out. I’ll cover ApplicationSets and the multi-tenant ArgoCD pattern in the next post.
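On the ArgoCD side, the contract can be as small as one Application per service and environment. This is a sketch assuming a deployment repo at a hypothetical github.com/acme/deployments with one directory per service and environment. With automated sync, prune, and selfHeal enabled, a merged PR rolls out without anyone clicking sync, and manual drift in the cluster gets reverted.

```yaml
# Sketch: repo URL, path layout, and namespaces are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: checkout-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/acme/deployments.git
    targetRevision: main
    path: services/checkout/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: checkout
  syncPolicy:
    automated:
      prune: true    # delete resources that were removed from Git
      selfHeal: true # revert changes made directly in the cluster
    syncOptions:
      - CreateNamespace=true
```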
3. Infrastructure as code with a paved interface
Stream teams should not write raw Terraform for a Postgres instance. They should write:
```yaml
apiVersion: db.acme.io/v1alpha1
kind: PostgresInstance
metadata:
  name: checkout-prod
spec:
  size: medium
  version: "15"
  backupRetentionDays: 30
```
Crossplane v1.12 makes this practical. The platform team owns the CompositeResourceDefinition and the Composition that maps it to AWS RDS. The stream team gets a declarative knob. You can do the same with Terraform modules behind a Backstage scaffolder action, but Crossplane keeps everything inside the Kubernetes control plane, which is where the rest of the platform already lives.
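For the PostgresInstance claim above, the platform team’s half of the contract starts with an XRD like this sketch. The size tiers, allowed versions, retention bounds, and the default Composition name (postgres-aws-rds) are assumptions; the Composition that maps these fields onto an RDS instance is left out. Because the XRD declares claimNames, stream teams create the namespaced PostgresInstance while Crossplane manages a cluster-scoped XPostgresInstance behind it.

```yaml
# Sketch: size tiers, versions, bounds, and the Composition name are assumptions.
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgresinstances.db.acme.io # must be <plural>.<group>
spec:
  group: db.acme.io
  names:
    kind: XPostgresInstance # cluster-scoped composite owned by the platform
    plural: xpostgresinstances
  claimNames:
    kind: PostgresInstance # namespaced claim that stream teams write
    plural: postgresinstances
  defaultCompositionRef:
    name: postgres-aws-rds # hypothetical Composition mapping to AWS RDS
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                  enum: [small, medium, large]
                version:
                  type: string
                  enum: ["14", "15"]
                backupRetentionDays:
                  type: integer
                  minimum: 1
                  maximum: 35 # RDS caps automated backup retention at 35 days
              required: [size, version]
```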
4. Observability defaults
Every service gets Prometheus 2.44 scraping, a Grafana 10 dashboard scaffolded from a template, and structured logs shipped to whatever log backend you’ve picked. Stream teams don’t write dashboards from scratch.
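If the clusters run the Prometheus Operator (for example via the kube-prometheus-stack Helm chart, an assumption here), the scraping half of this default can be a ServiceMonitor that the skeleton ships alongside the Helm chart. The release: prometheus label assumes the operator was installed under that release name with its default selector; the service and port names are hypothetical.

```yaml
# Sketch: assumes Prometheus Operator CRDs; names and labels are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: checkout
  namespace: checkout
  labels:
    release: prometheus # must match the Prometheus resource's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: checkout # labels on the service's Kubernetes Service
  endpoints:
    - port: http # named port on that Service
      path: /metrics
      interval: 30s
```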
5. Secrets and identity
External Secrets Operator pointed at Vault or AWS Secrets Manager. Workload identity via IRSA (AWS) or Workload Identity (GCP). No long-lived service account keys in repos. Ever.
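A sketch of both halves for an AWS service: an ExternalSecret that syncs a database password from AWS Secrets Manager into a Kubernetes Secret, and a ServiceAccount annotated for IRSA so the pod gets short-lived credentials for an IAM role instead of static keys. It assumes a ClusterSecretStore named aws-secrets-manager already exists and is itself authenticated via IRSA; the secret path, namespace, and role ARN are hypothetical.

```yaml
# Sketch: store name, secret path, namespace, and role ARN are hypothetical.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: checkout-db
  namespace: checkout
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager
  target:
    name: checkout-db-credentials # Kubernetes Secret that ESO creates and keeps in sync
  data:
    - secretKey: password
      remoteRef:
        key: prod/checkout/db # Secrets Manager secret name
        property: password    # JSON key inside that secret
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: checkout
  namespace: checkout
  annotations:
    # IRSA: pods using this ServiceAccount get short-lived credentials for this IAM role.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/checkout-service
```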
Common Pitfalls
I’ve watched a dozen platform team launches stall in roughly the same ways:
- Building for the 5% edge case. The senior engineer running a CockroachDB cluster with weird Helm overrides will always have weird Helm overrides. Don’t design the platform for them. Design for the median team.
- No product manager. A platform without a PM ships features nobody asked for. The PM doesn’t need to be a Kubernetes expert; they need to talk to stream teams weekly.
- Treating documentation as optional. Documentation is part of the platform’s API. If the only way to know that the Helm chart accepts resources.limits.memory is to read the chart source, you don’t have a platform; you have a Helm chart.
- Wrapping kubectl in a CLI and calling it abstraction. A platform CLI that just shells out to kubectl apply provides no new contract. Real abstraction means the platform owns what’s behind the API; users don’t reach through it.
- Charging the platform team for cluster costs. This kills incentives. The platform should be funded centrally; stream teams should see the unit cost of their workloads (via something like OpenCost) and own that bill.
- Ignoring the migration path. You have legacy services. The platform’s adoption story has to include them, or you’ll end up with two platforms.
Measuring whether the platform is working
Don’t measure platform success in features shipped. Measure it the way you’d measure a SaaS product:
- Lead time for new service to prod — target under one day after the first service, under one hour by the third.
- Number of stream teams self-serving vs. raising tickets. The ticket count should bend down over quarters.
- DORA metrics across the org, not just the platform team’s repos.
- NPS or a quarterly survey from stream teams. Read the qualitative comments. The “this is fine but I hate the secrets workflow” comments are gold.
Research from Google’s DevOps Research and Assessment (DORA) program (see the annual DORA reports) consistently shows that platform investments correlate with throughput and stability gains, but only when the platform is actually adopted. A platform with three users is a science project.
Wrapping Up
Platform engineering is what happens when a company decides developer experience is worth a permanent budget line. Done right, it pulls cognitive load off product teams and gives the org real leverage. Done wrong, it becomes another internal tool nobody uses while teams continue to ship their own Helm charts.
In the rest of this series I’ll get concrete: ArgoCD ApplicationSets at scale, FluxCD versus ArgoCD with real trade-offs, multi-tenancy on Kubernetes 1.27, and progressive delivery you can actually trust. Start with the org shape and the golden path. The tools come second.