Predictions for 2025, Platform Engineering and Agentic AI
TL;DR — Platform teams consolidate or get reorganized. Agentic moves from “framework wars” to “operating discipline.” Eval becomes the new test pyramid. Open models close the quality gap on niche tasks. Cost discipline gets serious.
I have mixed feelings about predictions posts. They tend to age either too well (predicting things that were already obvious) or too poorly (predicting things that turned out to be wishful thinking). The honest middle path is to make calls specific enough to be wrong, and to write down the conditions under which I would change my mind. That is what this post tries to do.
Twelve predictions, grouped into platform engineering and agentic AI, with each one tagged by confidence. The point is not to be right about every one. The point is to give a working backend engineer a set of bets to push against, agree with, or actively disagree with, all of which produces a better plan than the alternative of vague directional vibes.
For the retrospective companion to this post, see The 2024 Wrap Up, The Agentic Era for Backend Engineers.
Platform engineering predictions
1. Platform teams will either prove unit economics or get reorganized. Confidence: high. By Q3 2025, the platform teams that survived 2024’s belt-tightening will be the ones with clear engineer-hours-saved math. The rest get reabsorbed into infrastructure groups or DevOps, with quiet promotions for the lead and quiet demotions for the discipline. The teams that read Cost Justifying Platform Investments and act on it have a fighting chance.
2. The IDP category bifurcates. Confidence: medium-high. Backstage continues to dominate at the high end, used by large orgs that want to assemble their own platform. A new category of “opinionated IDP as a service” emerges in the middle market, with Port and Cortex pushing in this direction along with a few startups that will be acquired before the year ends. Both categories survive. The “build your own from scratch” approach loses ground.
3. GitOps stops being the headline and becomes plumbing. Confidence: medium. ArgoCD and Flux mature past the point where they are interesting to talk about. The conferences shift from “how do I do GitOps” to “how do I do progressive delivery on top of GitOps.” Anyone still pitching “we should adopt GitOps” in 2025 has been asleep; the underlying patterns have unambiguously won.
4. FinOps becomes a platform-team job, not a finance-team job. Confidence: high. The tagging, the chargeback, the rightsizing, the spot-instance discipline, all of it moves into the platform team’s mandate. Finance partners retain visibility and approval but stop expecting to drive the work. The teams that built this muscle in 2024 are well-positioned. The ones that did not will struggle to attract platform talent in 2025, because the best platform engineers want this work.
5. Service catalogs become AI-augmented or die. Confidence: medium. Plain service catalogs (Backstage’s, custom ones) hit usability walls because the metadata is always out of date and nobody enjoys maintaining them. The replacement is a catalog where an agent maintains the metadata by reading the code, the deploys, and the runtime. The early experiments here are promising. By Q4 2025, the leaders will look obvious. Right now they look weird.
6. Internal developer portals consolidate around three or four winners. Confidence: low-medium. The category is too crowded. Some combination of Backstage, Port, Cortex, and one as-yet-underestimated player ends up with most of the share. The rest pivot or get acquired. The condition that would change my view: a credible open-source competitor to Backstage that solves the metadata problem natively.
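The engineer-hours-saved math from prediction 1 does not need to be elaborate. Here is a minimal sketch of the calculation, assuming a single platform capability with a known monthly cost. Every name and number is a hypothetical placeholder chosen for illustration, not data from a real team.

```python
from dataclasses import dataclass


@dataclass
class PlatformInvestment:
    """Hypothetical inputs for one platform capability (illustrative figures only)."""
    name: str
    engineers_served: int                  # product engineers who use the capability
    hours_saved_per_engineer_month: float  # estimated from surveys or timing data
    loaded_hourly_cost: float              # fully loaded cost of one engineer-hour
    platform_cost_month: float             # platform salaries plus infrastructure


def monthly_value(inv: PlatformInvestment) -> float:
    """Dollar value of the engineer-hours the capability saves each month."""
    return inv.engineers_served * inv.hours_saved_per_engineer_month * inv.loaded_hourly_cost


def roi_ratio(inv: PlatformInvestment) -> float:
    """Value returned per dollar spent; above 1.0 means the capability pays for itself."""
    return monthly_value(inv) / inv.platform_cost_month


# Assumed example: 200 engineers each save 4 hours a month at $100 per hour.
paved_road = PlatformInvestment(
    name="paved-road deploy pipeline",
    engineers_served=200,
    hours_saved_per_engineer_month=4.0,
    loaded_hourly_cost=100.0,
    platform_cost_month=60_000.0,
)

print(f"{paved_road.name}: ${monthly_value(paved_road):,.0f}/month saved, "
      f"ROI {roi_ratio(paved_road):.2f}x")
```

The hard part is not the arithmetic but defending the hours-saved estimate, which is why it is kept as an explicit input rather than buried in a formula.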
Agentic AI predictions
7. The framework wars end with no clear winner. Confidence: high. LangGraph, AutoGen, CrewAI, Mastra, OpenAI’s own agent SDK, the homegrown loops, all of them coexist. The convergence is at the architecture level (state machines + tool calling + checkpoints), not the framework level. Teams stop arguing about frameworks and start arguing about state stores and eval methodologies, which is healthier.
8. Eval becomes the new test pyramid. Confidence: high. Every serious agentic team in 2025 has an eval suite that runs in CI, blocks deploys on regression, and gets more investment than the unit test suite. Ragas, DeepEval, Promptfoo, Braintrust, Langfuse’s eval feature, and a long tail of homegrown evals all converge on a similar architecture. The teams without an eval suite stop shipping reliably by mid-year.
9. The biggest agentic productivity wins are internal, not customer-facing. Confidence: medium-high. Customer-facing agents continue to be hit-and-miss because user expectations are high and failure modes are public. The wins move to internal workflows where the bar is “better than the manual process,” which is achievable. Engineering productivity, support deflection, internal data exploration, ops automation. These are the boring 2025 wins.
10. Small models close the gap on bounded tasks. Confidence: medium. Llama 3.3, Qwen, the next Mistral, and the open models that arrive in spring 2025 close the quality gap with GPT-4 class models on bounded tasks (classification, extraction, structured generation). The frontier models still win on open-ended reasoning. The cost-aware teams move bounded workloads to open models in Q1-Q2 and save 60-80% on those workloads. The teams that do not run this analysis leave money on the table.
11. The agent observability category consolidates. Confidence: medium. The 2024 tooling explosion (Langfuse, LangSmith, Helicone, Phoenix, Logfire) settles into a smaller set of winners. OpenTelemetry GenAI conventions are widely adopted, and at least one of the major APM vendors (Datadog, New Relic, Honeycomb) launches a credible LLM observability product that is not a thin wrapper. The startups that win are the ones that ship eval as part of the same product, not just tracing.
12. Multi-agent systems disappoint, single-agent loops dominate. Confidence: medium. The 2024 hype around multi-agent frameworks does not pan out for most production workloads. Coordination overhead, debugging difficulty, and cost outweigh the benefits except for a narrow set of genuinely concurrent workloads. The 2025 production pattern is one carefully designed agent loop with a deep toolbox, not a swarm of agents talking to each other. CrewAI either adapts or finds itself in a niche.
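The architecture that predictions 7 and 12 point at (a state machine, tool calling, checkpoints, one loop with a toolbox) fits in a few dozen lines. The sketch below uses no framework. `fake_model`, the `word_count` tool, and the JSON checkpoint file are hypothetical stand-ins for illustration, not any library's API.

```python
import json
import tempfile
from enum import Enum
from pathlib import Path
from typing import Callable


class Step(Enum):
    PLAN = "plan"   # ask the model what to do next
    ACT = "act"     # execute the tool call the model requested
    DONE = "done"   # final answer produced


# Toolbox: plain functions keyed by name (hypothetical example tool).
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


def fake_model(state: dict) -> dict:
    """Stand-in for an LLM call: requests one tool call, then finishes."""
    if not state["observations"]:
        return {"tool": "word_count", "input": state["task"]}
    return {"final": f"The task has {state['observations'][-1]} words."}


def run_agent(task: str, checkpoint: Path, max_steps: int = 10) -> dict:
    # Resume from the checkpoint if one exists, otherwise start fresh.
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"task": task, "step": Step.PLAN.value,
                 "observations": [], "pending": None, "answer": None}

    for _ in range(max_steps):
        if state["step"] == Step.DONE.value:
            break
        if state["step"] == Step.PLAN.value:
            decision = fake_model(state)
            if "final" in decision:
                state["answer"] = decision["final"]
                state["step"] = Step.DONE.value
            else:
                state["pending"] = decision
                state["step"] = Step.ACT.value
        elif state["step"] == Step.ACT.value:
            call = state["pending"]
            state["observations"].append(TOOLS[call["tool"]](call["input"]))
            state["pending"] = None
            state["step"] = Step.PLAN.value
        # Persist after every transition so a crashed run can resume.
        checkpoint.write_text(json.dumps(state))
    return state


if __name__ == "__main__":
    run_file = Path(tempfile.mkdtemp()) / "run.json"
    print(run_agent("count the words in this task", run_file)["answer"])
```

Swapping `fake_model` for a real model call and the JSON file for a database row changes the plumbing but not the shape, which is the sense in which the convergence is architectural rather than framework-level.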
The things I am avoiding predictions on
A few areas where I have strong gut feelings and weak data, which I will not turn into predictions.
- The future of vibe coding tools (Cursor, Cline, Aider, Continue, Claude Code). Too volatile, too dependent on model providers’ moves.
- The trajectory of any specific frontier model. The release cadence is fast enough that any prediction made in December 2024 will be wrong by March.
- Regulation. The EU AI Act is in motion, US federal action is uncertain, and the right call depends on the outcome of the November 2024 elections, which I am writing just after.
- Compensation. Every prediction I have made about engineering compensation in the last five years has been wrong. I am leaving this one alone.
For the underlying research on agentic patterns that I expect to mature in 2025, the Anthropic research on agentic systems is the canonical reading list. Their published work on tool use and Constitutional AI has informed most of the architecture choices I have made this year.
The criteria for being wrong
Predictions are only useful if you write down the criteria for being wrong. Here are mine.
| Prediction | I am wrong if |
|---|---|
| Platform teams prove unit economics | By Q3, platform headcount has grown without unit economics being standard |
| IDP bifurcates | Backstage usage stagnates or one tool clears the rest by 2x by Q3 |
| FinOps moves to platform | Finance teams retain primary ownership of FinOps tooling by Q4 |
| Framework wars end | A single framework holds more than 50% mindshare by Q4 |
| Eval becomes the new test pyramid | Fewer than 30% of agentic production teams have CI-enforced eval by Q4 |
| Internal wins beat customer-facing | A high-profile consumer agent launch dominates the H1 narrative |
| Small models close the gap | GPT-4 class proprietary models retain their cost-quality lead on bounded tasks |
| Observability consolidates | The 2024 long tail of tools all still exist with comparable funding at year end |
| Multi-agent disappoints | A clear multi-agent production success story dominates the KubeCon-equivalent narrative |
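The “CI-enforced eval” criterion in the table refers to the setup from prediction 8: an eval suite that runs in CI and blocks deploys on regression. A minimal sketch of such a gate follows, assuming the eval suite has already produced one score per metric. The metric names, baseline values, and tolerance are hypothetical, and the scoring itself (Ragas, DeepEval, or a homegrown harness) is out of scope here.

```python
import sys

# Hypothetical baseline; in practice this would be a stored artifact
# from the last released eval run.
BASELINE = {"faithfulness": 0.91, "tool_call_accuracy": 0.88, "answer_relevance": 0.85}
MAX_REGRESSION = 0.02  # allowed absolute drop per metric (assumed value)


def find_regressions(baseline: dict[str, float], current: dict[str, float],
                     tolerance: float) -> list[str]:
    """Return one message per metric that is missing or dropped more than tolerance."""
    failures = []
    for metric, base_score in baseline.items():
        score = current.get(metric)
        if score is None:
            failures.append(f"{metric}: missing from current run")
        elif base_score - score > tolerance:
            failures.append(f"{metric}: {base_score:.2f} -> {score:.2f}")
    return failures


def gate(current: dict[str, float]) -> int:
    """Exit code for CI: 0 lets the deploy proceed, 1 blocks it."""
    failures = find_regressions(BASELINE, current, MAX_REGRESSION)
    for line in failures:
        print(f"EVAL REGRESSION {line}", file=sys.stderr)
    return 1 if failures else 0


# Scores would normally come from the eval suite; hard-coded for the sketch.
candidate = {"faithfulness": 0.92, "tool_call_accuracy": 0.83, "answer_relevance": 0.85}
exit_code = gate(candidate)
print("exit code:", exit_code)  # a CI wrapper would call sys.exit(exit_code)
```

Comparing against a stored baseline rather than a fixed threshold is a design choice: it catches regressions on metrics that were never great to begin with, at the cost of needing to version the baseline alongside the code.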
I will revisit this post in December 2025 and give myself a grade. The point of the exercise is not to be right. The point is to have something specific to update against as the year unfolds, which is more useful than vague intuition.
Common Pitfalls
Treating predictions as a strategy. They are not. They are an input into your strategy. If your 2025 plan depends on a specific prediction being right, it is fragile. Plan for the median, hedge against the tails.
Confusing confidence with accuracy. I have high confidence in some of these. I will still be wrong about a few. Confidence reflects how stable I think the call is, not how certain I am about being right.
Ignoring the boring predictions. The unit economics one is boring. It is also the most likely to matter to your career in 2025. Boring usually beats exciting over a long enough horizon.
Anchoring on the loudest takes. December prediction season produces a lot of noise. Most of it is wrong, including some of mine. Read multiple predictions posts, weigh them against your own observation, then form a view.
Skipping the criteria for being wrong. This is the most important part of the post. Predictions without falsification criteria are vibes. Demand falsification criteria from anyone giving you predictions, including yourself.
Forgetting that markets are local. Your industry, your geography, your company’s specific competitive position all matter. Generic predictions are starting points, not endpoints.
What’s Next
The point of writing predictions is to clarify your own bets. Mine are above. Yours will differ in places, and the differences are where the interesting conversations happen.
A practical homework assignment for the last two weeks of December. Write your own twelve predictions for 2025. Tag confidence. Write the falsification criteria. Then put them in a drawer and revisit them in March. The act of having to write down “what would change my mind” is more valuable than any specific prediction, and it is the discipline that produces the kind of senior engineering judgment that compounds over years.
Enjoy the holidays. The agentic era is normalizing into something that looks more like distributed systems engineering than wizardry, which is good news for the kind of backend engineers who like to ship things that work. I will see you back here in January with a much smaller piece about what the first two weeks of 2025 actually looked like.