November 26, 2025

The Great Agent Scramble at KubeCon 2025: How AI is Rewiring Enterprise Software from Sales to SRE

Ari Zilka

It’s that time of year when CTOs and analytics chiefs look ahead to 2026 and audit their company’s tech stack. This year they are finding a new pattern: the software isn’t just storing data anymore; it is offering to do the work. Today, every vendor is pitching “AI agents” - digital workers promising to forecast sales, debug code, or message customers.

This highlights a massive shift in the enterprise landscape. We have moved beyond simple chatbots to a world of autonomous agents, revealing a complex, two-front war. In the "front office," business applications are in a chaotic scramble for market share, confusing buyers with overlapping tools. Simultaneously, in the "back office," the infrastructure layer is undergoing a massive quiet revolution, using agents to manage the enormous complexity of Kubernetes and cloud computing.

The Front-Office Free-For-All

The market for business-facing AI is currently defined by chaotic energy. Major incumbents like Salesforce, ServiceNow, and Snowflake are no longer staying in their lanes; they are colliding head-on, selling general-purpose agents that automate tasks across finance, marketing, and customer service.

For enterprise buyers, this overlap is creating "selection paralysis." Many firms are delaying purchases because it is unclear which tool offers the most value. The decision often comes down to "data gravity" - the idea that you should use the agent that lives where your data is stored.

The Infrastructure Revolution: Agentic SRE

While business leaders debate this, the engineers keeping the lights on face a different kind of pressure. We saw it at KubeCon 2025, where every booth touted the complexity of running modern AI workloads on Kubernetes: workloads spanning hundreds of clusters and GPU-hungry models have outpaced human ability to manage them manually.

The solution across KubeCon 2025? The rise of "Agentic SRE" (Site Reliability Engineering). Unlike the generalist bots in the front office, these are highly specialized agents designed to keep infrastructure from collapsing.

Three distinct approaches have emerged:

  • The Unifiers: Platforms that collapse the walls between applications and infrastructure. They use agents as "calculators," offering human-readable explanations for errors and drafting remediation plans that keep humans in the loop for approval (a sketch of this pattern follows this list).
  • The Autonomous Healers: Multiple vendors are introducing multi-agent systems that go beyond static runbooks, claiming to autonomously rightsize workloads and live-migrate them to cheaper spot instances. In practice, this has helped companies reduce ticket volume by roughly 40%.
  • The Active Decision-Makers: Several vendors claim to take observability from passive dashboards to active decisioning, where the system doesn't just alert an engineer to a crash but proposes the code fix or infrastructure scaling required to solve it.
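To make the first of these patterns concrete, here is a minimal human-in-the-loop sketch in the spirit of the "Unifiers." The diagnosis and plan below are hardcoded stand-ins for what a real agent would derive from live cluster telemetry; nothing here reflects any specific vendor's implementation.

```python
# Minimal human-in-the-loop remediation sketch (illustrative only; a real
# agent would generate the diagnosis and plan from live cluster telemetry).
from dataclasses import dataclass

@dataclass
class RemediationPlan:
    explanation: str   # human-readable account of the error
    actions: list      # proposed steps; nothing runs without approval

def draft_plan(alert: str) -> RemediationPlan:
    # Stand-in for the agent's analysis step.
    return RemediationPlan(
        explanation=f"Pods backing '{alert}' are being OOMKilled; memory "
                    "limits are below observed peak usage.",
        actions=["raise memory limit to 1Gi", "restart the deployment"],
    )

def approve(plan: RemediationPlan) -> bool:
    # The human stays in the loop: the agent explains, a person decides.
    print(plan.explanation)
    for step in plan.actions:
        print(f"  proposed: {step}")
    return input("apply? [y/N] ").strip().lower() == "y"

if __name__ == "__main__":
    plan = draft_plan("checkout-service")
    if approve(plan):
        print("executing plan (stub)")  # a real system would call the K8s API here
    else:
        print("plan discarded; no changes made")
```

The design point is the boundary: the agent owns analysis and drafting, while execution sits behind an explicit approval gate.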

The Challenge of Orchestration and ROI

Despite the differences between a sales agent and a Kubernetes agent, enterprises face the same hurdles in adopting both: orchestration and cost.

In the business app world, "agent sprawl" is a growing liability. Companies like Microsoft are pitching Teams as a centralized hub to organize these digital workers, while in the infrastructure world, the Cloud Native Computing Foundation (CNCF) is pushing for "AI Conformance" to ensure these workloads remain portable and interoperable. Then there are the regulations to contend with: compliance requirements keep evolving, and data sovereignty is a real concern again.

Tool sprawl compounds the problem. Making Kubernetes production-ready requires an uncontrolled proliferation of third-party tools and utilities, leaving developers managing hundreds of options with overlapping capabilities for networking, observability, CI/CD, service mesh, and secrets management. Cluster sprawl is not the only burden: developers must constantly learn, integrate, and maintain this complex, fragmented stack, taking focus away from application development.

The economics are shifting dramatically, too. Vendors are moving toward usage-based pricing, often around 20 to 30 cents per task. This forces companies to scrutinize ROI relentlessly: spending $5 in compute to generate a $3 product recommendation is a losing strategy, and each solution must pay for itself. You can't make it up in volume. When your infrastructure leaves you with razor-thin or negative margins, you have to rethink your observability, your cloud footprint, and the way you deploy and manage software altogether.
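A back-of-the-envelope sketch makes the point concrete (all figures below are illustrative, not drawn from any vendor's pricing):

```python
# Illustrative per-task agent economics (all figures are hypothetical).
TASKS_PER_MONTH = 100_000
VENDOR_FEE_PER_TASK = 0.25   # usage-based pricing, ~20-30 cents per task
COMPUTE_PER_TASK = 0.08      # inference and data-movement cost per task
VALUE_PER_TASK = 0.30        # revenue attributable to each completed task

cost = TASKS_PER_MONTH * (VENDOR_FEE_PER_TASK + COMPUTE_PER_TASK)
value = TASKS_PER_MONTH * VALUE_PER_TASK
margin = value - cost

print(f"monthly cost:  ${cost:,.0f}")
print(f"monthly value: ${value:,.0f}")
print(f"margin:        ${margin:,.0f}")  # negative: volume only deepens the loss
```

Because margin scales linearly with task count, a negative per-task margin never improves with volume; it only compounds.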

The Need for Proactive Operations

Modern enterprises critically need effective AI integration in observability, and the industry owes customers far better. The path forward moves beyond simple data logging: it is based on open standards and rooted in open source, with instrumentation treated as table stakes, so that developers are not locked into proprietary agents.
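As one concrete illustration of vendor-neutral instrumentation, here is a minimal sketch using OpenTelemetry's Python API (the service and span names are hypothetical). The application depends only on the open API; the export destination is a swappable deployment detail.

```python
# Minimal sketch: vendor-neutral instrumentation with the OpenTelemetry
# Python API.  Requires: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wiring the SDK is a deployment concern; swapping ConsoleSpanExporter for an
# OTLP exporter changes where data goes, not how the application is written.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # hypothetical scope name

def recommend_product(user_id: str) -> str:
    # Application code depends only on the open API, never on a vendor SDK.
    with tracer.start_as_current_span("recommend_product") as span:
        span.set_attribute("user.id", user_id)
        return "sku-1234"  # placeholder result

if __name__ == "__main__":
    recommend_product("u-42")
```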

Today’s operations must be able to identify and address root-cause system issues immediately. That means shifting from passive transport layers to an active, trustworthy, easy-to-adopt-and-scale telemetry fabric. It is the difference between merely collecting data and fully understanding it from its foundation, and it is what will enable truly intelligent automation.

Throughout my career, I have focused on system optimization. Identifying the bottleneck feels akin to Neo deciphering the code behind the Matrix: the inefficiencies become evident in the streaming green text, leading to the realization, “a ha, there’s the problem!” As AI agents evolve and the tech community strives to build a smarter, more cost-effective stack, it has become increasingly clear that the bottleneck lies in the SaaS data-management approach. Forwarding all data to a third party only to receive it back milliseconds later is no longer a viable solution. And nowhere is this more apparent than in observability.

I see a world of possibilities unlocking — one with faster, smarter automations that enhance site reliability — if we cease offloading our data and analytics to third parties that promise the world but ultimately deliver reports and alerts designed for human consumption.

Conclusion

The enterprise is entering a transitional "mezzanine" phase. We are moving from simply adopting AI tools, to managing a workforce of AI agents, to a full change-out of the infrastructure underneath: accommodating the rush to agentic systems and installing the foundational pipelines that will enable scale-out. The push to the cloud that defined 2025 now straddles cloud and on-premise, while juggling data demands against energy consumption.

The crucial shift lies not in trying to control the inevitable agent ecosystem but in transforming the telemetry foundation that feeds it. We need a way to move beyond passive data transport and turn the observability pipeline into an active telemetry control plane. This new model uses local processing and intelligence right at the source to filter, enrich, and dramatically reduce data volume. By ensuring only high-signal, high-context data reaches the expensive storage layer, we guarantee the economics of autonomous operations work, finally creating a reliable, cost-efficient, and open fabric for the age of agents.
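Here is a minimal sketch of what that source-side processing could look like, assuming a simple severity filter and cluster-name enrichment (the thresholds and attribute names are illustrative, not a prescribed design):

```python
# Minimal sketch of an active telemetry stage at the source (illustrative):
# filter low-signal records, enrich the survivors, and forward a reduced stream.
from dataclasses import dataclass, field

@dataclass
class LogRecord:
    severity: str          # "DEBUG" | "INFO" | "WARN" | "ERROR"
    message: str
    attributes: dict = field(default_factory=dict)

SEVERITY_RANK = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}

def process_locally(records, min_severity="WARN", cluster="prod-us-east"):
    """Yield only high-signal records, enriched with local context."""
    for record in records:
        # Filter: low-signal telemetry never leaves the node.
        if SEVERITY_RANK[record.severity] < SEVERITY_RANK[min_severity]:
            continue
        # Enrich: attach the context an agent needs to act, right at the source.
        record.attributes.update({"k8s.cluster.name": cluster})
        yield record

if __name__ == "__main__":
    stream = [
        LogRecord("DEBUG", "cache hit"),
        LogRecord("INFO", "request served"),
        LogRecord("ERROR", "OOMKilled", {"k8s.pod.name": "api-7f9c"}),
    ]
    survivors = list(process_locally(stream))
    print(f"forwarded {len(survivors)}/{len(stream)} records")  # forwarded 1/3
    print(survivors[0].attributes)
```

The same shape scales from a toy filter to real pipelines: decide locally, ship only what downstream agents can act on.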

For the C-suite, the challenge is not only technical; it is organizational and financial. The winners in this new era will not necessarily be the companies with the smartest individual agents, but those that orchestrate them into a cohesive system built on the right infrastructure - one that balances the "front office" drive for revenue with the "back office" need for stability.
