Every leadership meeting has the same tension now: AI is no longer optional in the narrative, but it is still optional in the P&L.
Boards hear about agents, copilots, and “transformation.” Teams run pilots. Vendors promise payback in months. Then finance asks a simple question — what changed for customers or costs? — and the room goes quiet.
That gap is not a failure of ambition. It is what the market is actually working through in 2026: moving from experiments to operations while most organizations still cannot point to durable ROI. This post is a field-level summary for business and product leaders: what is being discussed, what is landing in production, and how to choose a first application that deserves budget beyond the demo.
For engineering teams building eval and monitoring discipline, see our technical guide: How to Benchmark LLM Applications.
What changed in the conversation (2025 → 2026)
Three shifts show up across analyst surveys, implementation trackers, and executive commentary — not always with the same numbers, but with the same direction.
1. From “AI strategy” to “AI inside the workflow”
Gartner and similar voices describe enterprises moving beyond hype toward practical projects: less “build an AI company,” more “embed AI into the systems people already use,” such as CRM, ERP, ticketing, document workflows, and engineering toolchains.
The State of Applied AI (April 2026), based on ~200 documented implementations, finds that AI embedded in enterprise software (47%) outpaces direct model integration (31%). Translation for buyers: the winning pattern is often to upgrade the existing toolchain, not to stand up a separate “magic chatbox.”
2. From pilots to production — especially agents
Agentic AI is the headline of 2026. Industry research (e.g. KXN’s State of Agentic AI in the Enterprise 2026) reports a sharp jump in organizations running agents beyond pilot stage — with financial services and operations-heavy sectors in the lead for document reconciliation, compliance checks, and customer operations.
The market story is no longer “can agents work?” but “which agent workflows survive legal, security, and finance review?”
3. From growth promises to cost and risk
ROI framing has sobered. Surveys cited widely in 2026 (KPMG Global AI Pulse, Forbes Technology Council, and others) cluster around uncomfortable themes:
- Spend is up: at the upper end of the market, many enterprises project AI investment of ~$200M+ over the next 12 months.
- Proven ROI at scale is still rare: only a single-digit share of respondents report established ROI, while a larger share expect measurable return within a year.
- A large minority cannot attribute P&L impact: CIOs struggle to quantify value, and EBIT attribution remains weak for many.
- Governance is non-negotiable: human review for consequential decisions, audit trails, and (in the EU) AI Act readiness ahead of August 2026 enforcement for high-risk use cases.
The consensus is not “AI failed.” It is that AI is now held to the same bar as any other capital spend: measurable outcomes, controlled risk, and integration with legacy reality.
What organizations are actually deploying
Ignore the keynote slide. In production, the same categories appear again and again.
Area: Operations & back office — What “real” looks like: Invoice matching, PO/receipt reconciliation, contract extraction, internal knowledge search — Why it gets funded: Clear before/after: hours → minutes, fewer errors
Area: Customer operations — What “real” looks like: Tier-1 support deflection, routing, summarization for agents — Why it gets funded: Volume economics; measurable handle time and CSAT
Area: Software delivery — What “real” looks like: Code assistance, test generation, incident summarization — Why it gets funded: Engineering capacity; faster cycle time (harder to attribute, still popular)
Area: Compliance & risk — What “real” looks like: Policy checks, audit prep, KYC/AML assist, regulatory monitoring — Why it gets funded: Cost of failure is high; human-in-the-loop is acceptable
Area: Healthcare & regulated care — What “real” looks like: Care coordination, referral follow-up, patient messaging triage — Why it gets funded: Staffing gaps + readmission cost — see our work on AI care coordinators
The State of Applied AI dataset puts Operations (~39%) and Software Engineering (~21%) at the top of functional adoption. By industry, technology and financial services lead in documented cases; healthcare shows up meaningfully but with heavier governance.
Implication: If your first initiative does not touch a high-volume, repetitive, measurable workflow, you are competing with every other “innovation” project for attention — and losing.
Where the money shows up (and where it does not)
Market ROI narratives in 2026 are bimodal.
Stories that hold up under scrutiny
- Document-heavy workflows: classification, extraction, comparison against rules; success = fewer manual touches and faster cycle time.
- Customer service at scale: bounded intents (order status, resets, scheduling); success = deflection rate + quality of handoff to humans.
- Embedded copilots: inside tools users already trust; success = adoption and time-on-task, not “messages per day.”
Stories that often disappoint
- Open-ended “ask anything” assistants on public websites without scope, evals, or escalation design.
- “Autonomous agents” on day one with write access to production systems and no approval gates.
- Marketing content factories with no brand, legal, or factual review: fast output, reputational risk.
- Strategy decks without integration: AI that cannot read your CRM, ERP, or ticket history is theater.
Analyst and vendor surveys disagree on exact payback months (6–12 months is a common band for agent programs that do work), but they agree on the discriminator: teams that invest in evaluation, governance, and workflow integration outperform those that only scale token spend.
That matches what we see in client work: the business win is rarely the model; it is the process wrapper — who approves, what gets logged, what happens when the model is wrong.
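The payback bands quoted in these surveys usually come down to simple arithmetic: upfront cost divided by net monthly savings, where "net" means after the run cost of the process wrapper (inference, human review, monitoring). Below is a minimal sketch. It assumes simple, undiscounted payback and a flat monthly run cost. The function name and all figures are hypothetical and not drawn from any of the cited surveys.

```python
import math
from typing import Optional


def payback_months(upfront_cost: float,
                   monthly_savings: float,
                   monthly_run_cost: float) -> Optional[int]:
    """Simple (undiscounted) payback: whole months until net savings cover upfront cost.

    Returns None when run costs eat all the savings, i.e. the program never pays back.
    """
    net_monthly = monthly_savings - monthly_run_cost
    if net_monthly <= 0:
        return None
    return math.ceil(upfront_cost / net_monthly)


# Hypothetical agent program: build, integration, and eval work up front;
# run cost covers inference, human review, and monitoring.
print(payback_months(250_000, 45_000, 15_000))  # 9
print(payback_months(250_000, 20_000, 22_000))  # None
```

The second call shows the failure mode finance worries about. If review and monitoring cost more than the hours saved, no payback band applies at all.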
Five questions the market is asking (that you should ask too)
These are the questions showing up in board packs, RFPs, and partner conversations in Q2 2026.
1. What decision or cost line moves if this works? Revenue, gross margin, OPEX, risk cost, or customer retention — pick one primary metric.
2. Is this minimal, limited, or high-risk under the EU AI Act? Many EU businesses use chatbots, hiring tools, or scoring — disclosure, documentation, and human oversight may already apply before “full AI transformation.” August 2026 is a real planning horizon for high-risk systems.
3. Do we buy, embed, or build? Buy/embedded wins when time-to-value matters and the workflow is standard. Build wins when the workflow is your moat and data is sensitive.
4. Where does a human have to stay in the loop? Surveys consistently show majorities requiring human validation for significant decisions — not as a temporary compromise, but as operating design.
5. What do we stop doing if we fund this? Mature buyers are forcing portfolio tradeoffs. AI that does not replace manual work or rework is just another subscription.
A practical way to pick your first real application
Use this filter before you fund a second pilot.
Criterion: Volume — Pass: Thousands of similar cases per month — Fail: Rare, bespoke requests
Criterion: Success definition — Pass: Agreed KPI (time, error rate, deflection, revenue) — Fail: “Users liked it”
Criterion: Data access — Pass: Can read/write via APIs into systems of record — Fail: Copy-paste from PDFs and hope
Criterion: Blast radius — Pass: Wrong answer is recoverable or reviewed — Fail: Irreversible financial/medical/legal action without review
Criterion: Owner — Pass: Named ops/product owner, not only IT — Fail: “Innovation lab” with no P&L
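If you are screening many candidates, the five criteria can be written down as an explicit checklist so every proposal is judged the same way. Below is a minimal Python sketch. The `Candidate` fields, the 1,000-cases threshold for "thousands per month," and the two example candidates are illustrative assumptions, not a standard scoring model.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cut-off for "thousands of similar cases per month"; tune to your business.
MIN_MONTHLY_VOLUME = 1_000


@dataclass
class Candidate:
    """A proposed first AI application, described against the five criteria."""
    name: str
    cases_per_month: int                  # Volume: similar cases per month
    agreed_kpi: Optional[str]             # Success definition, e.g. "handle time"
    reads_writes_via_api: bool            # Data access into systems of record
    errors_recoverable_or_reviewed: bool  # Blast radius
    business_owner: Optional[str]         # Named ops/product owner, not only IT


def failed_criteria(c: Candidate) -> List[str]:
    """Return the criteria the candidate fails; an empty list means it passes."""
    failures = []
    if c.cases_per_month < MIN_MONTHLY_VOLUME:
        failures.append("volume")
    if not c.agreed_kpi:
        failures.append("success definition")
    if not c.reads_writes_via_api:
        failures.append("data access")
    if not c.errors_recoverable_or_reviewed:
        failures.append("blast radius")
    if not c.business_owner:
        failures.append("owner")
    return failures


# Illustrative candidates, not client data.
triage = Candidate("Support triage + draft replies", 12_000, "handle time",
                   True, True, "Head of Support")
assistant = Candidate("Company-wide general assistant", 300, None,
                      False, True, None)

print(failed_criteria(triage))     # []
print(failed_criteria(assistant))  # ['volume', 'success definition', 'data access', 'owner']
```

Returning the list of failed criteria, rather than a single score, keeps the conversation on what would have to change before the project deserves funding.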
Strong first bets we see work for mid-market and enterprise clients:
- Inbound support triage + draft replies for a defined product line
- Document intake (contracts, claims, applications) with extraction into your database
- Internal policy / procedure Q&A grounded in approved docs only
- Care or field operations coordination where missed follow-ups have a known dollar cost
Weak first bets (unless you already have eval maturity):
- Company-wide general assistant
- Fully autonomous outbound sales or legal agents
- Anything that replaces professional judgment with zero audit trail
Start narrow. Prove one KPI in production for 90 days. Then expand scope — the market’s winners in 2026 are doing exactly that, not rolling out “AI everywhere” in one quarter.
What we think is under-discussed (and over-hyped)
Under-discussed
- Integration tax: most roadmaps underestimate the work of connecting to legacy ERP, on-prem databases, and permission models. Market surveys cite legacy integration as a top barrier as often as “skills gap.”
- Change management: frontline staff must trust escalation paths; otherwise they work around the tool and the metrics lie.
- Unit economics: inference cost per successful outcome, not per chat message. Finance will ask eventually.
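Here is a worked version of the unit-economics point: divide inference spend by outcomes that succeeded, not by messages sent. This is a minimal sketch with hypothetical figures. `cost_per_successful_outcome` is an illustrative helper, and "successful" is assumed here to mean resolved without a human taking over.

```python
def cost_per_successful_outcome(inference_cost: float,
                                conversations: int,
                                resolved_share: float) -> float:
    """Divide inference spend by outcomes that actually succeeded, not by messages sent."""
    successful = round(conversations * resolved_share)
    if successful == 0:
        raise ValueError("no successful outcomes: cost per outcome is undefined")
    return inference_cost / successful


# Hypothetical month for a support assistant.
inference_cost = 4_000.00      # USD spent on model calls
conversations = 20_000
messages_per_conversation = 6
resolved_share = 0.40          # assumed: resolved without a human taking over

per_message = inference_cost / (conversations * messages_per_conversation)
per_outcome = cost_per_successful_outcome(inference_cost, conversations, resolved_share)

print(f"per message: ${per_message:.3f}")             # per message: $0.033
print(f"per successful outcome: ${per_outcome:.2f}")  # per successful outcome: $0.50
```

In this example, the per-message number looks fifteen times cheaper than the per-outcome one. The per-outcome figure is the one finance will compare against the manual cost of handling the same case.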
Over-hyped
- Autonomy without accountability: agents that act are marketable; agents that explain, log, and defer are deployable.
- Model upgrades as strategy: switching to the latest model is not a product roadmap.
- Pilot counts: “We have 12 AI experiments” is not adoption; production count and KPI movement are.
The honest summary for leadership
The market in 2026 is not anti-AI. It is anti-theater.
Organizations are spending, deploying agents, and embedding AI in software — but separating winners from noise through governance, workflow fit, and measurable outcomes. The businesses that will look smart in 2027 are not those with the most pilots; they are those with one or two production systems their CFO can explain in one slide.
If you are deciding where to place your next bet, start with a workflow that already has a cost number attached — then decide whether AI is the best way to move it, not whether AI is on the strategy slide.
Sources & further reading (market research)
Use these for your own due diligence; figures vary by sample and definition of “ROI.”
Source: KPMG Global AI Pulse Q1 2026 — What it covers: Enterprise spend, ROI expectations, orchestration vs. tool sprawl
Source: State of Applied AI — April 2026 — What it covers: ~200 implementations: industries, functions, embedded vs. direct AI
Source: KXN — State of Agentic AI in the Enterprise 2026 — What it covers: Agent production adoption, savings bands by workflow
Source: Gartner / CIO media (2026) — What it covers: Shift from hype to pragmatic, integrated projects
Source: EU AI Act guides (e.g. national and compliance advisories) — What it covers: SME obligations, Aug 2026 high-risk timeline
Want to pressure-test a use case?
If you have a workflow in mind and want a straight conversation on buy vs. build, risk tier, and whether it is a first-project fit — schedule a call. We will tell you if we would fund it ourselves.
