Agentic AI · Healthtech

The Real Challenge of Enterprise AI Isn't Models. It's Context Management.

Bala Velayutham

2 JUNE 2025

How Context Becomes the Product

A financial advisory assistant cited a suitability memo that had been superseded two months earlier. The embedding job ran quarterly and no one owned freshness. In a second incident, semantic retrieval pulled a memo from another desk: the prompt asked the model to stay in scope, but the index had no desk filter. The fix was a context plane with effective dates, source owners, role filters, live CRM tools for client-specific facts, and fail-closed behavior when critical sources were stale.

What Context Management Must Prove

The Hidden Failure Class

Quarterly dump, daily questions

Teams embed all policies once per quarter. Business changes weekly. Users get confident answers from retired guidance. Leadership blames "hallucination" when the index is a time machine.

Stale context is especially dangerous because fluent models sound authoritative. A wrong number from a 2023 rate table delivered in perfect prose triggers more bad decisions than a vague answer would. Freshness SLAs and visible version tags in responses turn a hidden infrastructure issue into something users and auditors can reason about.
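As a rough illustration of a freshness SLA and a visible version tag, here is a minimal sketch. The `SourceDoc` fields, the seven-day SLA, and the tag format are assumptions made for the example, not something the incident teams are known to have used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SourceDoc:
    doc_id: str
    version: str
    indexed_at: datetime   # when this version was last embedded
    max_age: timedelta     # freshness SLA agreed with the source owner (assumed)


def is_fresh(doc: SourceDoc, now: datetime) -> bool:
    """True while the indexed copy is still inside its freshness SLA."""
    return now - doc.indexed_at <= doc.max_age


def version_tag(doc: SourceDoc, now: datetime) -> str:
    """Tag appended to an answer so readers can see exactly what was cited."""
    status = "fresh" if is_fresh(doc, now) else "STALE"
    return f"[{doc.doc_id} v{doc.version}, indexed {doc.indexed_at:%Y-%m-%d}, {status}]"


# Hypothetical data: a rate table embedded in January, checked in June.
now = datetime(2025, 6, 2, tzinfo=timezone.utc)
rate_table = SourceDoc(
    doc_id="rate-table",
    version="2023.4",
    indexed_at=datetime(2025, 1, 6, tzinfo=timezone.utc),
    max_age=timedelta(days=7),
)
print(version_tag(rate_table, now))
```

With a tag like this in the response, a stale citation is visible to the user instead of being buried in the index.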

Retrieval without boundaries

Vector search over "all company documents" ignores:

Tenant isolation in SaaS
Role-based policy tiers in wealth management
Minimum necessary in health workflows

One bad chunk in context becomes a compliance incident.

Boundaries must apply at every hop: index partitioning, query filters, API authorization, and assembly of the final context bundle. Prompt instructions like "only use documents for this client" are not enforcement. They are hope. Code that rejects out-of-scope chunks before the model sees them is enforcement.
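The difference between hope and enforcement can be sketched in a few lines. This is one possible shape, assuming each retrieved chunk carries `tenant_id` and `desk` metadata; the `Chunk` and `Scope` types are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    tenant_id: str
    desk: str
    text: str


@dataclass(frozen=True)
class Scope:
    tenant_id: str
    allowed_desks: frozenset


def enforce_scope(chunks, scope):
    """Split retrieved chunks into (allowed, rejected) before prompt assembly.

    Rejected chunks never reach the model; they are returned so the caller
    can log them for audit.
    """
    allowed, rejected = [], []
    for chunk in chunks:
        in_scope = (
            chunk.tenant_id == scope.tenant_id
            and chunk.desk in scope.allowed_desks
        )
        (allowed if in_scope else rejected).append(chunk)
    return allowed, rejected


# Hypothetical retrieval result containing another desk's memo.
retrieved = [
    Chunk("memo-1", "tenant-a", "equities", "suitability guidance"),
    Chunk("memo-2", "tenant-a", "fixed-income", "another desk's memo"),
    Chunk("memo-3", "tenant-b", "equities", "another tenant's memo"),
]
scope = Scope(tenant_id="tenant-a", allowed_desks=frozenset({"equities"}))
allowed, rejected = enforce_scope(retrieved, scope)
```

A post-retrieval filter like this is a last line of defense. It complements, rather than replaces, index partitioning and query-time filters.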

Model churn hides context debt

Switching models changes tone and reasoning style. It does not fix missing CRM fields. Teams run bake-offs on benchmarks while production still lacks source-of-truth connectors with refresh SLAs.

Tool context treated as free

Agents that call APIs without a caching policy may hammer fragile legacy systems or return inconsistent snapshots within one answer.

Live tool output is context too. If one tool returns account balance at one point in the 9 o'clock hour and another returns pending holds at a different point in that hour, both without timestamps, the model synthesizes a story from incompatible snapshots. Context management includes temporal consistency: what "as of" time applies to this answer, and what happens when sources disagree.
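One way to make the "as of" question explicit is to require a timestamp on every tool result and to refuse to combine results that are too far apart. The sketch below assumes a hypothetical `ToolResult` wrapper and a caller-chosen skew limit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class ToolResult:
    tool: str
    value: Any
    as_of: Optional[datetime]   # None means the tool reported no timestamp


def check_snapshot(results, max_skew):
    """Return the shared 'as of' time, or raise if results cannot be combined."""
    if not results:
        raise ValueError("no tool results to combine")
    missing = [r.tool for r in results if r.as_of is None]
    if missing:
        raise ValueError("no timestamp from: " + ", ".join(missing))
    times = [r.as_of for r in results]
    if max(times) - min(times) > max_skew:
        raise ValueError("tool results come from incompatible snapshots")
    # Conservative choice: the answer is only as current as its oldest input.
    return min(times)


# Hypothetical results two minutes apart, with a five-minute tolerance.
t0 = datetime(2025, 6, 2, 9, 0, tzinfo=timezone.utc)
results = [
    ToolResult("account_balance", 1200.0, t0),
    ToolResult("pending_holds", 300.0, t0 + timedelta(minutes=2)),
]
as_of = check_snapshot(results, max_skew=timedelta(minutes=5))
```

Whether to raise, retry, or answer with a caveat when sources disagree is a policy decision; the point is that the decision is made in code rather than left to the model.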

The Architecture Split

Bad: dump and pray

All docs --> embedding index --> retrieve top-k --> model
(no tenant filter, no version, no freshness check)

Good: governed context plane

Request (user, tenant, task)
    --> policy engine (allowed sources)
    --> retrieval per source with version + TTL
    --> assemble context bundle (logged)
    --> model + tools

Properties:

Fail closed if critical source stale beyond SLA.
Audit log of document ids and API record versions used.
Scoped indexes per tenant or data domain where required.

A governed context plane also makes debugging humane. When a user reports a wrong answer, you replay the logged bundle: document ids, API versions, filters applied, and freshness checks passed or failed. That beats guessing whether the model or the index failed.
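A logged bundle does not need to be elaborate to support replay. The following is a minimal sketch, assuming the bundle is serialized to JSON; the `ContextBundle` and `BundleItem` shapes are illustrative, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BundleItem:
    source: str      # e.g. "policy_index" or "crm_api" (hypothetical names)
    record_id: str   # document id or API record id
    version: str     # document version or API record version


@dataclass(frozen=True)
class ContextBundle:
    request_id: str
    user: str
    tenant: str
    task: str
    filters: dict
    items: list
    freshness_passed: bool


def to_audit_record(bundle: ContextBundle) -> str:
    """Serialize the bundle so it can be written to an append-only log."""
    return json.dumps(asdict(bundle), sort_keys=True)


def replay(record: str) -> ContextBundle:
    """Rebuild the bundle from a log line when investigating a bad answer."""
    data = json.loads(record)
    items = [BundleItem(**item) for item in data.pop("items")]
    return ContextBundle(items=items, **data)


bundle = ContextBundle(
    request_id="req-42",
    user="advisor-7",
    tenant="tenant-a",
    task="suitability_question",
    filters={"desk": "equities"},
    items=[BundleItem("policy_index", "memo-1", "2025-05")],
    freshness_passed=True,
)
record = to_audit_record(bundle)
```

Replaying `record` answers the first debugging question directly: which documents, versions, and filters the model actually saw.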

The Deeper Operating Rule

Enterprise AI is an integration and governance problem wearing a chat UI.

CTOs already govern schemas, APIs, and data access. Context management extends that discipline to what machines read on behalf of users. Model selection is a tuning knob after sources are trustworthy.

Think of context owners the way you think of data stewards. Someone must approve when a policy PDF changes, when a CRM field becomes mandatory for a use case, and when an index may include a new document class. Without named owners, freshness jobs run on autopilot until they silently stop.

Worked Example: wealth policy assistant

Advisors ask questions against compliance PDFs and client suitability data.

Issue	Symptom	Fix
Stale PDF index	cites 2023 rule	weekly refresh job + version tag in answer
Broad retrieval	includes another desk's memo	desk-level metadata filter
CRM gap	wrong risk profile	mandatory live CRM tool for client-specific asks

A model swap from vendor A to vendor B did not change outcomes until the context plane was fixed.

The wealth desk example is typical: leadership funded a model bake-off while compliance worried about citation accuracy. Citations improved only when PDF versions and desk filters were correct. The winning model was whichever one sat on top of trustworthy context, not whichever scored highest on a generic benchmark.

Where This Shows Up: healthtech and financial services

Healthtech: answers must respect structured clinical and authorization data, not only narrative notes. A missing coded diagnosis in context drives unsafe suggestions.

Clinical workflows often blend narrative progress notes with coded orders, allergies, and authorization status. If retrieval favors readable prose over structured fields, the model builds plausible stories without the coded facts that determine safety. Context management means schema-aware retrieval: required fields for certain question types, not only semantic similarity over notes.
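Schema-aware retrieval can start as a simple gate: for each question type, list the structured fields that must be present before the model is allowed to answer. The question types and field names below are hypothetical placeholders, not a clinical standard.

```python
# Required structured fields per question type (illustrative assumption).
REQUIRED_FIELDS = {
    "medication_question": {"allergies", "active_orders", "coded_diagnoses"},
    "coverage_question": {"authorization_status", "coded_diagnoses"},
}


def missing_fields(question_type, structured_context):
    """Return the required fields that are absent or None for this question type.

    An empty list counts as present: "no allergies recorded" is a coded fact,
    while a missing key means the field was never retrieved.
    """
    required = REQUIRED_FIELDS.get(question_type, set())
    present = {k for k, v in structured_context.items() if v is not None}
    return required - present


context = {
    "allergies": [],                  # explicitly retrieved, none recorded
    "active_orders": ["order-123"],
    "coded_diagnoses": None,          # retrieval returned nothing
    "progress_note": "Patient reports improvement.",
}
gaps = missing_fields("medication_question", context)
```

If `gaps` is non-empty, the workflow can refuse or fetch the missing fields, instead of letting the narrative note stand in for coded data.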

Financial services: wrong policy version or incomplete suitability context creates regulatory exposure beyond user annoyance.

Advisors need answers grounded in the policy version effective for that product and jurisdiction, plus client-specific suitability data pulled live from systems of record. Static embeddings of policy memos without version metadata are a regulatory accident waiting for a confident sentence.
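Selecting the effective policy version is a date-range lookup once version metadata exists. This sketch assumes each version records an effective date and, when retired, the date it was superseded; all names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PolicyVersion:
    policy_id: str
    jurisdiction: str
    version: str
    effective_from: date
    superseded_on: Optional[date] = None   # first day this version no longer applies


def effective_version(versions, policy_id, jurisdiction, on):
    """Return the single version in force on the given date, else None.

    None covers both "no version applies" and "versions overlap"; in either
    case the caller should fail closed rather than guess.
    """
    matches = [
        v for v in versions
        if v.policy_id == policy_id
        and v.jurisdiction == jurisdiction
        and v.effective_from <= on
        and (v.superseded_on is None or on < v.superseded_on)
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical version history for one policy in one jurisdiction.
versions = [
    PolicyVersion("suitability", "UK", "v1", date(2023, 1, 1), date(2025, 4, 1)),
    PolicyVersion("suitability", "UK", "v2", date(2025, 4, 1)),
]
current = effective_version(versions, "suitability", "UK", date(2025, 6, 2))
```

The same lookup with an earlier date returns the older version, which is what an auditor needs when reviewing advice given in the past.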

Both need owned sources, scoped retrieval, and freshness, not another model benchmark.

The Context Boundary

Weak RAG:
All docs --> embeddings --> top-k --> model

Governed context:
Request(user, role, tenant, task) --> Policy/source registry
       --> filtered retrieval + live tools --> logged context bundle --> model

What Better Context Costs

Partitioned indexes cost more to operate than one large index. Live tools add latency. Fail-closed behavior creates more refusals. All three are cheaper than fluent leakage or stale regulated advice.

A useful review of enterprise AI context management is not a generic architecture checklist. It should inspect permissions, context, tool behavior, eval evidence, and rollback. If any of these is missing, the team may still be busy, but leadership does not yet have a decision-quality artifact.

Evidence Before Promotion

For AI systems, the release review should inspect the path taken before the model produces text. Start with identity. Which user, tenant, role, policy version, and task type entered the workflow? Which sources became eligible because of that identity? Which sources were rejected? If the team cannot answer from logs, the system is not auditable enough for production.

Next, review context freshness. Static embeddings, vector indexes, document stores, CRM fields, and tool results each need an owner and a maximum age. A model answering from a stale policy is not hallucinating in the usual sense. It is faithfully using bad evidence. Critical sources should fail closed when freshness checks fail. Less critical sources can degrade with labels, but the degradation should be deliberate and visible.
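The fail-closed versus degrade-with-labels distinction can be written down as a small decision function. The sketch below assumes each source reports its current age, its maximum allowed age, and whether it is critical; the names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    DEGRADE = "degrade"   # answer, but label the stale sources
    REFUSE = "refuse"     # fail closed


@dataclass(frozen=True)
class SourceStatus:
    name: str
    critical: bool
    age: timedelta
    max_age: timedelta


def freshness_decision(sources):
    """Return (decision, names of the stale sources that drove it)."""
    stale = [s for s in sources if s.age > s.max_age]
    stale_critical = [s.name for s in stale if s.critical]
    if stale_critical:
        return Decision.REFUSE, stale_critical
    if stale:
        return Decision.DEGRADE, [s.name for s in stale]
    return Decision.PROCEED, []


sources = [
    SourceStatus("policy_index", True, timedelta(days=2), timedelta(days=7)),
    SourceStatus("faq_index", False, timedelta(days=40), timedelta(days=30)),
]
decision, reasons = freshness_decision(sources)
```

Returning the source names alongside the decision keeps the degradation visible: the labels shown to the user come from the same list that goes into the log.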

Then review tools by blast radius. Read-only tools still leak data if scope is broad. Draft tools create review burden. Write tools change state and need idempotency, approval thresholds, scoped credentials, and rollback behavior. A shared service token is an architectural smell because it erases the user on whose behalf the action happened.
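Idempotency for write tools is commonly handled with a key derived from the request, so a retried call returns the earlier result instead of repeating the side effect. The wrapper below is a sketch under that assumption; it keeps results in memory, where a real system would need a durable store, and the class and parameter names are made up for the example.

```python
import hashlib
import json


class IdempotentWriteTool:
    """Wraps a state-changing action so retries do not repeat the side effect."""

    def __init__(self, action):
        self._action = action    # callable(user_id, args) -> result
        self._completed = {}     # in-memory only; assumption for the sketch

    @staticmethod
    def idempotency_key(request_id, user_id, args):
        # The user id is part of the key, so the acting user is never erased.
        payload = json.dumps(
            {"request": request_id, "user": user_id, "args": args},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, request_id, user_id, args):
        key = self.idempotency_key(request_id, user_id, args)
        if key not in self._completed:
            self._completed[key] = self._action(user_id, args)
        return self._completed[key]


# Hypothetical write action that records each real execution.
executions = []


def issue_credit(user_id, args):
    executions.append((user_id, args["amount"]))
    return {"credited": args["amount"]}


tool = IdempotentWriteTool(issue_credit)
first = tool.call("req-1", "advisor-7", {"amount": 50})
retry = tool.call("req-1", "advisor-7", {"amount": 50})
```

Here the retry returns the stored result and `issue_credit` runs once. A new `request_id` is treated as a new, intentional action.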

Finally, inspect evals and incident controls. Evals should replay de-identified production-shaped traces and score source selection, permission compliance, tool choice, and outcome. Kill switches should be granular: disable a credit tool, freeze memory refresh, force human approval, or route one intent class away from automation without taking down harmless read-only use cases.
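Granular kill switches can be modeled as a few independent sets consulted before each action. This is a minimal sketch; the switch names, intents, and return values are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KillSwitches:
    disabled_tools: set = field(default_factory=set)
    approval_required_intents: set = field(default_factory=set)
    manual_only_intents: set = field(default_factory=set)

    def route(self, intent: str, tool: str) -> str:
        """Decide how one (intent, tool) call is handled right now."""
        if intent in self.manual_only_intents:
            return "route_to_human"
        if tool in self.disabled_tools:
            return "tool_disabled"
        if intent in self.approval_required_intents:
            return "needs_approval"
        return "allow"


# Incident response: disable the credit tool and gate refunds on approval,
# while harmless read-only lookups keep working.
switches = KillSwitches()
switches.disabled_tools.add("issue_credit")
switches.approval_required_intents.add("refund_request")
```

Because each switch targets one tool or one intent class, an operator can contain an incident without taking the whole assistant offline.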

Tests That Should Fail First

AI release tests should include permission denial, empty retrieval, stale retrieval, wrong-tenant retrieval, tool timeout, duplicate tool retry, malformed tool output, long conversation memory, and user attempts to override policy. Test cost and latency under realistic tool chains, not only one warm prompt. Test whether the answer cites the source that actually justified the action.

For agentic workflows, include partial completion. The model may draft a response after a tool failed, or execute one side effect before another tool times out. The workflow must know whether to compensate, ask for approval, fail closed, or resume from a durable state.
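One familiar way to handle partial completion is to pair each side effect with a compensating action and undo completed steps in reverse order when a later step fails. The sketch below assumes steps are supplied as `(name, do, undo)` tuples and runs in memory; durable state and approval handling are out of scope.

```python
def run_with_compensation(steps):
    """Run steps in order; on failure, undo the completed steps in reverse.

    steps: list of (name, do, undo) where do and undo take no arguments.
    Returns a summary the workflow can log or hand to a human.
    """
    done = []
    for name, do, undo in steps:
        try:
            do()
        except Exception as exc:
            undone = []
            for prev_name, prev_undo in reversed(done):
                prev_undo()
                undone.append(prev_name)
            return {
                "status": "compensated",
                "failed_step": name,
                "undone": undone,
                "error": str(exc),
            }
        done.append((name, undo))
    return {"status": "completed", "failed_step": None, "undone": [], "error": None}


# Hypothetical workflow: the second tool times out after the first side effect.
log = []


def notify_client():
    raise TimeoutError("notification tool timed out")


result = run_with_compensation([
    ("reserve_funds", lambda: log.append("reserve"), lambda: log.append("release")),
    ("notify_client", notify_client, lambda: log.append("retract")),
])
```

A test for partial completion can then assert on both the summary and the order of side effects, for example that the reservation was released after the notification failed.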

Other Context Strategies That Can Work

Static indexes are acceptable for stable knowledge with owners and versions. Decision support needs live structured tools plus retrieval. Regulated guidance should fail closed when freshness or authorization checks fail.

These context strategies are not steps on a ladder. Each one carries a different mix of risk, cost, and learning. The weak choice is the one that hides the tradeoff until users, operators, or auditors discover it for you.

The Rule for Enterprise AI

Enterprise AI fails first in the context plane. The model often sounds wrong because the system loaded stale, incomplete, over-broad, or unauthorized evidence. The practical lesson is to demand evidence specific to how context is managed, not a universal checklist. That evidence should expose permissions, context, tool behavior, eval results, and rollback clearly enough for another team to challenge the decision.

