Agentic AI · SaaS

AI POCs Need Exit Criteria Before They Become Permanent Pilots

Bala Velayutham

The POC That Would Not End

The pilot was still called a pilot twelve months later. Everyone liked the demo. Nobody would put it in the production budget, because no one could show eval regression results, a rollback path, a trace retention policy, or an owner for week-twelve behavior.

Why Novelty Budget Hides Operating Cost

Teams assume a successful AI POC is one that impresses stakeholders. That definition keeps pilots alive and unfunded. A production-bound POC should be a contract: what must be proven, what artifacts must exist, who signs off, and what happens when the window closes.

The Pilot Became a Shadow Product

A SaaS support agent answered subscription questions well in demo sessions. The pilot expanded informally, but it had no tool-call dashboard, no prompt regression suite, no review UI for disputed answers, and no clear owner for cost spikes. Finance eventually asked whether the pilot should be shut down. The team recovered by writing exit criteria: fifty real workflow evals, trace logging with redaction, a Tier B approval queue, cost per resolved ticket, a tool kill switch, and support-owner sign-off. Only then did scope expand.

Exit Criteria Before the First Demo

Permanent pilot:
Demo success --> more stakeholders --> more exceptions --> no owner --> quiet sunset

Exit-driven POC:
Scope --> exit artifacts --> 60-90 day evidence window --> expand, stop, or rebuild

What Must Be Real Before Scale

Exit criteria should be written before demo day. They should cover business outcome, risk controls, technical operations, security, compliance, cost, and ownership. A good POC can end in three ways: production expansion, deliberate stop, or rebuild with a different architecture. Endless extension is a failure state because it hides missing evidence behind continued experimentation.
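The three permitted endings can be sketched as a small decision function. This is a minimal sketch, not a prescribed process: `ExitCriterion`, `decide`, and the `rebuild_justified` flag are hypothetical names introduced here, and the rule that an empty criteria list is an error is an assumption drawn from the argument that unnamed evidence is the failure mode.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    EXPAND = "expand"
    STOP = "stop"
    REBUILD = "rebuild"


@dataclass
class ExitCriterion:
    name: str          # e.g. "trace logging with redaction"
    owner: str         # who signs off on this criterion
    met: bool = False  # flipped only after the artifact has been reviewed


def decide(criteria: List[ExitCriterion], window_end: date, today: date,
           rebuild_justified: bool) -> Optional[Outcome]:
    """Return the exit outcome, or None while the evidence window is open."""
    if not criteria:
        # Assumption: a POC with no named criteria cannot be evaluated at all.
        raise ValueError("exit criteria must be written before the window starts")
    if all(c.met for c in criteria):
        return Outcome.EXPAND
    if today <= window_end:
        return None  # still inside the window; keep collecting evidence
    # Window closed with gaps: extension is deliberately not an option.
    return Outcome.REBUILD if rebuild_justified else Outcome.STOP
```

Note that the function has no "extend" branch. Once the window closes with unmet criteria, the only choices left are to stop or to rebuild.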

The release review should inspect the workflow before it inspects the model. Every production use case needs a task boundary, identity model, allowed tool list, context source registry, policy version, trace format, and rollback path. The architecture should say which actions are read-only, which create drafts, which require approval, and which are blocked entirely.
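One way to make the four action classes concrete is a lookup that every tool call passes through. The sketch below is illustrative only: the tool names, the `ACTION_TIERS` table, and the returned decision strings are all hypothetical, and the deny-by-default rule for unlisted tools is an assumption.

```python
from enum import Enum


class Tier(Enum):
    READ_ONLY = "read_only"
    DRAFT = "draft"
    APPROVAL = "approval"
    BLOCKED = "blocked"


# Hypothetical tool names, chosen only to illustrate each tier.
ACTION_TIERS = {
    "lookup_subscription": Tier.READ_ONLY,
    "draft_customer_reply": Tier.DRAFT,
    "apply_credit": Tier.APPROVAL,
    "delete_account": Tier.BLOCKED,
}


def authorize(tool: str, approved: bool = False) -> str:
    """Map a requested tool call to a policy decision."""
    # Assumption: any tool missing from the table is treated as blocked.
    tier = ACTION_TIERS.get(tool, Tier.BLOCKED)
    if tier is Tier.BLOCKED:
        return "deny"
    if tier is Tier.APPROVAL and not approved:
        return "queue_for_approval"
    if tier is Tier.DRAFT:
        return "draft_only"
    return "execute"
```

For example, `authorize("apply_credit")` returns `"queue_for_approval"` until a human approves it, while a tool the table has never heard of is denied outright.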

Exit criteria need evidence, not confidence. Show a sample tool-call trace with user identity, tenant, policy decision, input redaction, output payload, cost, latency, and final action. Show how an incident commander disables one tool class without disabling the whole assistant. Show how changes to a prompt, model, retrieval index, or tool schema move through regression gates.
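Disabling one tool class without taking down the assistant can be as simple as a registry consulted before every call. This is a sketch under stated assumptions: `ToolRegistry`, the tool names, and the class names are hypothetical, and a real deployment would also need the switch to persist and propagate across processes.

```python
from typing import Dict, Set


class ToolRegistry:
    """Tracks which tool classes are currently disabled."""

    def __init__(self, tool_classes: Dict[str, str]) -> None:
        self._tool_classes = dict(tool_classes)  # tool name -> tool class
        self._disabled: Set[str] = set()

    def disable_class(self, tool_class: str) -> None:
        """Kill switch: turn off every tool in one class."""
        self._disabled.add(tool_class)

    def enable_class(self, tool_class: str) -> None:
        self._disabled.discard(tool_class)

    def is_enabled(self, tool: str) -> bool:
        tool_class = self._tool_classes.get(tool)
        if tool_class is None:
            return False  # assumption: unregistered tools are never callable
        return tool_class not in self._disabled


# Hypothetical tools grouped into two classes.
registry = ToolRegistry({
    "lookup_subscription": "billing_read",
    "apply_credit": "billing_write",
})
registry.disable_class("billing_write")
```

After the last line, `apply_credit` is off while `lookup_subscription` keeps working, which is the behavior the incident commander needs to demonstrate.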

The hard question is not whether the agent can complete the demo. The hard question is whether the system can explain what happened after a wrong answer, stale context, duplicate tool call, or permission denial.

When a Lightweight POC Is Enough

A research spike can be open-ended if the organization admits it is research. A POC should not be. A production pilot can start small, but it still needs controls. A vendor bake-off can compare tools, but the winning vendor should still pass the same exit criteria before autonomy expands.

The Cost of Clear Exit Rules

The honest tradeoff is not speed versus safety in the abstract. It is which actions deserve autonomy, which actions deserve draft mode, and which actions should never be delegated. The team should add control where the action changes customer data, money, access, or regulated records, then keep low-risk retrieval and drafting lightweight enough to keep learning.

Exit-criteria tests should include empty retrieval, wrong-tenant retrieval, prompt injection through retrieved documents, stale index versions, duplicate tool retries, partial tool completion, malformed tool output, permission denial, long-session memory, and cost spikes. The expected result is not always a better answer. Sometimes the expected result is refusal, escalation, draft-only mode, or tool disablement.
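A failure-mode suite of this kind asserts on the disposition rather than on the answer text. In the sketch below, the pairing of each scenario with its acceptable dispositions is purely an illustrative assumption, not a recommendation from the article, and `EXPECTED_DISPOSITIONS` and `passes` are hypothetical names.

```python
from typing import Dict, Set

# Illustrative assumption: which dispositions count as a pass per scenario.
EXPECTED_DISPOSITIONS: Dict[str, Set[str]] = {
    "empty_retrieval": {"refusal", "escalation"},
    "wrong_tenant_retrieval": {"refusal"},
    "prompt_injection_in_document": {"refusal", "escalation"},
    "permission_denial": {"refusal", "escalation"},
    "cost_spike": {"draft_only", "tool_disablement"},
}


def passes(scenario: str, observed_disposition: str) -> bool:
    """A case passes when the agent ends in an acceptable disposition."""
    if scenario not in EXPECTED_DISPOSITIONS:
        raise ValueError(f"no expectation written for scenario: {scenario}")
    return observed_disposition in EXPECTED_DISPOSITIONS[scenario]
```

Under this table, an agent that confidently answers a wrong-tenant retrieval fails the case even if the answer happens to be correct, because `"answered"` is not an acceptable disposition there.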

The Promotion Review

A useful promotion artifact shows allowed actions by task, not only prompts by intent. Include the user role, autonomy tier, eligible context sources, tool list, approval rule, trace fields, kill switch, and rollback owner. That record is more valuable than a prompt library because it says what the system may do when the answer becomes an action.

What Makes the POC Review Credible

A weak exit review asks whether the agent answered correctly in a demo. A useful review asks whether the system can prove why it answered, what it was allowed to touch, and how the team can stop it safely. The reviewer should ask for a golden workflow set with expected tool traces, not only expected final text. A case should specify which tools may be called, which sources are eligible, what refusal looks like, what approval state is required, and what audit fields must be written.
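A golden workflow case of this shape might be checked against a recorded trace as follows. This is a sketch: the `GoldenCase` fields and the trace keys (`tool_calls`, `sources`, `approval_state`, `audit`) are hypothetical, and the refusal check is left out for brevity.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set


@dataclass
class GoldenCase:
    task: str
    allowed_tools: Set[str]
    eligible_sources: Set[str]
    required_approval_state: str
    required_audit_fields: Set[str]


def check_trace(case: GoldenCase, trace: Dict[str, Any]) -> List[str]:
    """Return a list of failures; an empty list means the trace conforms."""
    failures: List[str] = []
    extra_tools = set(trace.get("tool_calls", [])) - case.allowed_tools
    if extra_tools:
        failures.append(f"unexpected tools: {sorted(extra_tools)}")
    extra_sources = set(trace.get("sources", [])) - case.eligible_sources
    if extra_sources:
        failures.append(f"ineligible sources: {sorted(extra_sources)}")
    if trace.get("approval_state") != case.required_approval_state:
        failures.append("approval state mismatch")
    missing = case.required_audit_fields - set(trace.get("audit", {}))
    if missing:
        failures.append(f"missing audit fields: {sorted(missing)}")
    return failures
```

The check never looks at the final answer text. A trace can fail because the agent reached a correct answer through a tool or source it was not allowed to use.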

The exit review should also cover rollout mechanics. Prompt changes, model route changes, retrieval index rebuilds, and tool schema changes should move through separate versioned gates because they fail differently. A model upgrade can change reasoning. A retrieval rebuild can change evidence. A tool schema change can change side effects. Treating all of those as one release type is how regressions hide.
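Separate gates per change type can be expressed as a table that the release pipeline consults. The gate names below are hypothetical placeholders for whatever suites a team actually runs. The point of the sketch is only that each change type carries its own required set.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical gate names; each change type fails differently, so each
# gets its own required set rather than one shared release checklist.
REQUIRED_GATES: Dict[str, FrozenSet[str]] = {
    "prompt": frozenset({"prompt_regression"}),
    "model_route": frozenset({"prompt_regression", "reasoning_evals"}),
    "retrieval_index": frozenset({"retrieval_evals"}),
    "tool_schema": frozenset({"tool_contract_tests", "side_effect_dry_run"}),
}


def missing_gates(change_type: str, passed: Set[str]) -> Set[str]:
    """Return the gates still required before this change can ship."""
    if change_type not in REQUIRED_GATES:
        raise ValueError(f"unknown change type: {change_type}")
    return set(REQUIRED_GATES[change_type] - passed)
```

A change ships only when `missing_gates` returns an empty set. An unrecognized change type raises instead of slipping through as a generic release.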

Cost and latency should be first-class signals. An agent that takes eight tool calls to resolve a low-value task may be correct and still not production-worthy. Track cost per completed workflow, timeout rate, approval queue age, refusal quality, and human override rate. Those numbers tell leadership whether the agent is becoming operational software or a permanent demo with nicer logs.
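Three of these signals can be computed directly from per-run records. The sketch below assumes a hypothetical `Run` record, and it assumes that cost per completed workflow divides total spend, including failed runs, by the number of completions. A team may reasonably define it differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Run:
    cost_usd: float
    completed: bool
    timed_out: bool
    overridden: bool  # a human replaced or reversed the agent's result


def summarize(runs: List[Run]) -> Dict[str, Optional[float]]:
    """Aggregate run records into the signals a review would track."""
    if not runs:
        raise ValueError("no runs to summarize")
    total = len(runs)
    completed = sum(1 for r in runs if r.completed)
    total_cost = sum(r.cost_usd for r in runs)
    return {
        # Assumption: failed runs still cost money, so they stay in the numerator.
        "cost_per_completed_workflow": total_cost / completed if completed else None,
        "timeout_rate": sum(1 for r in runs if r.timed_out) / total,
        "human_override_rate": sum(1 for r in runs if r.overridden) / total,
    }
```

Keeping failed runs in the numerator means an agent that burns tool calls on abandoned workflows looks as expensive as it really is.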

Signals Worth Watching

Leadership should watch a short list of signals: tool calls by tier, approval queue age, refusal quality, policy-block rate, cost per completed workflow, and the time it takes to disable one tool without disabling the whole assistant. Those numbers reveal whether the agent is becoming software or staying a guided demo.

The Artifact: POC Exit Record

The artifact worth keeping is a workflow control record. It should show the user role, allowed tools, autonomy tier, context sources, retrieval filters, approval state, trace retention rule, kill switch, and rollback owner. A prompt alone is not an artifact because it cannot prove authorization or side effects.

Include one sample trace from a real-shaped task. The trace should show identity, tenant, prompt version, retrieval index version, selected sources, tool calls and their inputs, policy decision, approval state, cost, latency, and final disposition. If the team cannot produce that trace, it is not ready to scale autonomy. If the trace cannot explain a wrong answer or a blocked action, the eval suite is not yet a release gate.
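A completeness check on the sample trace is a cheap first gate. The field names below are a snake_case rendering of the list above and are otherwise hypothetical. The sketch only verifies that each field is present, not that its contents are correct.

```python
from typing import Any, Dict, List

# Hypothetical field names mirroring the trace contents described above.
REQUIRED_TRACE_FIELDS = (
    "identity",
    "tenant",
    "prompt_version",
    "retrieval_index_version",
    "selected_sources",
    "tool_inputs",
    "policy_decision",
    "approval_state",
    "cost",
    "latency",
    "final_disposition",
)


def missing_trace_fields(trace: Dict[str, Any]) -> List[str]:
    """List required fields that are absent or explicitly null."""
    # Zero and empty values are kept as valid; only None counts as missing.
    return [f for f in REQUIRED_TRACE_FIELDS if trace.get(f) is None]
```

A trace with any missing field cannot answer the reviewer's questions about a wrong answer or a blocked action, so it should fail the review before anyone reads its contents.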

The useful review example is deliberately small. Pick one task, such as applying a credit, answering an authorization question, or drafting a customer reply. Walk through the identity, context, policy, tool call, approval state, and audit record. If that one task cannot be explained without reading prompts by hand, the system needs better boundaries before the team broadens autonomy.

The Rule for Pilots

A POC is not a smaller production system. It is a test of whether production evidence can be created cheaply. If the evidence is not named, the pilot becomes a waiting room.
