Custom Software · Manufacturing

Custom Software Isn't Expensive. Rebuilding Bad Software Is.

Mahesh Kanna

The Expensive Part Starts After Launch

Custom software is not expensive because it is custom. It becomes expensive when the first build skips the evidence that makes future change safe.

Why Cheap Builds Become Expensive Systems

Rebuild proposals often assume the old system failed because the stack or vendor was wrong. More often, the first build optimized launch speed while omitting tests, boundaries, runbooks, migration thinking, and ownership.

When estimating custom software rebuild cost, a senior review should ask which delivery decision is being made, what evidence supports it, and what signal would force the team to pause.

Design for the Second Team

The Hidden Failure Class

Velocity without guardrails

Teams skip automated tests "to move fast." Every change then needs manual regression. Fear slows releases. Hotfixes break neighboring features.

Velocity without guardrails feels great in month one. By month nine, the same team spends Fridays manually clicking through flows they are afraid to automate. The "saved" test budget returns as overtime and missed peak-season launches.

No ownership of domain model

Logic spreads across controllers, stored procedures, and spreadsheets. Nobody can describe the rules consistently. The rebuild team rediscovers behavior in production.

When rules live in five places, even a skilled rebuild team must dig like archaeologists to reach parity. Users report "the old system rounded tax differently on Tuesdays" and nobody knows which layer encoded that quirk. Disciplined early slices keep business rules near tests and docs, not scattered across tribal knowledge.
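One way to keep a rule near its evidence is to give it a single home with a test beside it. The following is a minimal sketch, not a real tax rule: the rate, the rounding mode, and the names `TAX_RATE` and `tax_for` are assumptions made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rule: the rate and rounding mode are illustrative assumptions.
TAX_RATE = Decimal("0.08")


def tax_for(amount: Decimal) -> Decimal:
    """Single home for the tax rule: round half up to the cent."""
    return (amount * TAX_RATE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def test_tax_rounds_half_up() -> None:
    # 10.31 * 0.08 = 0.8248, which rounds down to 0.82.
    assert tax_for(Decimal("10.31")) == Decimal("0.82")
    # 10.0625 * 0.08 = 0.805, an exact half cent, which rounds up to 0.81.
    assert tax_for(Decimal("10.0625")) == Decimal("0.81")
```

If a quirk such as day-specific rounding really exists, it would show up here as one more named test case rather than as folklore.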

Documentation as afterthought

Onboarding takes months. A key person leaves. A rebuild becomes the only way anyone trusts changes.

Documentation does not mean hundred-page wikis. It means short architecture decision records, acceptance criteria checked into the repo, and runbooks for the top three ops tasks. That material survives turnover. Undocumented cleverness does not.

Vendor incentives

Fixed-price bids reward hidden scope cuts: no observability, no migration strategy, no debt paydown line items.

Ask bidders what happens after launch: who owns tests, who updates docs, how change requests are priced once the "MVP" contract ends. Vendors who cannot answer clearly are pricing a handoff that becomes your rebuild RFP.

The Architecture Split

Bad: optimize first ship date

Fast MVP --> no tests --> tangled modules --> rewrite RFP in 36 months

Hidden multipliers:

  • Data migration from undocumented schema
  • Parallel run with angry users
  • Feature parity arguments without evidence

Good: disciplined early slices

Vertical slice --> tests + module boundary + deploy pipeline
       --> incremental features on stable base

Rebuild becomes selective modernization, not emergency rewrite.

Selective modernization means extracting bounded modules, strangling legacy paths, and keeping customer-visible behavior stable while internals improve. It requires the module boundaries and tests that cheap builds skipped, but it avoids the dual-run death march of full greenfield replacement.
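The "strangling legacy paths" idea can be sketched as a thin router that sends migrated paths to a new module and everything else to the old system. This is a simplified illustration under stated assumptions: the handler names, the path prefixes, and the in-process routing are all hypothetical, and a real system would more likely do this at a gateway or proxy.

```python
from typing import Callable, Dict

Request = Dict[str, str]
Handler = Callable[[Request], str]


def legacy_handler(request: Request) -> str:
    # Stand-in for the existing system; it serves everything not yet migrated.
    return f"legacy:{request['path']}"


def new_sync_handler(request: Request) -> str:
    # Stand-in for a newly extracted, tested module.
    return f"new:{request['path']}"


class StranglerRouter:
    """Routes migrated path prefixes to new handlers and the rest to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        self._migrated[prefix] = handler

    def roll_back(self, prefix: str) -> None:
        # Removing the route sends traffic back to the legacy path.
        self._migrated.pop(prefix, None)

    def handle(self, request: Request) -> str:
        for prefix, handler in self._migrated.items():
            if request["path"].startswith(prefix):
                return handler(request)
        return self._legacy(request)


router = StranglerRouter(legacy_handler)
router.migrate("/promotions/sync", new_sync_handler)
```

Callers keep using one entry point, so customer-visible behavior stays stable while one bounded path at a time moves behind it.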

The Deeper Operating Rule

Technical debt is a loan against future change. CTOs should score debt by customer-visible failure risk, not engineer annoyance alone.

Paying interest means slower features and more incidents. Defaulting on the loan means rebuild, which is refinancing at punitive rates.

CTOs should review debt like a portfolio: which shortcuts threaten revenue this quarter, which are merely ugly. Fund paydown on customer-visible failure paths first. That sequencing often prevents the full rewrite conversation entirely.
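The portfolio review can be sketched as a simple ordering: customer-visible items first, then by estimated exposure. The scoring formula, the field names, and every number below are assumptions for illustration, not a recommended model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DebtItem:
    name: str
    customer_visible: bool      # does a failure reach customers?
    failure_likelihood: float   # team estimate, 0.0 to 1.0
    revenue_at_risk: float      # team estimate, any consistent currency unit

    @property
    def exposure(self) -> float:
        return self.failure_likelihood * self.revenue_at_risk


def paydown_order(items: List[DebtItem]) -> List[DebtItem]:
    """Customer-visible items first, then by descending exposure."""
    return sorted(items, key=lambda item: (not item.customer_visible, -item.exposure))


# Placeholder entries for illustration only.
backlog = [
    DebtItem("inconsistent naming in admin module", False, 0.9, 0.0),
    DebtItem("promotion sync has no retries", True, 0.4, 50_000.0),
    DebtItem("checkout lacks integration tests", True, 0.2, 200_000.0),
]
ordered = [item.name for item in paydown_order(backlog)]
```

With these placeholder estimates, the merely ugly naming problem lands last even though it is the most likely to annoy engineers.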

Worked Example: retail inventory tool

A retailer built an internal inventory tool quickly without integration tests. Promotions caused silent mismatches with POS.

Path            Cost shape
Keep patching   ops overtime each peak season
Full rewrite    two years dual entry
Modular fix     bounded sync module + tests, months

The rebuild quote looked like a "fresh start." The modular fix had the lower total cost of ownership (TCO).

The retailer avoided two years of dual entry because they bounded the problem: sync correctness between promotion engine and POS, with tests around the failure modes users actually saw. A rewrite would have reimplemented inventory, reporting, and admin screens nobody complained about yet.

Where This Shows Up: manufacturing and retail

Manufacturing: plant scheduling tools glued to spreadsheets fail when the ERP is upgraded. A rebuild stops the line longer than gradual module extraction does.

Plants cannot afford a multi-year rewrite that pauses scheduling innovation. Bounded modules with integration tests around ERP handoffs let teams modernize while production keeps running. The expensive mistake is treating every legacy glue tool as a greenfield rewrite candidate.

Retail: omnichannel promises built on brittle custom middleware get rebuilt every few years unless integration and tests are funded early.

Promotions, inventory, and fulfillment touch many systems. Cheap middleware without contract tests fails silently until Black Friday. Funding tests on the promotion sync path in year one is boring. It is cheaper than explaining stockouts to the board in year three.
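A contract test on the promotion sync path can start very small. The sketch below checks an event payload against a hypothetical contract; the field names, types, and the rule that timestamps carry an explicit UTC offset are assumptions for illustration, not a description of any real POS or e-commerce API.

```python
from datetime import datetime
from typing import Dict, List

# Hypothetical contract: field names and types are assumptions for illustration.
PROMOTION_EVENT_CONTRACT: Dict[str, type] = {
    "event_id": str,
    "sku": str,
    "discount_pct": float,
    "starts_at": str,  # ISO 8601 with an explicit UTC offset
}


def contract_violations(event: dict) -> List[str]:
    """Return a list of human-readable problems; empty means the event conforms."""
    problems: List[str] = []
    for name, expected in PROMOTION_EVENT_CONTRACT.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    starts_at = event.get("starts_at")
    if isinstance(starts_at, str):
        try:
            parsed = datetime.fromisoformat(starts_at)
        except ValueError:
            problems.append("starts_at is not ISO 8601")
        else:
            if parsed.tzinfo is None:
                problems.append("starts_at has no UTC offset")
    return problems
```

Run against recorded payloads from each producer, a check like this turns a silent mismatch into a failing build well before peak season.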

Both teach the same lesson: a cheap build is expensive to own.

When you model TCO, include the cost of a program manager reconciling two systems, the opportunity cost of features not shipped during rewrite, and the talent churn when engineers tire of firefighting. Those line items often exceed the savings from the lowest initial bid.
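The arithmetic is simple enough to sketch. The line items below mirror the ones named above; the class name and every number are placeholders in arbitrary units, chosen only to show how a lower bid can still produce a higher total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnershipCost:
    initial_bid: float
    reconciliation_pm: float     # program manager reconciling two systems
    features_not_shipped: float  # opportunity cost during the rewrite
    talent_churn: float          # replacing engineers who tire of firefighting

    def total(self) -> float:
        return (self.initial_bid + self.reconciliation_pm
                + self.features_not_shipped + self.talent_churn)


# Placeholder numbers in arbitrary units; they are not data from any real project.
lowest_bid = OwnershipCost(100, 80, 120, 60)
disciplined_bid = OwnershipCost(180, 0, 20, 10)

print(lowest_bid.total(), disciplined_bid.total())
```

The value of the exercise is less the totals than the act of forcing each hidden line item to carry an explicit estimate.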

The Moment the Shortcut Becomes Product Risk

A retailer built promotion middleware quickly across e-commerce, POS, and warehouse systems. Peak season exposed duplicate callbacks, non-idempotent discounts, timezone drift, and no replay runbook. A full rewrite looked attractive. The cheaper path bounded the failing capability: promotion-to-inventory sync. The team added event contracts, idempotency keys, replay tooling, and integration tests instead of rebuilding stable admin screens.
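Idempotency keys are the piece most easily shown in code. The sketch below assumes each callback carries a unique key and keeps the seen keys in memory; the names `Order` and `apply_discount` are hypothetical, and a real system would need a durable store shared by every consumer.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Set


@dataclass
class Order:
    price: Decimal
    applied_keys: Set[str] = field(default_factory=set)


def apply_discount(order: Order, idempotency_key: str, discount_pct: Decimal) -> bool:
    """Apply a percentage discount once per key. Returns False for a duplicate."""
    if idempotency_key in order.applied_keys:
        # Duplicate callback: acknowledge it, but do not discount again.
        return False
    remaining = (Decimal(100) - discount_pct) / Decimal(100)
    order.price = (order.price * remaining).quantize(Decimal("0.01"))
    order.applied_keys.add(idempotency_key)
    return True


order = Order(price=Decimal("100.00"))
first = apply_discount(order, "promo-1:callback-42", Decimal("10"))
second = apply_discount(order, "promo-1:callback-42", Decimal("10"))
```

The same property is what makes replay tooling safe: re-sending a batch of events cannot double-apply a discount that already landed.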

What Gets Slower When You Build Well

The tradeoff is funding boring resilience before the pain is visible. Once users depend on bad behavior, every correction becomes migration work.

When rebuild cost is on the table, the useful review is not a generic architecture checklist. It should inspect slice scope, ownership, guardrails, support path, rollback, and defer list. If those fields are missing, the team may still be busy, but leadership does not yet have a decision-quality artifact.
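The six fields can be checked mechanically before a review meeting. This is a minimal sketch that assumes the artifact is captured as a dictionary; the key names and the function `missing_fields` are illustrative, not a prescribed template.

```python
from typing import Dict, List

# Field names follow the review items listed above; the snake_case keys are an assumption.
REQUIRED_FIELDS = (
    "slice_scope",
    "ownership",
    "guardrails",
    "support_path",
    "rollback",
    "defer_list",
)


def missing_fields(artifact: Dict[str, object]) -> List[str]:
    """Return required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not artifact.get(name)]


# Hypothetical draft with two gaps: no rollback plan and an empty defer list.
draft = {
    "slice_scope": "promotion-to-inventory sync for one store",
    "ownership": "one product owner, one engineering owner",
    "guardrails": "idempotent consumers, contract tests",
    "support_path": "runbook for replaying failed events",
    "defer_list": [],
}
gaps = missing_fields(draft)
```

A check like this cannot judge quality, only presence, so it works best as a gate before the human review rather than a replacement for it.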

Evidence Before Promotion

When rebuild cost is the concern, the release review should prove that the first useful slice can be operated, not merely demonstrated. Start with the vertical path. A real path begins with a user action, crosses identity and authorization, persists state, touches at least one meaningful integration when integration is part of the value, emits audit or telemetry, and has a rollback or correction path.

The second review item is ownership of decisions. One product owner must be able to cut scope without committee escalation. One engineering owner must be able to defend guardrails such as tenancy, data storage, integration style, and deploy discipline. When ownership is diffused, engineering fills the vacuum with assumptions. Those assumptions become expensive once customers and operators adapt to them.

The third item is change safety. The release should include smoke tests, a small set of acceptance tests around the slice, basic observability, and a runbook for the top failure modes. In SaaS, that usually includes tenant isolation, billing or entitlement behavior, and feature flag rollback. In healthtech, it includes PHI handling, audit events, and timeout behavior for external systems. In retail or manufacturing, it includes integration contracts with POS, ERP, warehouse, or plant systems.

Finally, review what was intentionally not built. A disciplined first release has explicit exclusions. Those exclusions matter because they prevent the team from treating unfinished adjacent workflows as bugs after launch. Clear scope is not anti-agile. It is how learning survives contact with production.

Tests That Should Fail First

Custom software release tests should cover authorization gaps, tenant boundary mistakes, integration timeouts, duplicate submissions, migration rollback, audit event omissions, and manual correction paths. Test the awkward operator workflow, not only the polished user workflow. If support cannot repair a failed transaction without database access, the release is not operationally ready.

For early slices, test what was deliberately excluded. A narrow release often fails when adjacent workflows sneak in through user behavior. If only one location, payer, plant, or tenant is in scope, the system should reject or route the others intentionally rather than failing in undefined ways.
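Rejecting out-of-scope input intentionally can be as small as one guard with its own test. The sketch below assumes a release scoped to a single location; the location IDs, the exception name, and the guard function are hypothetical.

```python
# Hypothetical scope for the first release: a single location.
IN_SCOPE_LOCATIONS = frozenset({"store-001"})


class OutOfScopeError(Exception):
    """Raised when a request targets something this release deliberately excludes."""


def require_in_scope(location_id: str) -> str:
    if location_id not in IN_SCOPE_LOCATIONS:
        raise OutOfScopeError(
            f"location {location_id!r} is not in scope for this release"
        )
    return location_id


def test_excluded_location_is_rejected() -> None:
    try:
        require_in_scope("store-002")
    except OutOfScopeError as error:
        assert "store-002" in str(error)
    else:
        raise AssertionError("out-of-scope location was accepted")
```

An explicit error also gives support a message to quote, which is easier to handle than an undefined failure deeper in the workflow.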

The Diagram That Belongs in the Kickoff

Cheap build path:
Fast launch --> no tests/contracts/runbooks --> fear of change --> rewrite RFP

Disciplined path:
Vertical slice --> tests + contracts + runbook --> selective modernization

Choices That Compete in the First Quarter

Full rebuild is justified when the product model is wrong or the platform cannot meet hard constraints. Refactor in place works when tests and seams exist. Selective modernization wins when the failure path can be bounded.

When weighing custom software rebuild cost, the alternative paths are not steps on a ladder. Each one carries a different mix of risk, cost, and learning. The weak choice is the one that hides the tradeoff until users, operators, or auditors discover it for you.

The Rule About Rebuild Cost

Custom software is not expensive because it is custom. It becomes expensive when the first build skips the evidence that makes future change safe. The practical lesson is to demand evidence that fits the rebuild decision in front of you, not a universal checklist. The artifact should expose slice scope, ownership, guardrails, support path, rollback, and defer list clearly enough for another team to challenge the decision.

If custom software rebuild cost is the decision in front of your team, use the Sprint Readiness Review to pressure-test the boundary before it hardens.
