When AI actually helps — and when it doesn't

January 30, 2025
7 min read

There's a version of AI that shows up in product demos — the one with the pulsing gradient and the chat bubble that promises to "transform your workflow." Most of it is theater. It makes the roadmap look modern, and it does very little for the people using the product.

There's also a version of AI that quietly does useful work in the background and nobody talks about, because it's not a marketing story. That's the version we tend to build.

What AI theater looks like

You can usually spot theater without touching the product. A few reliable tells:

  • A chat interface bolted onto a form that already works.
  • A "summary" button that rewrites the same information the user just read.
  • Vague buzzwords in the UI — "AI-powered," "smart," "intelligent" — with no clear behaviour behind them.
  • Features that only demo well on hand-picked inputs, and visibly fall apart when the user tries their own.
  • A business case that is actually "investors like AI" in a costume.

Theater is expensive. It adds inference costs, latency, failure modes, and a support burden. And because the value is fuzzy, it's almost impossible to improve — nobody can agree on what "better" would even mean.

Where AI genuinely earns its place

There's a narrow set of jobs where language models do things that deterministic code can't do well, or can't do at a reasonable cost. When a feature lives inside this set, AI is a real tool, not a pose.

Summarisation and extraction

Turning ten support tickets into a one-paragraph daily digest. Pulling structured fields out of unstructured PDFs. Condensing a long meeting transcript into decisions and owners. These are jobs where the input is messy, the output tolerates a bit of imprecision, and the user is going to re-read the source anyway.

Classification and routing

Tagging inbound emails by intent. Flagging user-generated content for review. Routing form submissions to the right queue. A model can do in a prompt what used to take a brittle keyword rule — and it degrades gracefully when the input shifts.
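The routing pattern can be sketched in a few lines. This is a minimal illustration, not production code: `classify_intent` is a stand-in for a real model call (here stubbed with the kind of keyword logic a model would replace), and the intent labels and fallback queue are made up for the example. The part worth copying is the validation step — constraining the model's answer to a known set of labels so a bad output degrades gracefully instead of routing somewhere nonsensical.

```python
# Sketch of model-based intent routing. The labels and the stub
# below are illustrative assumptions, not from a real product.

ALLOWED_INTENTS = {"billing", "bug_report", "feature_request", "other"}

def classify_intent(message: str) -> str:
    """Stand-in for a model call that returns one intent label.
    A real implementation would send the message plus the allowed
    labels to a model and parse its single-word reply."""
    text = message.lower()
    if "invoice" in text or "charged" in text:
        return "billing"
    if "crash" in text or "error" in text:
        return "bug_report"
    return "other"

def route(message: str) -> str:
    intent = classify_intent(message)
    # Validate the answer: anything outside the known label set
    # falls back to a human-triaged queue instead of a wrong one.
    return intent if intent in ALLOWED_INTENTS else "other"
```

Because the validation lives outside the model call, swapping the stub for a real model — or one model for another — doesn't change how failures behave.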

Drafting

First-draft replies, first-draft descriptions, first-draft copy. The key word is draft. A human reviews and edits before it goes anywhere that matters. Used this way, AI is a typing assistant, and it saves real time.

Narrow internal assistants

A chatbot over your own runbooks, built for your own team, with clear boundaries and honest failure behaviour. Not a general-purpose agent. A small, grounded tool that beats searching a sprawling Notion workspace.
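The "clear boundaries and honest failure behaviour" part is the whole trick, and it fits in a sketch. Everything below is illustrative: the runbook entries, the word-overlap retriever (a real system would use proper search or embeddings), and the threshold. The point is the shape — retrieve first, and when nothing relevant comes back, say so rather than letting a model improvise.

```python
# Minimal sketch of a grounded internal assistant: answer only from
# retrieved runbook entries, refuse honestly otherwise. The entries
# and the overlap threshold are made up for illustration.

RUNBOOK = {
    "restart the worker queue": "Run the restart script on the jobs host.",
    "rotate the api keys": "Use the rotate-keys script, then redeploy.",
}

def answer(question: str, threshold: int = 2) -> str:
    q_words = set(question.lower().split())
    best_title, best_overlap = None, 0
    for title in RUNBOOK:
        overlap = len(q_words & set(title.split()))
        if overlap > best_overlap:
            best_title, best_overlap = title, overlap
    # Honest failure behaviour: below the threshold, admit it
    # instead of generating a plausible-sounding guess.
    if best_title is None or best_overlap < threshold:
        return "I don't know — that's not covered in the runbooks."
    return RUNBOOK[best_title]
```

In a real build, a model would rephrase the retrieved entry into an answer; the refusal path stays deterministic either way.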

Where AI is the wrong answer

Anywhere the cost of being wrong is higher than the cost of being slow, you almost certainly want deterministic code.

  • **Payments, invoicing, and accounting.** "Mostly right" is catastrophic. You want code, tests, and audit logs.
  • **Data integrity.** Anything that writes to the canonical version of a record — customers, orders, inventory — belongs in explicit business logic, not a prompt.
  • **Critical workflows.** Anywhere a mistake costs real money, real trust, or real hours to unwind. A boring rule with a clear owner beats a model every time.
  • **Security-sensitive decisions.** Access control, rate limiting, fraud checks. Models can inform these; they should not be the final word.

A useful question: if this feature gives the wrong answer one time in fifty, is that a bad day or a crisis? If it's a crisis, don't put a model in the critical path.
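The one-in-fifty question turns into a back-of-envelope calculation. The volumes and per-error costs below are assumptions for illustration; the structure is what matters — error rates that sound small become large absolute numbers at volume.

```python
# Back-of-envelope for the error-rate question: at a given volume,
# how many wrong answers land per month, and what do they cost?

def wrong_answers_per_month(error_rate: float, calls_per_month: int) -> float:
    return error_rate * calls_per_month

def expected_monthly_damage(error_rate: float, calls_per_month: int,
                            cost_per_error: float) -> float:
    return wrong_answers_per_month(error_rate, calls_per_month) * cost_per_error

# A 2% error rate on 10,000 calls is 200 mistakes a month.
# At a few dollars of support time each, that's a bad day.
# At hundreds of dollars of unwound transactions each, it's a crisis.
```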

How we approach AI on client projects

We treat AI the way we treat any other dependency — it has to pay for itself, and it has to be replaceable.

  • **Optional, not central.** AI features live alongside deterministic ones and don't block core workflows. If the model provider has a bad day, the product still works.
  • **Explained, not hidden.** The UI tells the user when they're looking at a generated draft. Nobody confuses a suggestion for a decision.
  • **Evaluated, not vibes-tested.** We write real evaluation sets — actual examples, actual expected behaviour — and we rerun them when we change prompts or models. "Looks good to me" isn't a testing strategy.
  • **Cost-modelled.** Before we ship, we know roughly what a thousand calls cost, what the p95 latency is, and how that scales with usage. AI features that make sense at launch sometimes stop making sense at scale; better to know up front.
  • **Replaceable.** We don't couple the product to a single vendor. Switching from one model to another should be a configuration change, not a rewrite.
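An evaluation set doesn't need tooling to start — a list of real examples, expected behaviour, and a pass rate you rerun after every prompt or model change. The examples and the stub model below are illustrative; the harness is the part that persists. Keeping the model behind a plain function signature is also what makes the "replaceable" point cheap: swapping providers changes the function body, not the harness.

```python
# Minimal evaluation harness: actual examples, actual expected
# labels, a pass rate. The examples and stub model are assumptions
# for illustration.

EVAL_SET = [
    ("My card was charged twice", "billing"),
    ("The app crashes on login", "bug_report"),
    ("Love the product, no issues", "other"),
]

def evaluate(model, eval_set) -> float:
    """Return the fraction of examples the model labels correctly."""
    correct = sum(1 for text, expected in eval_set if model(text) == expected)
    return correct / len(eval_set)

# Stub for demonstration; a real model call slots in behind the
# same signature without touching the harness.
def stub_model(text: str) -> str:
    t = text.lower()
    if "charged" in t:
        return "billing"
    if "crash" in t:
        return "bug_report"
    return "other"
```

Run `evaluate` before and after any prompt or model change; a drop in the score is a regression, the same as a failing test.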

The hidden costs founders don't see

When someone pitches you an AI feature, ask three questions before you agree:

1. What's the inference cost per active user per month, honestly?

2. What's the failure mode when the model is wrong, and who catches it?

3. What's the maintenance load a year from now, when models, prices, and APIs have all changed?

If the vendor can't answer those, they haven't built a product — they've built a demo.
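Question 1 is answerable with simple arithmetic once you have three numbers. The call counts, token sizes, and per-token price below are placeholders — plug in your own product's figures and your provider's current pricing.

```python
# Rough per-user inference cost. All inputs are assumptions:
# substitute real usage data and current provider pricing.

def monthly_cost_per_user(calls_per_user: int,
                          tokens_per_call: int,
                          price_per_million_tokens: float) -> float:
    tokens = calls_per_user * tokens_per_call
    return tokens / 1_000_000 * price_per_million_tokens

# e.g. 30 calls a month at ~2,000 tokens each, priced at
# $5 per million tokens: 60,000 tokens, about $0.30 per user.
cost = monthly_cost_per_user(30, 2000, 5.0)
```

Multiply by active users and compare it to what each seat earns — that comparison, not the demo, is the business case.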

AI is a useful tool. It is not a strategy. The companies getting real value from it are the ones treating it the same way they treat a database or a queue: a component with costs, failure modes, and a narrow job to do.

---

If you're trying to figure out whether AI belongs in your product — or whether it's already there and shouldn't be — contact us and we'll tell you what we actually think.