Embedding an AI assistant in regulated software: six lessons
The interesting question for a software studio in 2026 isn’t "should we add AI?" — the marketing department has already answered that. It’s "how do you embed an AI assistant inside production software for an institutional client without turning their compliance officer’s hair white?"
We shipped one inside our Diocese Management System in 2024 and have learned six things since. Here they are, ordered by how badly we wished we’d known them sooner.
1. The first thing to design is what the AI is not allowed to do
Before you write a single prompt, write the list of things the assistant must refuse. For us:
- Don’t answer questions about safeguarding cases the user doesn’t have role-level access to.
- Don’t make up dates, names or sums of money — cite the source row from the database or refuse.
- Don’t draft pastoral correspondence on behalf of clergy.
- Don’t answer questions whose answer would amount to legal advice.
The system prompt enforces these as hard refusals. The model is instructed to say "I can’t help with that — please ask {named human}" rather than try and fail.
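One way to make the refusal list hard to drift from is to keep it as data and assemble the system prompt from it, so the list in your docs and the list the model sees can't diverge. A minimal sketch, assuming a single escalation contact per deployment; every name here is illustrative, not our production code:

```python
# Illustrative sketch: hard refusal rules kept as data, baked into the
# system prompt at request time. Names and wording are examples only.

REFUSAL_RULES = [
    "Do not answer questions about safeguarding cases outside the user's role-level access.",
    "Never invent dates, names, or sums of money: cite the source row or refuse.",
    "Do not draft pastoral correspondence on behalf of clergy.",
    "Do not give anything that amounts to legal advice.",
]

def build_system_prompt(escalation_contact: str) -> str:
    """Assemble the assistant's system prompt with hard refusals baked in."""
    rules = "\n".join(f"- {rule}" for rule in REFUSAL_RULES)
    return (
        "You are an assistant inside a diocese management system.\n"
        "You MUST refuse the following, without attempting a partial answer:\n"
        f"{rules}\n"
        f'When refusing, reply exactly: "I can\'t help with that - please ask {escalation_contact}."'
    )
```

Keeping the rules in one list also means the same data can drive tests that check each refusal actually fires.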
2. Role-scoping is non-negotiable
The data the AI assistant sees has to be exactly what the logged-in user is authorised to see. Not most. Not almost. Exactly.
The cleanest pattern: do the database query first, with the user’s normal permissions. Then hand the rows to the LLM with a prompt like "answer the user’s question using only the rows below." The model never sees data the user wasn’t entitled to. Cross-tenant leakage simply isn’t possible.
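The query-first pattern can be sketched in a few lines. This is a hedged illustration, not our implementation: `fetch_rows_as_user` and `call_llm` are hypothetical stand-ins for your permission-aware data layer and model client.

```python
# Sketch of query-first, role-scoped retrieval. The data layer runs
# under the user's normal permissions BEFORE the model sees anything.
# `fetch_rows_as_user` and `call_llm` are illustrative stand-ins.

def answer_with_scoped_rows(user, question, fetch_rows_as_user, call_llm):
    # 1. Retrieval happens with the user's own permissions, in the
    #    database, exactly as a normal page load would.
    rows = fetch_rows_as_user(user, question)
    # 2. The model is handed only those rows, so it cannot leak
    #    anything the user wasn't already entitled to read.
    prompt = (
        "Answer the user's question using ONLY the rows below. "
        "If they don't contain the answer, say so.\n\n"
        f"Rows: {rows}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

The point of the shape is that the permission check lives in the data layer, where it already exists and is already tested, rather than in the prompt.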
The wrong pattern, which several big SaaS vendors are still shipping: dump the whole database into a vector store, embed the user’s query, and trust the retrieval to filter. It’ll be 99% right and 1% catastrophic.
3. Audit-trail every interaction
Every prompt, every response, every tool call — written to an immutable log with the user ID, the timestamp, and the model version. This is the question every regulated buyer asks within the first thirty minutes of a demo. Not having an answer is the difference between a contract and a polite "we’ll think about it".
The log doesn’t need to be searchable in real time. It needs to exist and be exportable when the auditor asks.
4. Refuse confidently, often
The instinct of most LLMs is to be helpful. In a regulated context, that’s the wrong instinct. We tune ours to refuse aggressively when:
- The retrieved data doesn’t actually answer the question (don’t guess; say so)
- The question is outside the system’s remit (don’t play general-purpose chatbot; redirect to the right tool or person)
- The answer would require interpretation of canon law, charity law, employment law, or anything similar (refuse and route to a human)
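The three conditions above can be expressed as gates that run before any answer is generated. A minimal sketch under invented names; a real system would classify remit and legal topics with far more care than a keyword list:

```python
# Sketch of refusal gating: answer only when every gate passes.
# The topic list and helper shape are illustrative assumptions.

LEGAL_TOPICS = {"canon law", "charity law", "employment law"}

def should_refuse(question: str, retrieved_rows: list, in_remit: bool):
    """Return a (refuse, reason) pair checked before generation."""
    if not retrieved_rows:
        return True, "no supporting data - don't guess"
    if not in_remit:
        return True, "outside the system's remit - redirect"
    if any(topic in question.lower() for topic in LEGAL_TOPICS):
        return True, "legal interpretation - route to a human"
    return False, ""
```

Returning a reason alongside the decision matters: it feeds both the refusal message shown to the user and the audit log.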
The first version of our assistant tried to be helpful in all cases. We retrained it and tightened the prompts, and it now refuses more often — with much higher trust scores from users.
5. The model is the cheap bit. The infrastructure is the work.
Per token, modern frontier models cost almost nothing. The work is everywhere else:
- Prompt-injection defence — users put weird instructions into form fields; documents have hidden text; the model has no concept of "trust this source over that one" unless you build it.
- Retrieval that’s actually good — "give it the right rows" sounds easy. In a system with 387 Eloquent models and decades of accumulated jargon, it’s most of a year of work to do well.
- Cost controls — per-tenant rate limits, per-query token caps, fallback to cheaper models for trivial queries. Without these, one chatty user costs you more than the licence fee.
- Observability — what was the prompt, what came back, what changed in the database afterwards, what did the user do next. Without this you can’t improve anything.
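Of those four, the cost controls are the most mechanical to sketch. Here is one hedged illustration of a per-query token cap plus a rolling per-tenant rate limit; the thresholds are invented for the example, not recommendations:

```python
# Sketch of per-tenant cost controls: a token cap per query and a
# rolling requests-per-minute limit. Thresholds are illustrative.
import time
from collections import defaultdict, deque

MAX_TOKENS_PER_QUERY = 4_000
MAX_REQUESTS_PER_MINUTE = 20

_request_times = defaultdict(deque)  # tenant_id -> recent timestamps

def admit(tenant_id: str, estimated_tokens: int, now=None) -> bool:
    """Reject a query before it ever reaches the model."""
    if estimated_tokens > MAX_TOKENS_PER_QUERY:
        return False
    now = time.monotonic() if now is None else now
    window = _request_times[tenant_id]
    # Drop timestamps older than the 60-second window.
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        return False
    window.append(now)
    return True
```

The "fallback to cheaper models" half of the bullet would sit behind the same gate: classify the query first, then route trivial ones to a smaller model before this budget is ever spent.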
6. Don’t hide that it’s AI
UI signals matter. We mark every AI response with a small indicator and keep the underlying source rows expandable. Users learn to trust the assistant precisely because they can see what it based its answer on. The opposite design — making the assistant feel omniscient — backfires the first time it gets something subtly wrong.
What to take away if you’re considering this
If you’re thinking about embedding AI into a regulated product, the order of work matters:
- Write the refusal list before any prompts.
- Build the role-scoped retrieval layer next.
- Get audit-logging in before user testing.
- Then start on the prompts.
- The model choice is last, and it’ll change every six months anyway.
If your team is figuring out how to do this for a system you already run, we’ve been there. Half the value of getting it right is knowing which questions to ask up front.