Sometimes the non-sexy components — routing logic, user permissions, human fallback — are exactly what separate a usable AI from a useless one.
I ran into this AI ethics and architecture scenario in the wild:
Let's call the org ABC and the employee Michael.
ABC had built a near zero-touch HR experience, claiming its chatbot handled over 95% of employee inquiries. Employees were encouraged to lean on the bot; many internal policy docs were removed from direct employee access and relegated to the backend, visible only to the RAG pipeline. The idea: rely on the chat interface, not the HR portal. An approach I support, if done well.
Michael put in his two weeks, and things ran smoothly until he asked about PTO payout rules. The bot gave an answer that didn't match his state's labor laws. He spent time digging into the statutes, surfaced the correct text, then asked again with the relevant context. The bot stuck to its first (wrong) answer.
He escalated to HR. The HR partner gave the same incorrect answer. When Michael asked whether the answer was coming from AskHR, the partner paused… then confirmed that it was.
What went wrong (and what we can learn):
1. User profile context — The system cited general or international policies that didn't apply locally. The correct flow: identify this as a labor-law question → resolve the user's country and state → restrict the search space to documents that apply in that jurisdiction → only then run the RAG workflow.
2. No clear human handoff — Confidence thresholds are tricky to implement in RAG, but this is exactly why they're worth the effort. A complex question like this should've gone straight to a human. It's better to give no answer than a wrong one.
3. Ethics & liability risk — When AI is treated like a policy source, wrong answers carry real (and legal) consequences.
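The jurisdiction-scoping step in point 1 can be sketched in a few lines. This is a minimal illustration, not any particular vendor's API: the document schema and `scope_documents` helper are hypothetical, standing in for a metadata filter applied before semantic search.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PolicyDoc:
    doc_id: str
    country: str
    state: Optional[str]  # None means the doc applies nationwide
    text: str

def scope_documents(docs, user_country, user_state):
    """Restrict the retrieval corpus to documents valid in the user's
    jurisdiction BEFORE any embedding search runs, so the retriever
    cannot surface a policy that doesn't apply to this employee."""
    return [
        d for d in docs
        if d.country == user_country and d.state in (None, user_state)
    ]

# Hypothetical corpus mixing nationwide, state-specific, and foreign docs.
corpus = [
    PolicyDoc("pto-global", "US", None, "General PTO accrual policy."),
    PolicyDoc("pto-ca", "US", "CA", "California PTO payout rules."),
    PolicyDoc("pto-ny", "US", "NY", "New York PTO payout rules."),
    PolicyDoc("leave-de", "DE", None, "German leave policy."),
]

scoped = scope_documents(corpus, user_country="US", user_state="CA")
# Only the nationwide and California documents remain in the search space.
```

The point is ordering: the filter runs on structured user-profile metadata first, and the fuzzy retrieval step only ever sees documents that can legally apply to the asker.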
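The handoff logic in point 2 can be just as small. Again a sketch under assumptions: the topic list, threshold value, and function names are illustrative, and `retrieval_score` stands in for whatever confidence signal your retriever exposes (e.g. top-match similarity).

```python
HANDOFF = "I can't answer this reliably. Routing your question to an HR partner."

# Topics that always go to a human, regardless of retrieval confidence
# (assumed list for illustration).
ALWAYS_HUMAN = {"final_pay", "termination", "medical_leave"}

def answer_or_escalate(topic: str, retrieval_score: float,
                       threshold: float = 0.75):
    """Decide whether the bot may answer or must hand off to a human.

    Returns (bot_may_answer, escalation_message). The threshold is tuned
    offline; the guiding rule is that no answer beats a wrong one.
    """
    if topic in ALWAYS_HUMAN:
        return False, HANDOFF   # sensitive topic: never auto-answer
    if retrieval_score < threshold:
        return False, HANDOFF   # weak evidence: escalate instead of guessing
    return True, None           # confident enough to generate an answer

# A PTO-payout question is final-pay territory, so it escalates even
# with a decent retrieval match.
decision, message = answer_or_escalate("final_pay", retrieval_score=0.82)
```

In Michael's case either gate would have fired: PTO payout at separation is a legally sensitive topic, and a bot that had just been contradicted with statute text had no business holding its score above any honest threshold.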
In this scenario, AI didn't save time — it created work and confusion. The productivity promise broke because of architecture and design decisions, not because the LLM itself was "flawed."
Sometimes the non-sexy components — routing logic, user permissions, human fallback — are exactly what separate a usable AI implementation from a useless (and potentially dangerous) one.
So I'll leave this with a couple of questions:
In sensitive domains (HR, legal, health), how do you define what's "safe for AI" vs. what always needs a human?
If a RAG answer becomes a de facto policy, who becomes accountable when it's wrong?