Why We Built an AI-First Chat Widget

Most live chat tools are 2010s helpdesks with an AI bot pinned to the side.

The bot is the upsell. The chat still pings a human first, the AI assists from the corner, and you pay extra for the privilege. Five years ago that made sense — the bots weren't good enough to be the first responder. In 2026 they are. Bolting them on as an add-on is leaving the most important architectural decision on the table.

Agentbot is built the other way around. The AI is the first responder, every time, and a human is the escape hatch. This post is about what that actually changed.

The bolted-on AI problem

When you install Intercom's Fin, Tawk's AI Assist, or Crisp's chatbot, you're paying for an AI layer that sits on top of a human-led inbox. The pricing reveals the architecture: per-resolution charges, separate AI tiers, Tidio's "Lyro conversations" metered alongside "billable conversations". The AI is sold as a feature because it was bolted onto a product that already existed.

That bolt-on shape leaks into the UX. The bot greets the visitor with a generic "How can I help?", the visitor types, the bot deflects to an article link, the visitor gets frustrated, a human eventually picks it up. Conversion goes to zero somewhere in there. The bot existed to fill the gap between the visitor and the human — not to actually resolve the question.

What "AI-first" architecture looks like

A few things change when the AI is the default, not the upsell.

Reads the page the visitor is on. The bolt-on bots don't know which page the visitor is looking at. Page-aware retrieval (we send the URL with every conversation context) lets the AI answer "is this in stock?" on a product page without asking "which product?".

Pulls from your knowledge base on every turn, not just when asked. RAG over uploaded docs runs by default. The visitor doesn't ask "search the docs". The AI just answers from them. We use the always-retrieve pattern that Intercom Fin and Perplexity both run in production; tool-call retrieval was unreliable (Gemini Flash routinely skipped it when the question's surface text looked off-topic).
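To make the always-retrieve idea concrete, here is a minimal sketch, not our production code. The type names, the function names, and the keyword-overlap retriever are all assumptions made for illustration; a real deployment would use an embedding search. The point is only the control flow: retrieval runs unconditionally before the model is called, so the model never gets to decide whether to look.

```typescript
// Hypothetical shapes for illustration; not taken from the Agentbot codebase.
type KbChunk = { source: string; text: string };
type ChatTurn = { role: "system" | "user" | "assistant"; content: string };

// Stand-in retriever: naive keyword overlap so the sketch runs without a vector store.
function retrieveChunks(query: string, kb: KbChunk[], topK = 3): KbChunk[] {
  const terms = query.toLowerCase().split(/\W+/).filter((t) => t.length > 2);
  return kb
    .map((chunk) => ({
      chunk,
      score: terms.filter((t) => chunk.text.toLowerCase().includes(t)).length,
    }))
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((scored) => scored.chunk);
}

// Always-retrieve: every turn gets KB context, regardless of how the question looks.
function buildTurnMessages(
  history: ChatTurn[],
  userMessage: string,
  pageUrl: string,
  kb: KbChunk[],
): ChatTurn[] {
  const chunks = retrieveChunks(userMessage, kb);
  const context =
    chunks.length > 0
      ? chunks.map((c) => `[${c.source}] ${c.text}`).join("\n")
      : "(no matching knowledge base entries)";
  const system: ChatTurn = {
    role: "system",
    content: `Visitor is on: ${pageUrl}\nKnowledge base excerpts:\n${context}`,
  };
  return [system, ...history, { role: "user", content: userMessage }];
}
```

Compared with a tool-call design, this spends a retrieval on every turn, including turns that don't need one. In exchange, a skipped lookup can no longer happen.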

Auto-resolves and closes. When the visitor says "thanks, that's all I needed", the AI calls mark_resolved and the conversation auto-closes after a 2-hour grace window. The grace gives the visitor time to come back with a follow-up. Without it the inbox fills with conversations no one is going to act on.
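The grace window is easiest to see as a small state machine. The sketch below is an assumption about how such a window could be modelled, not a description of our schema: the status names, the field names, and the choice to reopen on a visitor follow-up are all hypothetical. Only mark_resolved and the 2-hour figure come from the text above.

```typescript
// 2-hour grace window, expressed in milliseconds.
const GRACE_WINDOW_MS = 2 * 60 * 60 * 1000;

// Hypothetical status model for illustration.
type ConvStatus = "open" | "resolved_pending" | "closed";

interface ConvState {
  status: ConvStatus;
  resolvedAt: number | null; // epoch ms when mark_resolved fired, if any
}

// Called when the AI invokes mark_resolved: start the grace window, don't close yet.
function markResolved(conv: ConvState, now: number): ConvState {
  if (conv.status === "closed") return conv;
  return { status: "resolved_pending", resolvedAt: now };
}

// Assumption: a visitor follow-up during the grace window cancels the pending close.
function onVisitorMessage(conv: ConvState): ConvState {
  if (conv.status === "resolved_pending") {
    return { status: "open", resolvedAt: null };
  }
  return conv;
}

// Periodic sweep: close conversations whose grace window has fully elapsed.
function sweep(conv: ConvState, now: number): ConvState {
  if (
    conv.status === "resolved_pending" &&
    conv.resolvedAt !== null &&
    now - conv.resolvedAt >= GRACE_WINDOW_MS
  ) {
    return { status: "closed", resolvedAt: conv.resolvedAt };
  }
  return conv;
}
```

Keeping the pending state separate from closed is what lets a late "one more thing" land in the same conversation instead of starting a new one.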

Files tickets for async issues. Bug reports, refunds, account questions. The AI files a ticket with the full transcript attached and the team picks it up cold. The visitor doesn't have to wait in a chat window for an answer that's going to take 4 hours anyway.

Hands off cleanly when it can't help. The AI flags needs_human and pings the team via web push. The visitor sees "I'm getting someone to help" instead of a dead chat. The AI stays available to help while the team picks it up.

Why we chose OpenRouter

We route through OpenRouter instead of going direct to one model provider. Three reasons:

Model switching is one env variable. We've shipped on Gemini 2.0 Flash, GPT-4o-mini, and Claude Sonnet variants depending on cost-quality trade-offs at any given time. OpenRouter abstracts the API layer.

Failover. Provider 5xx and rate limits are real. OpenRouter retries across providers if the primary chokes.

Cost visibility. Per-request cost reporting lets us understand the unit economics of the free tier honestly. We can't sell "free during beta" if we don't know what each conversation costs us.
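As a rough illustration of "one env variable", the sketch below builds a request for OpenRouter's OpenAI-compatible chat completions endpoint from configuration alone. The variable names CHAT_MODEL and CHAT_FALLBACK_MODELS, the default model slug, and the helper itself are hypothetical. The optional models fallback array is included on the assumption that it matches OpenRouter's model-routing documentation, so check it against the current API reference before relying on it.

```typescript
// Hypothetical message and request shapes for illustration.
type RouterMessage = { role: "system" | "user" | "assistant"; content: string };

interface RouterRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; models?: string[]; messages: RouterMessage[] };
}

// Assumed default; any valid OpenRouter model slug would do here.
const DEFAULT_MODEL = "google/gemini-2.0-flash-001";

function buildOpenRouterRequest(
  env: Record<string, string | undefined>,
  apiKey: string,
  messages: RouterMessage[],
): RouterRequest {
  // Switching models is a config change, not a code change.
  const model = env["CHAT_MODEL"] ?? DEFAULT_MODEL;

  // Optional comma-separated fallback list (assumed parameter; see lead-in).
  const fallbacks = (env["CHAT_FALLBACK_MODELS"] ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const body: RouterRequest["body"] = { model, messages };
  if (fallbacks.length > 0) {
    body.models = [model, ...fallbacks];
  }

  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body,
  };
}
```

The returned object can be passed to fetch with method POST and a JSON-stringified body. Keeping request construction separate from the network call makes the model-selection logic testable on its own.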

What we'd do differently

Two regrets worth being public about.

We initially built KB search as a tool call. The model would call kb_search when it decided the question needed external lookup. Gemini Flash routinely skipped this on questions whose surface text didn't look "on-topic". A question about ISO 27001 leadership commitments, for example, looked like an off-topic essay request, so the model declined without searching the KB that had the answer. We migrated to always-retrieve (Intercom Fin's pattern) and the failure mode went away. Should have started there.

We didn't track the visitor's current_url separately from the entry-point URL. Originally only the entry-point URL flowed to the AI. Visitors who navigated to a different page mid-conversation got answers about the page they'd left. Adding current_url (updated on every message) fixed it, but the migration was uglier than it should have been because the original schema bound a single URL to the conversation. Should have decoupled from day one.
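A decoupled shape might look like the sketch below. The interface and function names are hypothetical and this is not our actual schema; current_url is the only identifier taken from the text. The idea is that the entry URL is written once and the current URL is overwritten by every incoming visitor message.

```typescript
// Hypothetical conversation record: two URLs with different lifetimes.
interface TrackedConversation {
  id: string;
  entryUrl: string; // set once, when the conversation starts
  currentUrl: string; // refreshed on every visitor message
}

// Hypothetical payload the widget would send with each message.
interface IncomingVisitorMessage {
  text: string;
  pageUrl: string; // the page the visitor is on when sending
}

function startConversation(id: string, entryUrl: string): TrackedConversation {
  return { id, entryUrl, currentUrl: entryUrl };
}

// Every message carries the page it was sent from; entryUrl is never touched.
function applyVisitorMessage(
  conv: TrackedConversation,
  msg: IncomingVisitorMessage,
): TrackedConversation {
  return { ...conv, currentUrl: msg.pageUrl };
}

// The AI context uses the current page, with the entry point as secondary signal.
function pageContextLine(conv: TrackedConversation): string {
  return conv.currentUrl === conv.entryUrl
    ? `Visitor is on ${conv.currentUrl}`
    : `Visitor is on ${conv.currentUrl} (entered via ${conv.entryUrl})`;
}
```

With this shape, a visitor who starts on the pricing page and moves to a product page gets answers about the product page, and the entry point is still available for attribution.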

What's next

The next big lift is the write API: letting external systems post messages into a conversation, not just receive them. That unlocks proper two-way Slack conversations, Discord bot replies, and CRM-driven outreach. Read-side webhooks ship today; write-side is the obvious next surface.

Beyond that: i18n on the widget UI (the AI already handles any language; the chrome strings are English-only), native iOS / Android apps for agents who want push without a browser, and the first real paid tier when the beta wraps.

If any of this is interesting to you, drop the script tag on your site and try it. The whole product is free during the open beta — no credit card, no agent cap, the full surface unlocked.