Analytics · 15 May 2026 · 8 min read

How the AI Bookings Engine Actually Works: A Technical Architecture Deep-Dive

A pipeline-level walk-through of the AI Bookings engine - the five stages, the nine scheduling constraints, the confidence-threshold tuning, and where the system still struggles.

IoT-WorkS Editorial
last updated 15 May 2026

Anyone selling "AI scheduling" can show you a chatbot. What's harder, and far more interesting, is the pipeline behind the chatbot - the bit that turns a one-line WhatsApp message into a slot on an engineer's calendar with the right parts in the van. This post walks through that pipeline as it's implemented on AI Bookings, end to end, with no marketing varnish. If you're an engineering lead evaluating booking automation, this is the level you should be asking vendors to explain.

TL;DR: AI Bookings is a five-stage pipeline - Detect, Extract, Schedule, Confirm, Escalate - that auto-handles 73% of booking enquiries end-to-end with a 14-second average first-reply time and four supported languages, per the iot-works.com/ai-bookings live page. The scheduler is a constraint solver, not a chatbot. Confidence thresholds are configurable per booking type, and every action is logged and reversible.

[IMAGE: Five-stage AI bookings pipeline diagram with channels feeding into Detect, Extract, Schedule, Confirm and Escalate boxes - search "pipeline architecture diagram"]

[INTERNAL-LINK: AI Bookings product page -> /ai-bookings/]

What does the AI Bookings engine actually do?

AI Bookings is an automated service-appointment scheduler with five pipeline stages: Detect, Extract, Schedule, Confirm and Escalate. On a representative deployment it auto-handles 73% of inbound bookings, with a 14-second average first-reply time and four supported languages (iot-works.com/ai-bookings, 2026).

The engine isn't a single model. It's a chain of small, observable components - a router, an extractor, a solver, a templater and a confidence gate - each of which can be swapped, monitored and rolled back independently. That separation is what lets the system stay explainable when it's wrong.

Citation capsule: AI Bookings is a five-stage pipeline (Detect, Extract, Schedule, Confirm, Escalate) that, on a representative IoT-WorkS deployment, auto-handles 73% of inbound booking enquiries and replies in an average of 14 seconds across four supported languages (iot-works.com/ai-bookings, 2026).

Stage 1 - How does Detect ingest bookings from four channels?

Detect is the channel-layer router. Each supported channel - email, WhatsApp, web form, voice - has a dedicated webhook that fires on each inbound message, normalises the payload to a common envelope, and writes it to an idempotent inbox before any model runs. The channel list is deliberately capped at four, matching the live AI Bookings page.

Email

Inbound mail lands via IMAP polling or an SMTP relay. The mail body is stripped of signatures and quoted history; attachments are detached and stored separately for the extractor to reference.

WhatsApp

A Meta Cloud API webhook delivers messages as JSON. Voice notes get a transcript pass through a speech-to-text model before extraction. Threading uses the conversation ID, not the phone number, so two contacts at the same site stay separate.

Web forms

The marketing site posts structured JSON straight to the inbox. Form fields populate intent slots directly, so extraction work is light. Free-text "notes" boxes are still parsed by the extractor.

Voice

Inbound calls hit a SIP trunk, are transcribed in near-real-time, and either handled by an interactive voice agent or summarised into a written enquiry. The voice path is the slowest of the four; the others routinely clear Detect in under a second.
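The common-envelope-plus-idempotent-inbox idea above can be sketched in a few lines. This is a minimal illustration, not the production code; the `Envelope` fields and the `make_envelope`/`Inbox` names are hypothetical, and the dedupe key here is simply a hash of the channel plus the channel-native message ID, so a retried webhook delivery collapses to a single inbox row.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Common inbound envelope every channel webhook writes to the inbox."""
    channel: str     # "email" | "whatsapp" | "web_form" | "voice"
    sender: str      # channel-native sender id (address, conversation id, ...)
    body: str        # cleaned text: signatures stripped, voice transcribed
    dedupe_key: str  # idempotency key derived from channel + native message id


def make_envelope(channel: str, native_id: str, sender: str, body: str) -> Envelope:
    # The dedupe key makes retried webhook deliveries collapse to one row.
    key = hashlib.sha256(f"{channel}:{native_id}".encode()).hexdigest()
    return Envelope(channel=channel, sender=sender, body=body, dedupe_key=key)


class Inbox:
    """Idempotent inbox: the same webhook delivery can fire twice safely."""

    def __init__(self) -> None:
        self._rows: dict[str, Envelope] = {}

    def write(self, env: Envelope) -> bool:
        """Return True if the envelope was new, False if it was a duplicate."""
        if env.dedupe_key in self._rows:
            return False
        self._rows[env.dedupe_key] = env
        return True
```

Because deduplication happens before any model runs, a flapping webhook costs a hash lookup, not an extraction call.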

[CHART: Bar chart of mean detect-to-inbox latency by channel - email, WhatsApp, web form, voice - source: internal benchmark]

[INTERNAL-LINK: AI Telemetry overview -> /ai-telemetry/]

Stage 2 - What does the Extract stage pull out of a message?

Extract is where a language model does its single, narrow job: turn raw text into a strict JSON object the rest of the pipeline can reason about. On the live page the step is summarised as "LLM extracts intent, entities, deadlines, location, asset references" (iot-works.com/ai-bookings, 2026).

The extractor runs a fine-tuned transformer behind a schema validator. The model returns intent, urgency, site, asset reference, customer-provided deadline, contact channel preference and a per-field confidence score. If the JSON fails validation, the message gets retried once with a stricter prompt and then escalated - we never let malformed structure leak downstream.

[ORIGINAL DATA] Across the deployments we operate, the most common extractor failure mode isn't bad grammar or accents - it's customers referring to assets by nickname ("the big freezer at the back") instead of a serial or asset ID. Roughly one in eight enquiries needs a clarifying reply before the extractor's site/asset confidence clears threshold.

Citation capsule: The Extract stage runs an LLM that returns intent, entities, deadlines, location and asset references as a strict JSON object with per-field confidence, matching the canonical pipeline description on iot-works.com/ai-bookings (2026).

Stage 3 - How does the Schedule stage pick a slot?

Schedule is a constraint solver, not a model. It receives the extracted intent and asks: across every engineer-day in the candidate window, which combinations satisfy the hard constraints and maximise the soft ones? It then ranks the survivors and picks the highest-scoring slot. The solver weighs nine constraints, all listed on the live AI Bookings playbook.

The nine scheduling constraints

  1. Engineer skills - match job type to certified engineer.
  2. Travel time - live traffic, fuel, hours-of-service.
  3. Parts stock - van, regional depot, supplier lead time.
  4. SLA windows - customer contract, severity.
  5. Customer preference - site access windows, contacts.
  6. Weather - outdoor work risk-adjusted.
  7. Depot capacity - bay slots and overnight returns.
  8. Utilisation - target 80-90%, no churn-burn.
  9. Contract commitments - active service-level obligations tied to the customer.

Engineer skills, SLA windows and parts stock are treated as hard constraints by default - the solver will not return a slot that violates them. The other six act as weighted soft constraints, with weights configurable per customer.

[UNIQUE INSIGHT] Most "AI scheduling" tools we benchmark against treat utilisation as a hard target. That's a mistake. If you force 90% utilisation as a constraint, the solver starts churning engineers across the country to hit the number. Utilisation has to be soft, with a floor and a ceiling, or the system optimises for the wrong metric.

[INTERNAL-LINK: Engineering services -> /engineering/]

Stage 4 - How does Confirm close the loop with the customer?

Confirm is the reply generator. It composes an auto-reply in the customer's language, sends calendar invites, reserves parts in the warehouse system and writes the booking to the FSM. On the live page the step reads: "Auto-reply in the customer's language. Calendar invites sent. Stock reserved" (iot-works.com/ai-bookings, 2026).

The reply uses templated language with model-generated slots - never free-form generation - so the wording stays consistent and legally reviewable. Stock reservations and calendar writes are wrapped in a transactional outbox: if any of the three downstream systems fails, the whole booking rolls back and the customer never gets a confirmation we can't honour.

[PERSONAL EXPERIENCE] We learned this the hard way. An early build sent the confirmation email before the calendar write succeeded. When the calendar API blipped for forty seconds one Tuesday morning, twenty-three customers got confirmations for slots that didn't exist. The outbox pattern was a same-day fix and has held since.
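The outbox pattern described above can be sketched with SQLite standing in for the booking store. This is an illustrative shape, not the production implementation: the booking row and its three side-effect rows commit in one transaction, and a separate dispatcher drains the outbox, so a downstream blip can delay a confirmation but can never produce one for a booking that does not exist.

```python
import json
import sqlite3


def init_schema(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS bookings(id TEXT PRIMARY KEY, payload TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS outbox("
               "id INTEGER PRIMARY KEY, booking_id TEXT, effect TEXT, sent INTEGER)")


def book_with_outbox(db: sqlite3.Connection, booking: dict) -> None:
    """Write the booking and its three side-effects in ONE transaction.
    Nothing is sent here - the dispatcher drains the outbox afterwards."""
    with db:  # commits on success, rolls back on any exception
        db.execute("INSERT INTO bookings(id, payload) VALUES (?, ?)",
                   (booking["id"], json.dumps(booking)))
        for effect in ("calendar_write", "stock_reserve", "confirm_reply"):
            db.execute(
                "INSERT INTO outbox(booking_id, effect, sent) VALUES (?, ?, 0)",
                (booking["id"], effect))


def drain_outbox(db: sqlite3.Connection, send) -> None:
    """Dispatcher: retry unsent effects until each downstream call succeeds."""
    rows = db.execute(
        "SELECT id, booking_id, effect FROM outbox WHERE sent = 0").fetchall()
    for row_id, booking_id, effect in rows:
        try:
            send(booking_id, effect)  # idempotent downstream call
        except Exception:
            continue                  # left unsent; retried on the next pass
        db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    db.commit()
```

The ordering guarantee is the point: the confirmation reply is just another outbox row, so it can only ever be sent after the booking itself has committed.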

Stage 5 - When does the engine escalate to a human?

Escalate is the safety net. When any field's confidence sits below the configured threshold, the engine hands the booking to a human dispatcher with the full conversation, the extracted intent, the candidate slots and a recommended action. From the live page: "Below-threshold confidence -> human takes over. Every action explainable & reversible" (iot-works.com/ai-bookings, 2026).

Confidence-threshold tuning

Thresholds are per booking type, not global. A new-install booking for a hospital pharmacy carries a tighter cutoff than a routine annual service for a known site. Each escalation writes an audit row with the triggering field and its score, and operators retune monthly against the outcomes of borderline cases.

In practice, the threshold is the single most important knob in the system. Set it too high and the escalation queue floods; set it too low and the customer experience degrades. We start every deployment in shadow mode - the engine drafts, a human approves - for the first two weeks, then move to live with conservative thresholds.
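The per-type gate is simple enough to show in full. The threshold values below are illustrative, not the production configuration, and the `gate` helper is a hypothetical name - but the behaviour matches the description above: per-booking-type cutoffs, a stricter default for unknown types, and the triggering field returned alongside the decision (the same datum the audit row records).

```python
# Illustrative per-booking-type confidence thresholds (not production values).
THRESHOLDS = {
    "new_install_clinical": 0.95,    # tighter cutoff for high-risk bookings
    "reactive_service_visit": 0.85,
    "annual_service": 0.75,
}
DEFAULT_THRESHOLD = 0.90  # unknown booking types fail safe: stricter gate


def gate(booking_type: str, confidence: dict[str, float]):
    """Return ('proceed', None) or ('escalate', field) with the field
    that triggered the escalation."""
    cutoff = THRESHOLDS.get(booking_type, DEFAULT_THRESHOLD)
    for field, score in confidence.items():
        if score < cutoff:
            return ("escalate", field)
    return ("proceed", None)
```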

A worked example: a real-shape booking enquiry

A customer sends a WhatsApp message at 07:42 on a Monday: "Hi - one of our walk-in freezers at the Newcastle DC has been alarming overnight on temperature. Can someone come out this week? Mike."

The extractor returns this object:

{
  "intent": "reactive_service_visit",
  "urgency": "high",
  "site": "Newcastle DC",
  "asset_ref": null,
  "asset_description": "walk-in freezer",
  "symptom": "overnight high-temperature alarm",
  "deadline": "within 5 working days",
  "preferred_channel": "whatsapp",
  "confidence": { "intent": 0.97, "site": 0.93, "asset_ref": 0.31 }
}

The asset_ref confidence is below threshold, so the engine sends a one-line clarifier asking which freezer asset tag, while the solver pre-computes candidate slots in parallel. Once Mike replies "FRZ-NC-04", the solver picks Wednesday 10:00-12:00 with engineer EJ-217: she holds the refrigeration cert (hard constraint #1), Newcastle DC is 38 minutes from her 09:00 job (soft #2 satisfied), the relevant condenser fan is on her van (hard #3), and the slot lands inside the 5-working-day SLA (hard #4). Total elapsed time from inbound to confirmed slot: 9 minutes, most of it waiting on Mike's reply.

[CHART: Timeline of the worked example showing inbox, extraction, clarifier, customer reply, scheduling, confirmation - source: internal trace]

Where does the engine still get confused?

Two failure modes dominate. The first is ambiguous customer wording: nicknames for assets, vague timeframes ("sometime next week-ish"), and conflicting deadlines in the same message. The extractor catches these via per-field confidence and asks a clarifier, but every clarifier costs latency, and roughly one in twenty customers never replies.

The second is calendar collisions in legacy FSM systems. When the downstream calendar truth-source is a 12-year-old database that allows double-bookings via a back-office UI, the engine can pick a slot the solver believed was free. We mitigate by treating the FSM as the source of truth, polling on a tight cadence, and treating any write conflict as an automatic escalation - but a clean integration always outperforms a defensive one.
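The defensive write described above is a compare-and-set against the FSM calendar. A minimal sketch, with a dict standing in for the downstream system (the `FsmCalendar` and `commit_slot` names are hypothetical):

```python
class FsmCalendar:
    """Stand-in for the downstream FSM calendar - the source of truth."""

    def __init__(self) -> None:
        self._slots: dict[str, str] = {}  # slot_id -> booking_id

    def try_reserve(self, slot_id: str, booking_id: str) -> bool:
        """Atomic check-and-set; False means someone else took the slot."""
        if slot_id in self._slots:
            return False
        self._slots[slot_id] = booking_id
        return True


def commit_slot(fsm: FsmCalendar, slot_id: str, booking_id: str) -> str:
    # A write conflict is never retried blindly - it goes straight to a human.
    return "confirmed" if fsm.try_reserve(slot_id, booking_id) else "escalate"
```

The key design choice is that a lost race escalates rather than retries: if a back-office user grabbed the slot between solve and write, a human should re-check the customer's constraints, not the machine.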

[UNIQUE INSIGHT] Ambiguity isn't a model problem; it's a product problem. The answer is rarely a bigger LLM - it's a better clarifier policy and a shorter, more disciplined inbound form. We've cut clarifier rates by a third on two deployments just by adding an "asset tag" field to the inbound web form.

FAQ

The architecture-specific questions are answered inline above. For commercial questions - pricing, channel coverage, escalation behaviour at a product level - see the canonical FAQ on the AI Bookings page and the case studies.

Wrap-up

The interesting part of "AI booking" isn't the AI - it's the boring scaffolding around it: the channel router, the schema validator, the constraint solver, the confidence gate, the transactional outbox, the audit trail. Get those right and the model is almost interchangeable. Get them wrong and no model will save you.

If you're scoping a booking-automation project, the questions to ask any vendor are simple: which channels, which constraints, how is confidence tuned, what's escalated, what's logged, what's reversible? Those five questions separate a real engine from a chatbot with good marketing.

Talk to engineering if you want to see the pipeline running against your own enquiry volume, or read the case studies for deployment-shape examples.

[INTERNAL-LINK: Case studies index -> /case-studies/]
