Engineering · 15 May 2026 · 9 min read

AI Telemetry, Under the Hood: A Technical Walkthrough for Engineers

An engineer-grade walkthrough of how AI Telemetry actually works - ingestion, baseline learning, anomaly classes, edge inference on ARMv8, natural-language queries, and how 96.4% accuracy is reached without vendor-default thresholds.

IoT-WorkS Editorial
last updated 15 May 2026

How does AI Telemetry actually work end-to-end?

The platform is a five-layer pipeline - sensors, edge gateways, MQTT/REST transport, the Vize ML Engine and the Vize Portal - processing more than 1 million inferences per day at an average 96.4% model accuracy (IoT-WorkS AI Telemetry, 2026). It's 100% edge-capable and UK-hosted.

[IMAGE: Five-layer pipeline diagram with sensors, edge, transport, ML core, surface - search "iot architecture diagram"]

This is a walkthrough for engineers evaluating IoT platforms, not a marketing tour. We'll cover ingestion, baseline learning, anomaly classes, edge inference, the natural-language query layer, residency, and explainability - with the specs and numbers as they exist on the AI Telemetry page today.

TL;DR: AI Telemetry ingests over 1 million inferences per day at 96.4% accuracy (IoT-WorkS, 2026). It learns per-asset baselines instead of fleet-wide thresholds, classifies point, contextual and collective anomalies, runs sub-second inference on ARMv8 edge gateways, and reports a calibrated confidence number on every prediction.

[INTERNAL-LINK: AI Telemetry product page -> /ai-telemetry/]

What does the data ingestion layer look like?

Ingestion is MQTT-first, with REST/webhook fallbacks for systems that can't speak a broker. Telemetry arrives over TLS - mTLS where the deployment requires it - and is buffered both at the gateway and the broker. The transport layer is built for store-and-forward, not best-effort.

Sample rates are configured per device class. A VS-T200 temperature/humidity sensor typically reports every 15 minutes; a VS-V100 vibration node bursts at 1 kHz during sampling windows then sleeps. The broker enforces per-topic burst budgets so a single misbehaving asset can't starve the rest of the tenant.
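
For a concrete picture of the gateway side, here is a minimal store-and-forward sketch in Python, assuming the paho-mqtt 1.x API. The broker hostname, topic layout and buffering policy are illustrative, not the platform's actual configuration - the point is the pattern: TLS transport, QoS 1, and nothing dropped while the uplink is down.

```python
# Gateway-side publisher sketch: TLS transport, QoS 1, and a local
# store-and-forward buffer that drains once the broker is reachable again.
# Hostname, topic and buffer policy are illustrative placeholders.
import json
import ssl
import time
from collections import deque

import paho.mqtt.client as mqtt  # assuming the paho-mqtt 1.x API

BROKER = "broker.example.invalid"  # placeholder hostname
TOPIC = "tenant-a/site-manchester/reefer-017/telemetry"

pending = deque()  # readings queued while the uplink is down
client = mqtt.Client(client_id="vg-e300-017", clean_session=False)
client.tls_set(cert_reqs=ssl.CERT_REQUIRED)  # server-auth TLS; add client certs for mTLS

def on_connect(cli, userdata, flags, rc):
    # Drain everything buffered during the outage, oldest first.
    while pending:
        cli.publish(TOPIC, pending.popleft(), qos=1)

client.on_connect = on_connect
client.connect(BROKER, port=8883)
client.loop_start()

def publish_reading(reading: dict) -> None:
    payload = json.dumps(reading)
    if client.is_connected():
        client.publish(TOPIC, payload, qos=1)
    else:
        pending.append(payload)  # store-and-forward: nothing is dropped

publish_reading({"ts": time.time(), "temp_c": -18.4, "humidity_pct": 61.0})
```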

[PERSONAL EXPERIENCE] Most ingestion incidents we've seen aren't from packet loss - they're from clock skew. A gateway running 90 seconds fast against the broker will silently mis-bucket every reading until NTP catches up. Time discipline is non-negotiable at the edge.

Citation capsule: AI Telemetry's transport layer handles more than 1 million inferences per day across MQTT, OPC-UA, Modbus TCP and REST/webhooks, with TLS and optional mTLS, and buffered forwarding so network drops do not lose data (IoT-WorkS AI Telemetry, 2026).

[INTERNAL-LINK: hardware catalogue -> /products/]

Why do per-asset baselines beat vendor-default thresholds?

Because no two assets are identical. A 2018 reefer compressor and a 2024 reefer compressor have different normal vibration signatures, different start-up transients and different duty cycles. A fleet-wide -18°C alarm tells you nothing about this unit's drift.

AI Telemetry runs an online learner per asset, per shift, per season. The default warm-up window is six weeks - enough to capture weekday/weekend cycles and at least one seasonal turn. Anomaly detection produces useful results before that; predictive confidence intervals tighten over the warm-up.

The detector ensemble is isolation forest plus an LSTM head, with a streaming and a batch path. The streaming path flags point anomalies in milliseconds; the batch path catches slow drift the streaming model misses. False positives feed back from operators in the portal, retraining the per-asset model rather than tweaking a global threshold.
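
A minimal sketch of the streaming path's per-asset detector, using scikit-learn's IsolationForest. The feature set, refit trigger and toy data are assumptions for illustration - the production ensemble adds the LSTM head and the batch path - but the shape is the same: one model per asset, retrained on that asset's own feedback rather than a global threshold.

```python
# Per-asset point-anomaly scoring sketch with an isolation forest. One model
# per asset; operator feedback refits that asset's model only.
import numpy as np
from sklearn.ensemble import IsolationForest

class AssetDetector:
    def __init__(self, warmup_readings: np.ndarray):
        # warmup_readings: rows of [temp_c, compressor_duty, vibration_rms] (illustrative features)
        self.model = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
        self.model.fit(warmup_readings)
        self.history = list(warmup_readings)

    def score(self, reading: np.ndarray) -> tuple[bool, float]:
        # decision_function: lower is more anomalous; predict == -1 flags an outlier.
        x = reading.reshape(1, -1)
        return self.model.predict(x)[0] == -1, float(self.model.decision_function(x)[0])

    def feedback(self, reading: np.ndarray, false_positive: bool) -> None:
        # Operator marks a false positive in the portal: fold the reading back
        # into this asset's own training window and refit.
        self.history.append(reading)
        if false_positive:
            self.model.fit(np.asarray(self.history))

# Toy warm-up window for one reefer unit, then a suspicious reading.
baseline = np.random.default_rng(0).normal([-19.0, 0.38, 0.02], [0.4, 0.03, 0.005], size=(2000, 3))
det = AssetDetector(baseline)
flag, score = det.score(np.array([-12.5, 0.51, 0.09]))  # warm box, high duty cycle, high vibration
print(flag, round(score, 3))
```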

[UNIQUE INSIGHT] Vendors who sell "AI" with a single global threshold are running rules, not learning. The first question to ask any platform is "how many models do you maintain per estate?" - if the answer isn't "one per asset", you're getting a dashboard with a bow on it.

Anomaly types in production

  • Point anomalies - a temperature spike on a single reading. Caught by the streaming detector in under a second.
  • Contextual anomalies - 4°C is fine for a chiller, abnormal for a freezer. The model encodes the asset's expected operating envelope, so the deviation triggers in context.
  • Collective anomalies - a sequence of values that, individually, look fine but collectively don't. Example: a reefer compressor whose duty cycle has crept from 38% to 51% over three weeks. No single reading is alarming. The LSTM picks it up.
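
To make the collective case concrete, here is a rolling duty-cycle drift check in the spirit of the batch path. The window length, baseline and threshold are illustrative values, not the platform's tuned parameters.

```python
# Collective-anomaly sketch: no single duty-cycle reading is alarming, but the
# rolling mean creeping away from the learned baseline is. Window and threshold
# are illustrative.
import numpy as np
import pandas as pd

def duty_cycle_drift(readings: pd.Series, baseline_duty: float = 0.38,
                     window: str = "21D", drift_threshold: float = 0.05) -> pd.Series:
    """readings: duty-cycle fraction indexed by timestamp. True where the rolling
    mean has drifted more than drift_threshold above the learned baseline."""
    return (readings.rolling(window).mean() - baseline_duty) > drift_threshold

# Six weeks of hourly readings: three flat weeks, then three weeks of slow creep to ~51%.
idx = pd.date_range("2026-03-01", periods=6 * 7 * 24, freq="h")
flat = np.full(3 * 7 * 24, 0.38)
creep = 0.38 + np.linspace(0.0, 0.13, 3 * 7 * 24)
duty = pd.Series(np.concatenate([flat, creep]), index=idx)

print(duty_cycle_drift(duty).iloc[-1])  # True once the rolling mean clears the threshold
```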

How does the platform predict failures with confidence intervals?

Every inference ships a calibrated confidence number - this is non-negotiable in the design. Average model accuracy across deployed estates is 96.4%, and the predictive layer reports per-event confidence (IoT-WorkS AI Telemetry, 2026). On the AI Telemetry page, a reefer compressor failure flagged seven days early sits at 94% confidence; a baseline cycling anomaly sits at 99%.

The predictive layer uses survival models - Weibull fits where the failure mode has a clear hazard curve, and Cox proportional hazards where covariates matter. Outputs are time-to-failure distributions, not point estimates. Engineers see a "fails in 5-9 days at 94% confidence" window, then make their own call on dispatch.
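
A toy sketch of how a time-to-failure window with a stated confidence can come out of a Weibull fit. The historical durations below are invented for illustration; the platform's fitted parameters and covariate handling are richer than this.

```python
# Weibull time-to-failure sketch: fit a hazard curve to historical run-to-failure
# durations for one failure mode, then report an interval rather than a point estimate.
import numpy as np
from scipy.stats import weibull_min

# Days from first degradation signature to compressor failure (illustrative data).
durations_days = np.array([6.1, 7.4, 5.8, 8.9, 7.0, 6.6, 9.3, 5.2, 7.7, 8.1])

# Fit shape and scale with the location pinned at zero.
shape, loc, scale = weibull_min.fit(durations_days, floc=0)

# Equal-tailed 94% interval on time-to-failure -> "fails in X-Y days at 94% confidence".
lo, hi = weibull_min.ppf([0.03, 0.97], shape, loc=loc, scale=scale)
print(f"fails in {lo:.0f}-{hi:.0f} days at 94% confidence")
```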

When a prediction crosses the alert threshold, the system can auto-create a job in the Booking Engine - skilled engineer, travel time, parts stock, SLA window. That handoff is described separately in the AI Bookings architecture deep dive.

[ORIGINAL DATA] On a UK cold-chain estate of ~400 reefer units running on AI Telemetry, the seven-day compressor warning at 94% confidence has held up across multiple repeat failures in the same fleet - the same model class, same feature attribution, same lead time.

Citation capsule: AI Telemetry reports a calibrated confidence number on every inference. A reefer compressor failure was flagged at 94% confidence seven days early, and a baseline cycling anomaly at 99% confidence, against an average estate-wide accuracy of 96.4% (IoT-WorkS AI Telemetry, 2026).

[INTERNAL-LINK: cold-chain industry context -> /industries/cold-chain/]

How does the natural-language query layer answer questions like "Which reefers in Manchester ran above -18°C for >30 min last quarter?"

By translating the question into structured SQL against a semantic layer, not by RAG-ing over PDFs. The semantic layer sits over MQTT-ingested time-series tables and the warehouse, with typed entities - asset, site, reading, event - and named measures - time_above_threshold, compressor_duty_cycle, door_open_minutes.

The NL model resolves "reefers in Manchester" to asset.type = 'reefer' AND site.region = 'Manchester', resolves "above -18°C for more than 30 minutes" to a windowed aggregation, and emits SQL the operator can inspect. Every answer cites the rows it used.
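
To illustrate the shape of that translation, here is what the resolved filters and emitted SQL might look like for the Manchester question. The table and column names follow the semantic-layer entities above (asset, site, reading), but the exact schema and SQL dialect are assumptions, not the platform's.

```python
# Illustration of the NL -> structured-query step: the resolved filters and a
# windowed-aggregation query the operator could inspect. Schema is assumed.
resolved = {
    "asset.type": "reefer",
    "site.region": "Manchester",
    "measure": "time_above_threshold",
    "threshold_c": -18.0,
    "min_duration_minutes": 30,
    "period": "last_quarter",
}

emitted_sql = """
WITH breaches AS (
    SELECT r.asset_id,
           r.ts,
           r.temp_c,
           -- consecutive above-threshold readings share a window_id
           SUM(CASE WHEN r.temp_c <= -18.0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY r.asset_id ORDER BY r.ts) AS window_id
    FROM reading r
    JOIN asset a ON a.id = r.asset_id
    JOIN site  s ON s.id = a.site_id
    WHERE a.type = 'reefer'
      AND s.region = 'Manchester'
      AND r.ts >= date_trunc('quarter', now()) - interval '3 months'
      AND r.ts <  date_trunc('quarter', now())
)
SELECT asset_id, window_id, MIN(ts) AS window_start, MAX(ts) AS window_end
FROM breaches
WHERE temp_c > -18.0
GROUP BY asset_id, window_id
HAVING MAX(ts) - MIN(ts) >= interval '30 minutes'
ORDER BY asset_id, window_start;
"""
print(emitted_sql)
```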

This matters for audits. An auditor doesn't accept "the AI said so". They accept "here are the 47 readings from these 12 units during these 19 windows". Both arrive on the same screen.

Queries are exposed through the Vize Portal, Slack, Teams and a REST endpoint. Low-confidence queries escalate to a human - the same pattern as predictive alerts.

How does edge inference on the VG-E300 actually work?

The VG-E300 Edge ML Gateway is an ARMv8 quad-core with 4 GB RAM, running ONNX Runtime and TensorFlow Lite side by side (IoT-WorkS Products FAQ, 2026). Inference latency is sub-second on the model sizes that matter for telemetry workloads.

What fits comfortably: quantised isolation forests, gradient-boosted trees (LightGBM, XGBoost exported to ONNX), small LSTMs for sequence problems, and 1D CNNs for vibration-signature classification. Heavier transformer-style models stay in the cloud - they don't add value at the edge for time-series anomaly work.
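
A sketch of the export-and-score loop, assuming skl2onnx for conversion and onnxruntime on the gateway. The model, feature layout and file name are illustrative, not the platform's deployment artefacts.

```python
# Edge-inference sketch: export a scikit-learn isolation forest to ONNX and
# score readings with ONNX Runtime, the runtime the VG-E300 ships with.
# skl2onnx usage and feature layout are assumptions for illustration.
import numpy as np
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.ensemble import IsolationForest

# Train in the cloud on the asset's warm-up window (toy data here).
X = np.random.default_rng(0).normal(size=(2000, 3)).astype(np.float32)
model = IsolationForest(n_estimators=100, random_state=0).fit(X)

# Export: three float32 features per reading.
onnx_model = convert_sklearn(model, initial_types=[("reading", FloatTensorType([None, 3]))])
with open("asset_017.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# On the gateway: load the artefact and score a single reading locally.
sess = ort.InferenceSession("asset_017.onnx")
reading = np.array([[1.8, 2.4, -0.2]], dtype=np.float32)
label, score = sess.run(None, {"reading": reading})
print(label, score)  # outlier label and anomaly score, computed at the edge
```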

Models reach the gateway as OTA updates with rollback. A bad model push is reverted automatically if inference latency or error rate breaches its envelope - no engineer needs to drive to a depot.
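
The rollback guard can be sketched as a simple envelope check after each push. The thresholds, watch window and the activate/run helpers below are assumptions for illustration, not the shipped OTA agent.

```python
# OTA rollback sketch: after a model push, watch inference latency and error
# rate against an envelope and revert to the previous artefact on breach.
import statistics
import time

LATENCY_BUDGET_MS = 250.0
ERROR_RATE_BUDGET = 0.02
WATCH_WINDOW = 500  # inferences to observe before declaring the push healthy

def monitor_and_rollback(run_inference, activate_model, new_model, previous_model) -> bool:
    # run_inference and activate_model are caller-supplied callables (hypothetical helpers).
    activate_model(new_model)
    latencies, errors = [], 0
    for _ in range(WATCH_WINDOW):
        start = time.monotonic()
        try:
            run_inference()
        except Exception:
            errors += 1
        latencies.append((time.monotonic() - start) * 1000.0)

    p95_ms = statistics.quantiles(latencies, n=20)[18]  # 95th-percentile latency
    error_rate = errors / WATCH_WINDOW
    if p95_ms > LATENCY_BUDGET_MS or error_rate > ERROR_RATE_BUDGET:
        activate_model(previous_model)  # automatic rollback, no site visit needed
        return False
    return True
```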

Why the platform is 100% edge-capable

Because some customers can't send raw telemetry off site. A nuclear-adjacent customer, a defence customer, a pharma customer mid-audit - they all have residency constraints that rule out cloud-first ingestion. AI Telemetry is designed so the model can run, score, alert and queue summaries entirely at the edge, with the cloud picking up aggregates when the network is available.

Citation capsule: AI Telemetry is 100% edge-capable. The VG-E300 Edge ML Gateway runs ONNX Runtime and TensorFlow Lite on an ARMv8 quad-core CPU with 4 GB RAM at sub-second inference latency, with OTA model updates and rollback (IoT-WorkS Products, 2026).

[INTERNAL-LINK: engineering capabilities -> /engineering/]

Where is the platform hosted and how does data residency work?

UK-hosted by default. EU hosting is available at the customer's choice. Customer data is never used to train cross-customer models - this is contractual, not aspirational (IoT-WorkS Engineering FAQ, 2026).

Residency is enforceable at three layers: at the edge (raw data stays on the gateway, only inferences and aggregates sync), in transport (mTLS-pinned brokers in the chosen region), and at rest (UK-region Postgres + warehouse with no cross-region replication). For GDPR-regulated estates we recommend pinning at the edge.

How does explainability work - why no black-box?

Every alert carries feature attribution - which sensor channel, which time window, which deviation contributed how much to the score. Engineers click the alert and see the underlying rows, the model's own reasoning trail, and the predicted vs observed values.

For survival models the attribution shows hazard contribution per covariate. For tree-based detectors it's SHAP. For LSTMs it's attention or integrated gradients depending on the model class. In every case, no alert lands in a queue with "the model said so" as the only justification.
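
For the tree-based case, here is a minimal SHAP sketch showing per-channel contributions for a single alert. The model, feature names and data are illustrative, not the deployed detector.

```python
# Feature-attribution sketch for a tree-based detector: per-channel SHAP values
# showing which sensor contributed how much to one alert. Data is synthetic.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

features = ["temp_c", "compressor_duty", "vibration_rms", "door_open_minutes"]
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 1] + 0.5 * X[:, 2] > 1.0).astype(int)  # synthetic "failure" label

model = GradientBoostingClassifier().fit(X, y)
explainer = shap.TreeExplainer(model)

alert_reading = np.array([[-12.5, 2.1, 1.4, 0.0]])
contributions = explainer.shap_values(alert_reading)[0]
for name, value in sorted(zip(features, contributions), key=lambda kv: -abs(kv[1])):
    print(f"{name:>18}: {value:+.3f}")  # channel-by-channel contribution to the alert score
```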

This isn't optional. An engineer dispatched to swap a compressor needs to know why, not just when.

FAQ

How does AI Telemetry handle bursty MQTT senders without dropping data?

Each tenant has rate-limited subscriptions with per-topic burst budgets. Above budget, the broker buffers to disk and applies back-pressure rather than dropping packets. Gateways store-and-forward locally on disconnect, so a 6-hour 4G blackout reconciles cleanly when the network returns.

How long is the baseline warm-up window per asset?

A six-week warm-up is the default before predictive alerts go live. Anomaly detection produces results earlier than that, but the predictive confidence intervals only tighten once each asset has covered its full operating cycles - weekday shifts, weekends, and at least one seasonal turn.

Why do vendor-default thresholds fail in production?

They treat a fleet as homogeneous. A 2018 reefer and a 2024 reefer have different normal vibration signatures and duty cycles. AI Telemetry learns a baseline per asset, per shift and per season, and surfaces deviations the fleet-wide threshold misses entirely. Average estate accuracy lands at 96.4%.

What models actually fit on the VG-E300 edge gateway?

Quantised isolation forests, gradient-boosted trees, small LSTMs and 1D CNNs all run comfortably on the ARMv8 quad-core with 4 GB RAM. ONNX Runtime and TensorFlow Lite both ship on the device, with sub-second inference latency and OTA model updates including automatic rollback (IoT-WorkS, 2026).

Are alerts explainable or is this a black-box?

Every alert ships with feature attribution - which sensor channel, which time window, which deviation contributed how much. Engineers drill into the underlying rows from the alert itself. SHAP for trees, hazard contributions for survival models, attention or integrated gradients for sequence models. No black-box scoring.

[INTERNAL-LINK: predictive maintenance context -> /blog/iot-predictive-maintenance-trends-2024/]

Where to take this next

If you're evaluating AI Telemetry against an existing IoT stack, the questions worth asking your incumbent are: how many models do you maintain per estate, what's the warm-up window, what's the confidence number on each inference, and what runs offline at the edge. Those four answers separate a learning system from a dashboard with rules.

For a deeper read on the sibling system that takes a predictive alert and turns it into a dispatched engineer, see the AI Bookings architecture deep dive. For predictive-maintenance context across industries, the predictive maintenance trends overview is the right starting point. For the canonical product spec sheet, the AI Telemetry page is kept current.

If you want a deployment review against a real estate, contact engineering.

Next step

Tell us what you’re trying to monitor.

A UK-based engineer will reply within one working day with a recommended sensor mix and platform fit.
