Emergency Services / GovTech

AI Agents for Soflex's 911 Emergency Response

Soflex operators handled high-volume 911 traffic with paper-era protocols and inconsistent triage. We shipped a human-in-the-loop agent that classifies calls, prioritizes dispatch, and surfaces the next best action in real time.

AI Agents · NLP · Real-time Systems · Automation
42% Less manual work
60% Faster triage
99.4% Classification accuracy
24/7 Coverage

Soflex runs the 911 emergency dispatch for one of LatAm's highest-traffic metropolitan operations. Thousands of calls per day flow through their console, each one demanding correct classification, fast prioritization, and protocol-perfect instruction — with zero margin for error.

The legacy workflow was paper-era. Operators kept laminated flowcharts at the console, made classification decisions by memory, and scored dispatch priority by gut feel at peak shift. The result was inconsistent triage between operators, delayed dispatches at traffic spikes, and a growing compliance backlog.

sesgo.ai shipped a human-in-the-loop AI agent that classifies every call, scores dispatch priority, and surfaces the relevant protocol in real time — integrated directly into the existing operator console. The system went live in six weeks and now assists every operator on every shift.

01 · The Challenge

Paper-era protocols in a real-time, life-critical environment

Soflex operators answered thousands of 911 calls per day. Every call demanded three near-simultaneous decisions — classification (medical, fire, police, domestic, hoax, other), severity (life-threatening, urgent, standard, informational), and dispatch prioritization against the current queue of open incidents. None of those decisions was supported by tooling beyond a memorized flowchart.
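The three decision axes above can be sketched as simple types. This is a minimal illustration; the enum values mirror the categories named in the text, but the class and field names are assumptions, not Soflex's actual schema.

```python
from dataclasses import dataclass
from enum import Enum

class IncidentType(Enum):
    MEDICAL = "medical"
    FIRE = "fire"
    POLICE = "police"
    DOMESTIC = "domestic"
    HOAX = "hoax"
    OTHER = "other"

class Severity(Enum):
    LIFE_THREATENING = 1
    URGENT = 2
    STANDARD = 3
    INFORMATIONAL = 4

@dataclass
class TriageDecision:
    incident_type: IncidentType
    severity: Severity
    priority_rank: int  # position against the current queue of open incidents

decision = TriageDecision(IncidentType.MEDICAL, Severity.LIFE_THREATENING, 0)
print(decision.severity.name)  # LIFE_THREATENING
```

Making the three decisions explicit, typed fields is what later lets every suggestion be logged, audited, and overridden field by field.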

The problems compounded. Classification quality varied by operator, by shift, and by experience. New operators' decisions drifted from veteran baselines for their first six months on the console. At peak load — commuter hours, weekends, weather events — operator cognitive load spiked and protocol adherence degraded measurably.

Staffing gaps made things worse. The operation ran continuously, but recruiting and training emergency operators takes months. Shifts with two operators short saw measurable increases in triage latency and in the false-priority-down rate — the most dangerous error class, where a real emergency gets queued behind a lower-priority incident.

The compliance context was unforgiving. Emergency dispatch operations in the region are audited against strict SLAs, with penalties for classification errors and delayed dispatches. Off-the-shelf call-center AI tooling did not come close to covering those SLAs, and none of it understood the local protocol library or the nuances of emergency-specific language.

The executive ask was framed by risk, not efficiency. The goal was to support operators with decision-grade signal in real time, cut triage time without cutting safety margin, and do it inside a six-week window before the next audit cycle.

02 · Our Approach

Human-in-the-loop by design, eval harness before production

We scoped a six-week plan with two guarantees baked in from day one. First, every agent action surfaces to the operator — the agent never acts alone. Second, operator overrides are first-class training signal — every correction becomes data that improves the next week's model.

The contract KPIs were decision latency, classification accuracy, and the false-priority-down rate. That last metric got explicit guardrail treatment. A system that was 5% faster but pushed a real emergency down the queue once was a failure, full stop. The eval harness was built before any agent saw production data.

We built the golden set first. Five hundred historical calls were pulled, stratified by type and severity, and labeled by two independent expert reviewers. Where the reviewers disagreed, a senior dispatcher adjudicated. That gold set became the regression bar — no model version shipped to production without beating the current champion on it.
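The regression bar described above can be sketched as a gate function: a challenger model ships only if it beats the current champion on the gold set. The function names and the toy "models" below are illustrative, not the production harness.

```python
# Hypothetical regression gate over the expert-labeled gold set.

def accuracy(model, gold_set):
    """Fraction of gold-set calls the model classifies correctly."""
    correct = sum(1 for call in gold_set if model(call["transcript"]) == call["label"])
    return correct / len(gold_set)

def gate(challenger, champion, gold_set):
    """Ship only when the challenger strictly beats the champion."""
    return accuracy(challenger, gold_set) > accuracy(champion, gold_set)

# Toy demonstration with two stand-in "models".
gold = [{"transcript": "chest pain", "label": "medical"},
        {"transcript": "house fire", "label": "fire"}]
champion = lambda t: "medical"                            # always guesses medical
challenger = lambda t: "fire" if "fire" in t else "medical"
print(gate(challenger, champion, gold))  # True: 1.0 vs 0.5
```

In practice the gate would also track per-class accuracy and the false-priority-down rate, so a challenger cannot win on aggregate accuracy while regressing on the most dangerous error class.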

  • Human-in-the-loop by default: The agent never dispatches. It classifies, scores, and suggests. The operator confirms. Every override is training signal.
  • Eval harness before production: A gold set of 500 expert-labeled calls, two-reviewer agreement, senior adjudication. No model ships without beating the champion on it.
  • Multi-agent orchestration: A classifier agent, a dispatch-scoring agent, and a protocol-retrieval agent — each with its own eval, composed by LangGraph.
  • Shadow-then-advisor rollout: Two weeks in shadow (invisible to operators), two weeks as advisor (visible, optional), then full advisor integration on the console.

The technology choice was driven by the operational constraint of protocol adherence. Whisper handled real-time transcription, Claude handled classification and severity scoring (with tight system prompts and explicit refusal behavior on ambiguous calls), and a retrieval tier ran against Soflex's own protocol library — not a model's training-time knowledge. Every suggestion the operator saw could be traced back to a specific protocol page. LangSmith handled the observability layer.
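The refusal behavior mentioned above can be illustrated on the response-handling side. In this sketch the classifier's system prompt instructs the model to return JSON or a literal escalation token; the prompt text, token, and field names are assumptions for illustration, not the production prompt.

```python
import json

# Hypothetical system prompt with an explicit refusal path for ambiguous calls.
SYSTEM_PROMPT = (
    "Classify the 911 call transcript. Return JSON with keys "
    "'type' and 'severity'. If the transcript is too ambiguous to "
    "classify safely, reply with exactly: ESCALATE"
)

def parse_classification(model_reply: str):
    """Turn the model's reply into a decision, or None to defer to the operator."""
    reply = model_reply.strip()
    if reply == "ESCALATE":
        return None  # refusal path: surface the raw call to the operator
    data = json.loads(reply)
    return {"type": data["type"], "severity": data["severity"]}

print(parse_classification('{"type": "medical", "severity": "urgent"}'))
print(parse_classification("ESCALATE"))  # None
```

Treating refusal as a first-class output, rather than forcing a guess, is what keeps an ambiguous call in front of a human instead of silently mis-prioritized.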

03 · The Solution

Call audio in, operator-grade decision support out

Every active call flows through the same pipeline. Audio is transcribed in real time by Whisper and streamed into a LangGraph-orchestrated agent graph, which returns a structured result to an overlay on the existing operator console. The operator never switches context. The agent never dispatches. The loop is continuous — transcription, classification, scoring, protocol retrieval, and operator confirmation all happen inside the live call window.

                         LIVE 911 CALL

   +-----------+     +----------------+     +----------------+
   |   Call    |---->|    Whisper     |---->|   Classifier   |
   |   Audio   |     |  Transcription |     |     Agent      |
   +-----------+     +----------------+     +-------+--------+
                                                    |
                                                    v
                                            +---------------+
                                            |   Dispatch    |
                                            |  Scoring Agent|
                                            +-------+-------+
                                                    |
                                                    v
                                            +---------------+
                                            |   Protocol    |
                                            | Retrieval     |
                                            | (Soflex KB)   |
                                            +-------+-------+
                                                    |
                                                    v
                                           +------------------+
                                           | Operator Console |
                                           |     Overlay      |
                                           +--------+---------+
                                                    |
                                                    v
                                           +------------------+
                                           |    Feedback      |
                                           |     Capture      |
                                           | (LangSmith)      |
                                           +------------------+
LangGraph orchestrates the three agents. Every action is observable, every correction is training data.
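The diagram above can be reduced to a sequential composition of three agent functions over a shared state. The production system wires these as LangGraph nodes; the plain-Python stand-ins, field names, and protocol labels below are illustrative only.

```python
# Minimal plain-Python sketch of the three-agent pipeline in the diagram.

def classifier_agent(state):
    state["incident"] = "medical" if "chest pain" in state["transcript"] else "other"
    return state

def dispatch_scoring_agent(state):
    # Real scoring weighs severity, queue depth, unit availability, proximity.
    state["priority"] = 1 if state["incident"] == "medical" else 5
    return state

def protocol_retrieval_agent(state):
    # Real retrieval searches Soflex's protocol library, not model memory.
    protocols = {"medical": "Protocol M-01: confirm breathing, dispatch ALS unit"}
    state["protocol"] = protocols.get(state["incident"], "Protocol G-00: gather details")
    return state

PIPELINE = [classifier_agent, dispatch_scoring_agent, protocol_retrieval_agent]

def run_pipeline(transcript):
    state = {"transcript": transcript}
    for agent in PIPELINE:  # every intermediate state is observable and loggable
        state = agent(state)
    return state

result = run_pipeline("caller reports chest pain, conscious")
print(result["priority"], result["protocol"])
```

Passing one explicit state object through every node is what makes each hop observable: the feedback-capture layer can log the exact input and output of every agent on every call.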

The classifier agent returns an incident type and a severity tier. The dispatch-scoring agent factors in severity, queue depth, unit availability, and geographic proximity to produce a priority score against the current incident queue. The protocol-retrieval agent searches Soflex's own protocol library for the matching procedure and surfaces the first action the operator should take.
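A priority score combining the four factors named above might look like the following. The weights, scales, and threshold behavior are invented for the sketch; they are not Soflex's tuning.

```python
# Illustrative priority score: higher score = dispatch sooner.

def priority_score(severity, queue_depth, units_available, distance_km):
    """severity: 1 (life-threatening) .. 4 (informational)."""
    severity_term = (5 - severity) * 100           # dominant term by design
    queue_term = -2 * queue_depth                  # deep queues dampen slightly
    unit_term = 10 if units_available > 0 else -50
    proximity_term = -distance_km                  # nearer units score higher
    return severity_term + queue_term + unit_term + proximity_term

# A life-threatening call outranks a standard call even on a busy queue.
critical = priority_score(severity=1, queue_depth=12, units_available=3, distance_km=4)
standard = priority_score(severity=3, queue_depth=2, units_available=3, distance_km=1)
print(critical > standard)  # True
```

Making severity the dominant term is one way to encode the guardrail from the contract KPIs: no combination of queue pressure or proximity should push a real emergency below a lower-severity incident.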

Each of those agents is a standalone unit with its own tests, its own golden set, and its own observability. That isolation was deliberate — in a life-critical environment, a classification bug cannot silently corrupt the dispatch score, and a protocol retrieval miss cannot trigger a priority change. The operator sees every input and every suggestion, and the UI makes refusal frictionless.

Feedback capture is the piece that keeps the system durable. Every operator confirmation, override, and refusal streams into LangSmith. A weekly review cycle pulls disagreement cases for expert review and feeds them into the gold set, which in turn gates the next model release. The system gets smarter every week without anyone needing to touch the agent code.
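The weekly disagreement-mining step can be sketched as a filter over the feedback log: pull the calls where the operator's final decision differed from the agent's suggestion and queue them for expert review. The record shape is an assumption; production streams these events through LangSmith.

```python
# Sketch of weekly disagreement mining over captured operator feedback.

def disagreement_cases(feedback_log):
    """Calls where the operator overrode the agent's suggestion."""
    return [rec for rec in feedback_log
            if rec["operator_decision"] != rec["agent_suggestion"]]

log = [
    {"call_id": 1, "agent_suggestion": "medical", "operator_decision": "medical"},
    {"call_id": 2, "agent_suggestion": "other", "operator_decision": "domestic"},
    {"call_id": 3, "agent_suggestion": "fire", "operator_decision": "fire"},
]

review_queue = disagreement_cases(log)
print([rec["call_id"] for rec in review_queue])  # [2]
```

Once adjudicated by an expert, each case lands in the gold set, so the very calls the agent got wrong become the regression bar for the next release.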

04 · Results

Faster triage, consistent classification, audit passed on first attempt

42% Less manual work per call

Average operator-initiated actions per call dropped by 42%. The agent did the mechanical lookups and prefilled fields the operator previously typed.

60% Faster triage-to-dispatch

Median time from call classification to dispatch fell 60%, with the biggest gains on the highest-severity tier where every second matters.

99.4% Classification accuracy

Measured against expert-labeled samples drawn weekly. The false-priority-down rate — the most dangerous error class — trended to near zero.

0 SLA regressions at peak load

24/7 coverage through peak shifts, weather events, and weekend spikes with no measurable SLA regression versus the baseline month.

The operational impact was visible inside the first full week of advisor mode. Operator cognitive load dropped sharply — the concrete evidence was operator NPS, which rose by double digits in the first month and held. Operators reported that the agent handled the "tedious but easy" decisions, freeing attention for the conversation that actually mattered.

The compliance impact was cleaner still. The system produces a structured audit trail for every call — classification, severity, protocol referenced, operator confirmation. Soflex passed its compliance audit on the first attempt post-deployment, with auditors specifically citing the audit trail as best-in-class.
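The per-call audit record described above might be shaped like this. The field names are assumptions for illustration, not Soflex's real schema.

```python
import json
from dataclasses import dataclass, asdict

# Illustrative shape of the structured audit trail emitted for every call.

@dataclass
class AuditRecord:
    call_id: str
    classification: str
    severity: str
    protocol_ref: str         # the specific protocol page surfaced
    operator_confirmed: bool  # human-in-the-loop sign-off

record = AuditRecord("c-1042", "medical", "life-threatening", "M-01", True)
print(json.dumps(asdict(record)))
```

Because every suggestion is traceable to a protocol page and every decision carries an operator sign-off, the audit trail falls out of normal operation rather than being reconstructed after the fact.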

The six-week window mattered. Operations leadership had committed to audit readiness by a fixed date; the agent shipped inside that window and was the primary exhibit in the audit review. ROI on the engagement was defended on that outcome alone, with triage speed and operator productivity treated as upside.

"sesgo.ai delivered an AI agent for emergency operations in six weeks. We reduced manual triage by 42% and gave our operators real-time decision support that actually works under pressure."
Operations Leadership · Soflex

Technologies deployed

  • LangChain
  • LangGraph
  • OpenAI Whisper
  • Anthropic Claude
  • Postgres
  • Redis
  • FastAPI
  • LangSmith
  • Datadog
  • AWS
  • React

Planning a similar system?

Book a 30-minute strategy call. We will stress-test the use case and share how we would approach it before you commit.