Fractional CAIO · Executive Playbook Series · V-AIM Framework
The AI Incident Command Playbook
From Detection to Board Report: The V-AIM Framework for Commanding AI Failures at Scale
Author: James A Lang
Role: Fractional CAIO · Velinor
Edition: 2026 · V2
Format: 49-Page Playbook + 6 Templates + 3 Case Studies

Contents

  1. Introduction: Why AI Incidents Are Different
  2. Part 1 — Stage 1: Prepare
  3. Part 2 — Stages 2 & 3: Detect & Contain
  4. Part 3 — Stages 4 & 5: Govern & Recover
  5. Part 4 — Stage 6: Learn
  6. Part 5 — Leadership Metrics Guide
  7. Case Studies
  8. Appendices: Editable Templates

Introduction: Why AI Incidents Are Different

When a server goes down, the failure is visible, bounded, and recoverable. When an AI system fails, the failure may be silent, diffuse, ambiguous, and reputationally catastrophic. This playbook exists because organisations that apply standard IT incident response thinking to AI will be consistently underprepared.

AI systems occupy a different kind of risk position in your organisation. They don't just fail to work; they work incorrectly, often at scale, often undetected, and often in ways that can't simply be rolled back. Five characteristics make AI incidents qualitatively different from other technology failures:

Characteristic | Traditional IT Failure | AI System Failure
Detection | Usually immediate: the system goes down, an alert fires | Often delayed: the system keeps running, producing wrong outputs nobody notices
Blast radius | Bounded: the system stops, impact halts | Accumulating: an AI making 10,000 decisions/day creates 10,000 harms before detection
Accountability | Clear: the system failed | Ambiguous: the model, the data, the deployment, the humans, the vendor?
Reversibility | High: restore from backup, restart the service | Low: retrospective decisions can't be un-made; affected parties can't be un-harmed
Regulation | General IT standards and GDPR | EU AI Act Article 73, sector-specific AI regulations, GDPR, FCA, ICO
⚠ Regulatory Context: EU AI Act
The EU AI Act entered into force in August 2024, with its obligations phasing in from February 2025. Article 73 requires providers of high-risk AI systems to report serious incidents, including near-misses, to market surveillance authorities. Non-compliance carries penalties of up to €15M or 3% of global annual turnover for breaches of most obligations, rising to €35M or 7% for the most serious violations. If your organisation deploys or uses high-risk AI, incident response is no longer optional governance hygiene. It is a legal obligation.

AI incidents fall into three categories under the R3AI Standard: Reliability failures (the system doesn't do what it was designed to do), Resilience failures (the system can't maintain performance under real-world conditions), and Responsibility failures (the system causes harm, bias, or ethical damage). Understanding which category an incident falls into shapes how you contain, govern, and learn from it.

This playbook gives leadership teams the V-AIM command framework to lead their response. It covers the six stages of V-AIM — Prepare, Detect, Contain, Govern, Recover, Learn — and the First 24 Hours timeline, regulatory obligations, communication sequencing, the AI TRACE post-incident review, and the Leadership Metrics Guide. Six editable templates in the appendices can be adapted for your organisation immediately.

A note on use: this playbook is written for executive leadership teams, the people who need to govern an AI incident, not just technically resolve it. It is deliberately jargon-light and action-oriented. Technical teams will need their own supplementary runbooks; this document governs the leadership layer that sits above them.


Part 1 — Stage 1: Prepare

The single most important thing you can do for AI incident response is prepare before anything goes wrong. Organisations that define their severity classifications, response team roles, and communication templates in advance will respond in hours. Those that start from scratch during an incident will respond in days, by which time the damage has compounded.

1.1 V-SEV Classification Matrix

Not every AI problem is a crisis. The V-SEV scale provides a consistent severity classification language across technical and non-technical teams. Use this matrix to classify incidents consistently so everyone on the command team is working from the same understanding of urgency and expected response times.

V5 — SYSTEMIC TRUST EVENT
  Definition: Active harm occurring, regulatory breach, or material public exposure. Widespread impact on trust, operations, or reputation.
  Examples: Deepfake fraud in progress; AI producing dangerous medical or financial advice; PII breach via AI system; AI weaponised against customers; systemic bias at scale.
  Response SLA: Immediate response. Incident Lead on-call within 1 hour. Board notification within 4 hours.
  Escalation: Incident Lead → Executive Sponsor → CEO → Legal → Board

V4 — CRITICAL
  Definition: Significant business or reputational impact; regulatory exposure likely; material customer harm.
  Examples: Biased model discovered affecting hiring or lending decisions; financial miscalculations in AI-assisted reporting; identity verification bypass at scale; governance failure with external consequences.
  Response SLA: Response team assembled within 4 hours.
  Escalation: Incident Lead → Executive Sponsor → Legal

V3 — SIGNIFICANT
  Definition: Operational disruption with limited external impact; quality degradation affecting business processes; regulatory interest possible.
  Examples: AI system goes offline; automation pipeline failures causing delays; model drift reducing output quality below threshold; localised bias detected.
  Response SLA: Response within 24 hours.
  Escalation: Technical Containment Lead → Incident Lead

V2 — MODERATE
  Definition: Repeated errors or anomalies with limited impact; internal review required; no external regulatory trigger.
  Examples: Gradual accuracy decline affecting outputs; unusual patterns flagged by monitoring; minor data quality issues in production; edge-case model failures.
  Response SLA: Response within 48 hours.
  Escalation: Technical Containment Lead

V1 — IRREGULARITY
  Definition: Minor deviation; no external or customer impact; detected through routine monitoring. Informational only.
  Examples: Single anomalous output; testing environment issues; minor configuration drift; low-confidence flag from monitoring system.
  Response SLA: Review within 72 hours. Log and monitor.
  Escalation: Technical Containment Lead (log only)
→ Important
When in doubt, classify up. It is better to mobilise a full V5 response for what turns out to be a V4 than to under-resource a genuine crisis. Downgrading is always easier than escalating late.
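To keep classification consistent across tooling and reporting, the matrix can be encoded as data. The sketch below is illustrative only: the field names, the SLA encoding, and the classify helper are assumptions for this example, not part of the published V-AIM framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Severity:
    level: int                   # 1 (irregularity) .. 5 (systemic trust event)
    response_sla_hours: int      # time to first response, per the matrix
    escalation: tuple            # escalation chain, first responder first

# The V-SEV matrix above, as a lookup table (illustrative encoding).
V_SEV = {
    5: Severity(5, 1,  ("Incident Lead", "Executive Sponsor", "CEO", "Legal", "Board")),
    4: Severity(4, 4,  ("Incident Lead", "Executive Sponsor", "Legal")),
    3: Severity(3, 24, ("Technical Containment Lead", "Incident Lead")),
    2: Severity(2, 48, ("Technical Containment Lead",)),
    1: Severity(1, 72, ("Technical Containment Lead",)),
}

def classify(candidate_levels):
    """'When in doubt, classify up': command at the highest plausible level."""
    return V_SEV[max(candidate_levels)]
```

An incident that could plausibly be V3 or V4 is therefore commanded as V4, with the 4-hour assembly SLA.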

1.2 V-AIM Command Roles

Define and assign these six V-AIM command roles before any incident occurs. Each role must have a named primary and a named backup. These are decision-making roles with clear authority — not just communication functions.

V-AIM RoleResponsibilitiesTypically Held By
Executive Sponsor Ultimate accountable executive. Approves significant decisions on containment, disclosure, and recovery. Single board-to-response-team interface. Signs off regulatory notifications. CEO, COO, or Board-designated executive
Incident Lead Operational command of the incident. Coordinates all six V-AIM command roles. Makes the call on containment actions. Single escalation point for the response team. Produces status updates for the Executive Sponsor. CAIO, CTO, or senior executive designated in advance
Technical Containment Lead Owns the technical investigation. Responsible for root cause analysis, containment actions, model rollback, evidence preservation, and technical recovery validation. Head of AI/ML, Principal Engineer, or Lead Data Scientist
Legal & Compliance Lead Assesses regulatory notification obligations (EU AI Act Article 73, GDPR, FCA, sector-specific). Reviews all external communications before release. Manages evidence chain and liability considerations. General Counsel, Chief Compliance Officer, or external AI law firm on retainer
Communications Lead Owns all internal and external messaging. Controls communication timing and sequencing. Prepares and adapts templates. Approves all final language before distribution. Manages media if required. Head of Communications, CMO, or external PR lead
Business Owner Represents the business function(s) affected. Quantifies customer and operational impact. Manages customer service escalations. Tracks affected accounts, decisions, or transactions. Confirms when operational recovery is complete. COO, Customer Success Director, or Operations Head

1.3 The 12 Non-Negotiables

Before your organisation is ever in an AI incident, the following 12 conditions must already be satisfied. These are the Non-Negotiables — the minimum viable governance foundation for effective V-AIM command. If any of these are missing, your incident response capability is incomplete, regardless of the quality of your technical infrastructure.

  • AI system inventory is documented: what systems exist, what decisions they make, what data they use, and who owns them
  • Model versioning is established: you can identify precisely which model version was running at any point in time
  • Rollback capability exists: you can revert an AI system to a previous version within hours
  • Data lineage is traceable: you can identify what data trained the model and what data is currently feeding it
  • Incident classification criteria are agreed in writing and shared with the response team
  • Response team roles are assigned with named primary and backup contacts
  • Contact list is current (including Legal Counsel and external PR, if relevant)
  • Communication templates are pre-approved (see Appendices B–D)
  • Regulatory notification obligations are mapped: which systems are high-risk under the EU AI Act, what GDPR obligations apply, what sector-specific rules apply
  • Board escalation threshold is defined in writing: at what severity level and time delay does the board get informed?
  • Incident response tabletop exercise has been conducted in the last 12 months
  • External AI law firm or regulatory adviser is identified and on retainer or accessible
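Two of the Non-Negotiables above, model versioning and rollback capability, reduce to one question: can you say which version was live at any moment, and redeploy the previous one quickly? A minimal, hypothetical sketch (class and version names are invented for illustration):

```python
from datetime import datetime

class ModelRegistry:
    """Record every deployment so 'which version was running at time T?'
    and 'roll back now' are both answerable. Illustrative sketch only."""
    def __init__(self):
        self._history = []  # (deployed_at, version), in deployment order

    def deploy(self, deployed_at, version):
        self._history.append((deployed_at, version))

    def version_at(self, when):
        """Identify precisely which version was live at a point in time."""
        live = [v for t, v in self._history if t <= when]
        return live[-1] if live else None

    def rollback(self, at):
        """Redeploy the previous version; recorded as a new deployment event."""
        if len(self._history) >= 2:
            self.deploy(at, self._history[-2][1])

registry = ModelRegistry()
registry.deploy(datetime(2026, 1, 10), "credit-model-v1")
registry.deploy(datetime(2026, 2, 1), "credit-model-v2")
registry.rollback(datetime(2026, 2, 15))
# version_at(datetime(2026, 2, 20)) now resolves to "credit-model-v1"
```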

Part 2 — Stages 2 & 3: Detect & Contain

The first 24 hours of an AI incident are the most consequential. Decisions made in this window determine whether the situation is contained or whether it compounds. Speed matters, but not more than the quality of decisions. Acting too fast without understanding the blast radius can make things worse.

Hour 0–4: Detection & Initial Assessment

1. Confirm the incident

Not every anomaly is an incident. Before triggering the full response, confirm:

  • Is this a genuine AI system failure, or expected system behaviour?
  • What is the appropriate severity classification using the matrix in Section 1.1?
  • Is harm currently occurring (active), or has it already occurred (retrospective)?
  • What is the discovery source: a monitoring alert, user report, customer complaint, or external media?
2. Declare the incident and assign the Incident Lead

Use the Incident Declaration Template (Appendix A). The act of formal declaration:

  • Starts the regulatory clock for any notification obligations
  • Creates a legally defensible record of when the organisation became aware
  • Triggers the response protocol: everyone knows their role
3. Assemble the response team
  • V5/V4: Full command team, immediately. All six V-AIM roles activated.
  • V3/V2: Core team (Incident Lead, Technical Containment Lead, Legal & Compliance Lead) within 2 hours.
  • V1: Technical Containment Lead plus one other as required.
4. Make the containment decision

The first and most important question: Should we switch this AI system off?

Containment Decision Framework

If active harm is currently occurring → Shut down the AI system immediately. The cost of a false positive (unnecessary downtime) is always less than the cost of continued harm.

If harm is retrospective (already happened) → Preserve all evidence before any remediation. Do not restart the system until root cause is understood.

If the situation is ambiguous → Shut down and investigate. Resuming a broken system creates liability; prudent precaution does not.

If shutting down would cause greater harm → For example, a medical or safety-critical AI where shutdown creates patient risk. In this case, escalate immediately and engage Legal Counsel before any action.
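The four branches above can be sketched as a single decision function. The inputs and the returned action strings are illustrative, not prescriptive:

```python
def containment_action(active_harm: bool, ambiguous: bool,
                       shutdown_causes_greater_harm: bool) -> str:
    """Sketch of the containment decision framework; labels are illustrative."""
    if shutdown_causes_greater_harm:
        # e.g. a safety-critical AI where shutdown creates patient risk
        return "escalate immediately; engage Legal Counsel before any action"
    if active_harm or ambiguous:
        # unnecessary downtime costs less than continued harm or liability
        return "shut down the AI system now"
    # harm is retrospective: the system is not creating new harm
    return "preserve evidence first; do not restart until root cause is understood"
```

Note the ordering: the "shutdown would cause greater harm" check overrides the default bias toward switching off.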

5. Preserve evidence before any remediation

This step is non-negotiable and must precede any technical fix. Legal and regulatory proceedings will depend on evidence integrity.

  • Capture system logs, model version identifier, and input/output samples from the incident window
  • Screenshot monitoring dashboards, alerts, and any anomaly indicators
  • Preserve all communications related to the incident (internal messages, user reports, support tickets)
  • Do not alter any data sources, model configurations, or system settings until evidence is preserved
  • Assign a named custodian responsible for evidence preservation and chain of custody
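Evidence integrity can be supported by hashing each captured artefact at collection time. This is a hedged sketch: the manifest shape, custodian field, and file handling are assumptions, and a real chain of custody also requires the manifest itself to be stored securely.

```python
import hashlib
import json
from datetime import datetime, timezone

def evidence_manifest(file_paths, custodian):
    """Hash each evidence file at capture time so later tampering is
    detectable. Illustrative only; not a substitute for forensic tooling."""
    entries = []
    for path in file_paths:
        with open(path, "rb") as f:
            entries.append({"file": path,
                            "sha256": hashlib.sha256(f.read()).hexdigest()})
    return json.dumps({
        "custodian": custodian,                                   # named custodian
        "captured_at": datetime.now(timezone.utc).isoformat(),    # capture timestamp
        "evidence": entries,
    }, indent=2)
```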

Hour 4–8: Escalation & Regulatory Assessment

Internal escalation

  • Brief the CEO and/or COO for V5 and V4 incidents
  • Do not brief the full board yet: premature board escalation adds noise without clarity. Board notification should follow the threshold defined in your pre-agreed protocol.
  • Establish a situation room cadence: regular check-ins (every 2 hours for V5, every 4 hours for V4)

Regulatory notification assessment

Work with Legal Counsel to determine notification obligations. Do not wait for this assessment to conclude before taking containment action, but begin it in parallel.

EU AI Act (Art. 73)
  Trigger: Serious incident involving a high-risk AI system, including near-misses
  Deadline: Report without undue delay and within 15 days of awareness (2 days for widespread infringement; 10 days where a death is involved); follow-up report as required by the authority
  Action required: Notify the relevant market surveillance authority; preserve incident records

GDPR / UK GDPR
  Trigger: Personal data involved in a breach caused or exacerbated by an AI system
  Deadline: 72 hours from awareness to the regulator; without undue delay to data subjects if high risk
  Action required: Notify the ICO (UK) / supervisory authority (EU); notify affected individuals if required

FCA (Financial Services)
  Trigger: Material operational incident affecting regulated services; AI-related consumer harm
  Deadline: As soon as practicable
  Action required: Notify the FCA; document the incident and response; consider consumer redress obligations

Sector-specific
  Trigger: Healthcare, critical infrastructure, defence, etc.
  Deadline: Varies by sector and jurisdiction
  Action required: Review sector-specific AI guidance; engage the sector regulator
⛔ Common Error
Organisations frequently delay regulatory notification while waiting to fully understand the incident. Regulators treat this as aggravating conduct. Notify early with what you know, and update as your understanding develops. A timely incomplete notification is always better than a late complete one.
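A simple deadline clock makes the notification windows concrete. The window values below are illustrative assumptions; confirm the actual deadlines for your incident with Legal Counsel.

```python
from datetime import datetime, timedelta, timezone

# Illustrative notification windows only; real deadlines depend on the
# regulation, the incident type, and counsel's assessment.
NOTIFICATION_WINDOWS = {
    "GDPR / UK GDPR (regulator)": timedelta(hours=72),
    "EU AI Act Art. 73":          timedelta(days=15),
}

def notification_deadlines(aware_at: datetime) -> dict:
    """The clock starts at awareness, i.e. at formal incident declaration."""
    return {reg: aware_at + window for reg, window in NOTIFICATION_WINDOWS.items()}

aware = datetime(2026, 3, 2, 9, 0, tzinfo=timezone.utc)
deadlines = notification_deadlines(aware)
# GDPR regulator deadline: 2026-03-05 09:00 UTC (72 hours after awareness)
```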

Hour 8–24: Root Cause Investigation

By this point, the immediate crisis is stabilised. The focus shifts from containment to understanding. Root cause investigation must be structured, not anecdotal.

The four investigation questions

Every AI incident investigation must answer these four questions before remediation can be authorised:

1. What failed? (The model, the data, the deployment, the process, or the humans?)
   Why it matters: Determines the remediation approach; a model fix is different from a process fix.
2. When did it start? (Point of failure vs. point of discovery.)
   Why it matters: Defines the blast radius; how many decisions were affected before detection.
3. What was the blast radius? (How many decisions, transactions, or people were affected?)
   Why it matters: Drives notification obligations, customer redress, and reputational risk assessment.
4. Why did our controls not catch this? (What monitoring, testing, or governance mechanism failed?)
   Why it matters: The answer drives post-incident governance improvement; without it, you will face the same incident again.

Document everything

Create and maintain an incident timeline throughout: every action, every decision, every communication, timestamped. This serves three purposes: legal defensibility, post-incident learning, and board/regulator reporting. Assign a named scribe to maintain the timeline in real time.
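The scribe's log can be as simple as an append-only, timestamped record. A minimal sketch; the class name, entry format, and in-memory storage are illustrative choices.

```python
from datetime import datetime, timezone

class IncidentTimeline:
    """Append-only, timestamped record of every action, decision, and
    communication. Sketch only: real deployments need durable storage."""
    def __init__(self, incident_id: str):
        self.incident_id = incident_id
        self._entries = []

    def log(self, actor: str, event: str):
        self._entries.append((datetime.now(timezone.utc), actor, event))

    def render(self) -> str:
        """One line per entry, in the order events were logged."""
        return "\n".join(f"{ts.isoformat(timespec='seconds')}  [{actor}] {event}"
                         for ts, actor, event in self._entries)

timeline = IncidentTimeline("INC-2026-001")
timeline.log("Incident Lead", "Incident declared; classified V4")
timeline.log("Technical Containment Lead", "Model v2 rolled back to v1")
```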


Part 3 — Stages 4 & 5: Govern & Recover

Communication Rules

How an organisation communicates during an AI incident often matters as much as what it actually does technically. Three rules govern effective crisis communication:

Rule 1
One voice, one message. All external communications go through a single approved spokesperson, using pre-approved language. Multiple voices create contradictions. Contradictions destroy trust.
Rule 2
Acknowledge before you explain. Organisations that lead with technical explanations before acknowledging impact destroy trust. Your first communication to affected parties is not a technical briefing; it is an acknowledgement that something went wrong and that you are taking it seriously.
Rule 3
Say what you know, say what you don't know, say what you're doing about it. Uncertainty is acceptable. Silence, vagueness, and over-reassurance are not. Stakeholders can tolerate "we are still investigating"; they cannot tolerate "everything is fine" followed by a larger revelation.

Communication Sequence

Internal Sequence

As soon as classification is confirmed (Day 1)
Leadership team briefing

CEO, COO, CFO, and relevant department heads. Focus on classification, what is known, immediate actions taken, and the response plan.

Day 1–2
Affected department heads

Brief those whose teams may be affected or who will be involved in response. Give clear instructions on what to do and what not to say.

Day 2–3 (only if required)
All-company communication

Only if the incident becomes public or if staff will encounter customer questions. Must be coordinated with external communications to ensure consistent messaging.

External Sequence

Before press, always
Directly affected customers / users

No customer should learn about an incident affecting them from the media. Personalised communication (using Appendix C template) to affected parties takes priority over all other external communications.

Per notification obligation deadline
Regulators

File notifications per regulatory requirements (see Part 2). Early, incomplete notification is preferred over late, complete notification.

Only if story is likely to become public
Public statement

Do not proactively publish a public statement unless the incident is already public or likely to become so. Unnecessary disclosure creates additional reputational risk without corresponding benefit.

Reactive, always prepared in advance
Media response

Prepare a media holding statement from the moment the incident is classified V5 or V4. Do not wait for a journalist to call. Use the template in Appendix D.

Recovery & Restoration Criteria

One of the most common errors in AI incident response is restoring the system before the failure is truly understood. The pressure to restore service is real, but the cost of a second failure, or of restoring a system that continues to cause harm, is far higher than continued downtime.

⛔ Do Not Restore Until
Do not restore an AI system to production until you can answer YES to every item on the following checklist.
  • Root cause has been identified and documented in writing
  • The failure mode cannot recur in the restored configuration; this has been technically verified, not assumed
  • A rollback plan exists if the restored system fails again
  • Monitoring and alerting have been specifically enhanced to detect recurrence of this failure mode
  • Legal Counsel has reviewed restoration for any regulatory implications
  • The Incident Lead has formally signed off on restoration in writing
  • Affected parties have been notified before service is restored; the restoration is not silent (where applicable)
  • A post-restoration monitoring period has been defined (minimum 24 hours heightened monitoring recommended)

Part 4 — Stage 6: Learn

The post-incident phase is where most organisations fail. The immediate crisis passes, pressure eases, and the hard work of systemic improvement gets deprioritised. Organisations that treat post-incident reviews as genuine learning exercises, rather than box-ticking, are the ones that do not face the same incident twice.

AI TRACE Post-Incident Review

Conduct within 5–10 business days of incident resolution, while details are fresh. The full review template is in Appendix E.

→ The AI TRACE Methodology
AI TRACE structures every post-incident review around five dimensions: Trust (what was the impact on stakeholder and public trust?), Root Cause (what was the systemic origin of the failure?), Accountability (who was accountable, and were those accountabilities clear before the incident?), Correction (what specific actions will prevent recurrence?), and Evolution (what governance or capability improvement does this incident require at the organisational level?). A review that answers all five produces durable learning, not just incident closure.

Who should attend

  • Full incident response team (all roles)
  • AI system owners and operators implicated in the incident
  • A representative from risk, compliance, or internal audit
  • A facilitator who was not directly involved in the incident response (for objectivity)

The six root cause categories

Use the Five Whys technique: for each proposed cause, ask "Why did this happen?" until you reach the systemic root, not just the proximate trigger.

Model failure: Did the model perform outside its design envelope? Was there evidence of drift? Was the model trained on representative data?
Data quality: Was the training data representative, current, and unbiased? Were there data pipeline failures? Was there a distribution shift between training and production data?
Deployment error: Was there a misconfiguration, version mismatch, or integration failure at deployment? Were deployment checks adequate?
Process failure: Did humans rely on AI outputs without appropriate verification? Were escalation procedures followed? Were governance controls bypassed?
Adversarial attack: Was the AI system deliberately manipulated? Are there indicators of prompt injection, data poisoning, or model inversion?
Governance gap: Did adequate pre-deployment testing, monitoring, or human oversight exist? Was accountability clearly assigned?

What good post-incident governance looks like

  • A written action plan with named owners and deadlines, not recommendations
  • A named executive accountable for each action being completed
  • A follow-up review (30 days later) to confirm actions are being implemented
  • A summary finding shared with the board as part of the board incident report
  • The finding incorporated into future AI risk assessments and AI governance reviews

The Board Incident Report

The board needs to know about significant AI incidents, but they do not need to understand them technically. A board report on an AI incident should inform, not overwhelm.

What a good board AI incident report looks like

Length
  Good: 1–2 pages maximum
  Common error: 10-page technical briefing with model performance charts

Language
  Good: Plain English; no AI jargon
  Common error: Technical terminology that directors cannot assess

Focus
  Good: Impact, response, and prevention
  Common error: Detailed technical root cause analysis

Accountability
  Good: Clear owner for each remediation action
  Common error: Diffuse accountability ("the team is working on it")

Tone
  Good: Factual; appropriate seriousness
  Common error: Defensive; minimising the incident or over-reassuring

The full editable Board Incident Report template is in Appendix F.

⚠ What Boards Should Not Receive
  • Technical logs, model performance data, or confusion matrices
  • Individual blame attribution: the board reviews systemic accountability, not personnel decisions
  • Speculation about causes not yet confirmed
  • An account that minimises the incident relative to what has been or will be reported to regulators

Case Studies

The following three case studies apply the framework in this playbook to real-world AI security incidents. Case 1 and Case 2 draw on publicly reported incidents analysed by the author; Case 3 is a composite drawn from common deployment patterns. In each case, the focus is on what the response should have looked like, not just what went wrong.

Case Study 01

The $25M Deepfake Fraud (2024)

Incident type: V5 — Systemic Trust Event · AI-enabled financial fraud · Government-sector organisation · 2024

What happened

A threat actor used AI-generated synthetic media, a real-time deepfake video call, to impersonate senior officials in a live video conference. Finance staff, believing they were receiving authorised instructions from legitimate leadership, transferred $25 million to fraudulent accounts. The deception was not discovered until after the transfers had been executed.

Root cause analysis

The proximate cause was the deepfake technology. The systemic root cause was a governance gap: there was no out-of-band verification protocol for large financial transfers. The AI-mediated communication channel (video conferencing) was being used as an authentication mechanism, which it was never designed to be. No secondary confirmation was required; no call-back protocol existed; no threshold-based approval workflow applied.

What an effective response looks like

Hour 0–4: Discovery
Containment and evidence preservation

Upon discovery that the transfer was fraudulent: immediate escalation to Incident Lead. Financial systems flagged to prevent further transactions pending investigation. Evidence preserved: all call recordings, transfer authorisation records, communications. Law enforcement contacted immediately: the window for financial recovery is narrow.

Hour 4–24: Investigation
Root cause and blast radius

Internal investigation: was this a targeted attack, or part of a wider campaign? Are any other transfers at risk? Which authentication controls failed? What was the chain of authorisation? Legal Counsel engaged: does this trigger regulatory notification?

Day 2–3: Communication
Regulated disclosure and internal briefing

Notify relevant financial regulators. Brief the board with a concise incident report. Prepare a communication plan for any third parties whose data or interests are affected.

Post-incident
Governance overhaul

Implement out-of-band verification for all financial transfers above a defined threshold. Deploy deepfake detection tooling for high-stakes video communications. Conduct tabletop exercises simulating synthetic media attacks across all departments with financial authority.

The Governance Lesson
Deepfake attacks exploit the assumption that video is a reliable authentication mechanism. Any financial or operational process that relies on video communication as its primary verification method is exposed. This is a governance gap, not a technology limitation: the technology to detect deepfakes exists. The policy requiring its use did not.
Case Study 02

McKinsey Lilli AI Tool Incident (March 2026)

Incident type: V4–V5 — Critical / Systemic Trust Event · Enterprise AI reliability failure · Professional services · 2026

What happened

McKinsey's internal AI assistant "Lilli" produced inaccurate client-facing outputs, raising questions about governance oversight and validation processes. The incident illustrates how a high-profile enterprise AI tool — built by one of the world's leading advisory firms — can suffer governance failures that generate significant reputational exposure. The V-AIM response framework applied to this case demonstrates how AI reliability failures require the same command structure as security incidents.

Root cause analysis

This is a Reliability failure under the R3AI Standard: the system did not consistently do what it was designed to do when deployed in high-stakes client-facing contexts. The underlying governance gap was insufficient validation of AI outputs before client use, combined with a deployment environment that gave users insufficient signal about when outputs required independent verification. The reputational exposure was amplified by the prominence of the organisation and the trust assumptions clients place in advisory outputs.

What an effective V-AIM response looks like

Hour 0–4: Detection & Classification
Incident Lead activated; V4 classification

Upon identifying that AI outputs have reached clients and may be inaccurate: Incident Lead activated. Classification at V4 (Critical) with potential V5 escalation if client harm is confirmed at scale. Technical Containment Lead begins immediate audit of affected outputs.

Hour 4–24: Containment
Output suspension; client impact mapping

Suspension of the specific AI workflow pending investigation. Legal & Compliance Lead assesses professional liability exposure. Business Owner maps all client engagements where affected outputs may have been used. Communications Lead prepares holding statement for internal use.

Day 2–5: Governance Review
AI TRACE review; client notification

AI TRACE review initiated. Clients notified directly before any public disclosure. Accountability mapped: who approved the output for client use, and what validation steps were in place? Executive Sponsor signs off on client communication approach.

Post-incident: Evolution
Governance improvement and monitoring

Output validation protocols strengthened. Human-in-the-loop review requirements clarified for client-facing AI use. Monitoring enhanced. The incident is incorporated into the firm's AI governance framework as a standing reference case.

The Governance Lesson
Even the most sophisticated organisations can produce AI governance failures. The question is not whether your AI tool is built by a credible vendor. It is whether your deployment environment has the validation controls, output review processes, and accountability structures to catch failures before they reach clients. Lilli was a Reliability failure with Responsibility consequences — the R3AI lens reveals why both dimensions need to be governed.
Case Study 03

Model Bias Discovered in Production, Lending Decision AI

Incident type: V4 — Critical · AI bias in regulated financial service · Financial services · Composite case

What happened

A retail financial institution deploys an AI-based credit approval system. Six months post-deployment, an internal analyst notices that approval rates differ significantly across demographic groups, in a way that cannot be explained by creditworthiness factors alone. The finding is escalated internally. The legal and regulatory implications are immediately significant: this may constitute unlawful discrimination under consumer credit regulation.

What made this particularly difficult

The system was technically functioning correctly. It was producing outputs consistent with its training data. The training data reflected historical lending patterns, and historical lending patterns reflected decades of structural bias in credit markets. The AI had learned and replicated historical bias. This is a governance failure that cannot be resolved by a software fix: it requires a fundamental reconsideration of what the training data represents.

The response

Day 1: Classification
V4 declaration; no immediate system shutdown

There is no active harm requiring immediate shutdown (approvals are not causing safety risk). But the situation is V4 — Critical: significant business and regulatory impact, with potential for legal liability and regulatory action; the V-SEV matrix lists biased lending models as a V4 example. Legal & Compliance Lead is engaged immediately. Does this constitute unlawful discrimination? Does it require FCA notification?

Day 2–3: Technical review and escalation
Confirm findings; decision on suspension

Technical team confirms the statistical disparity. The decision is made to suspend the AI model and revert to manual review for new applications. This is the right call: the cost of continued discriminatory approvals exceeds the cost of manual review overhead.

Day 4–14: Audit and regulatory engagement
Retrospective audit; proactive regulatory disclosure

All approved and declined decisions from the AI system are reviewed for affected patterns. Proactive engagement with the FCA, not waiting to be discovered. Proactive disclosure is treated as a significant mitigating factor in regulatory proceedings.

Post-incident, Governance overhaul
Retraining, fairness constraints, ongoing monitoring

The model is retrained with fairness constraints. Bias metrics are added to the model monitoring dashboard alongside accuracy metrics. A quarterly fairness audit is established as a permanent governance control.

The Governance Lesson
AI system bias does not announce itself at deployment. It emerges gradually, often months later, often only through proactive analysis, not through operational failure. Post-deployment monitoring must include fairness metrics, not just accuracy metrics. This is a board-level governance decision, not a technical one.


Part 5: Leadership Metrics Guide

Governance maturity is not measured by the absence of incidents. It is measured by how well an organisation detects, commands, and learns from them. This section provides the leadership metrics that demonstrate AI incident readiness and response quality at executive and board level.

Key Performance Indicators for AI Incident Management

| Metric | What It Measures | Target | Frequency |
| --- | --- | --- | --- |
| Mean Time to Detect (MTTD) | Average time between incident start and organisational awareness | <4 hours for V4/V5; <24 hours for V2/V3 | Per incident; quarterly average |
| Mean Time to Contain (MTTC) | Average time from detection to confirmed containment | <2 hours for V5; <8 hours for V4 | Per incident; quarterly average |
| Regulatory Notification Compliance | % of notifiable incidents notified within the required window | 100% | Per incident |
| AI TRACE Completion Rate | % of V2+ incidents receiving a completed AI TRACE review | 100% for V3+; >80% for V2 | Per incident; quarterly audit |
| 12 Non-Negotiables Compliance | % of 12 non-negotiable readiness prerequisites satisfied | 100% | Quarterly review |
| Incident Recurrence Rate | % of incidents where the same root cause category appears more than once | 0% for V4/V5 root causes | Quarterly trend |
| Response Team Readiness | % of V-AIM command roles with named primary and backup contacts in place | 100% | Quarterly check; after any team change |
| Simulation Frequency | Number of tabletop incident simulations conducted per year | Minimum 2 per year | Annually reported |
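MTTD and MTTC are straightforward to compute once incident timestamps are captured consistently. A minimal Python sketch, assuming a simple record format (the field names and timestamps below are illustrative, not a prescribed schema):

```python
from datetime import datetime
from statistics import mean

# Illustrative incident records; "start" is incident onset, "detected" is
# organisational awareness, "contained" is confirmed containment.
incidents = [
    {"sev": "V5", "start": "2026-01-10 02:00", "detected": "2026-01-10 05:00",
     "contained": "2026-01-10 06:30"},
    {"sev": "V3", "start": "2026-02-01 09:00", "detected": "2026-02-01 20:00",
     "contained": "2026-02-02 04:00"},
]

def hours_between(a, b):
    fmt = "%Y-%m-%d %H:%M"
    return (datetime.strptime(b, fmt) - datetime.strptime(a, fmt)).total_seconds() / 3600

def mttd(records):
    """Mean Time to Detect: incident start to organisational awareness."""
    return mean(hours_between(r["start"], r["detected"]) for r in records)

def mttc(records):
    """Mean Time to Contain: detection to confirmed containment."""
    return mean(hours_between(r["detected"], r["contained"]) for r in records)

print(f"MTTD {mttd(incidents):.1f}h, MTTC {mttc(incidents):.1f}h")
# → MTTD 7.0h, MTTC 4.8h
```

In practice these averages should be reported per V-SEV level, since a quarterly average that blends V2 and V5 incidents hides exactly the tail the targets exist to control.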

Board-Level Reporting Standards

Boards should receive a standardised AI incident summary at each board meeting, not just when a significant incident occurs. The absence of incidents is itself a governance signal that should be reported alongside incident data.

What the Board Should Always See

  • Incident count by V-SEV level in the reporting period
  • Any open V4 or V5 incidents, with current status and ETA to resolution
  • MTTD and MTTC trend vs. prior period
  • 12 Non-Negotiables compliance status
  • Regulatory notifications made, with outcomes
  • AI TRACE completion status for prior period incidents
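The first two items on that list can be produced mechanically from an incident register. A minimal sketch, assuming a simple record format; the field names and statuses are illustrative:

```python
from collections import Counter

# Illustrative period data; severity labels follow the V-SEV scale.
period_incidents = [
    {"sev": "V2", "status": "resolved"},
    {"sev": "V2", "status": "resolved"},
    {"sev": "V4", "status": "open"},
]

def board_summary(incidents):
    """First two items of the board pack: counts by V-SEV level,
    plus any open V4/V5 incidents."""
    counts = Counter(i["sev"] for i in incidents)
    open_high = [i for i in incidents
                 if i["sev"] in ("V4", "V5") and i["status"] != "resolved"]
    return {"counts_by_sev": dict(counts), "open_v4_v5": open_high}

summary = board_summary(period_incidents)
print(summary["counts_by_sev"])  # {'V2': 2, 'V4': 1}
```

Automating the counts keeps the board pack consistent between periods; the interpretation and the trend commentary remain the Incident Commander's job.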

What the Board Should Never Receive

  • Raw technical logs or model performance metrics without interpretation
  • Jargon that obscures accountability (e.g. "the model experienced an anomaly")
  • Incident reports that lead with technical cause rather than business and customer impact
  • Action plans without named owners and deadlines
→ Board Governance Principle
The board's role in AI incident oversight is not to understand the technical failure. It is to ensure that governance conditions were in place before the incident, that the response met the organisation's stated standards, and that learning has produced durable improvement — not just resolution.


Appendices: Editable Templates

The following six templates are designed to be adapted: tailor the language to your organisation's tone and governance structure, and pre-approve templates B, C, and D with Legal Counsel before any incident occurs.

Appendix A

Incident Declaration Template

AI INCIDENT DECLARATION · CONFIDENTIAL
Date & Time of Declaration
 
Declared By
 
Incident Commander Assigned
 
Severity Classification
☐ V5  ☐ V4  ☐ V3  ☐ V2  ☐ V1
AI System(s) Affected
 
 
Nature of the Incident (plain language description)
 
 
 
Discovery Method
☐ Automated monitoring alert  ☐ User / employee report  ☐ Customer complaint  ☐ External report  ☐ Media / public  ☐ Routine review
Is Active Harm Currently Occurring?
☐ Yes, immediate shutdown required  ☐ No, retrospective  ☐ Unknown, shutdown pending assessment
Immediate Containment Action Taken
☐ System shut down  ☐ System suspended pending investigation  ☐ System continues with enhanced monitoring  ☐ Other (specify below)
 
Estimated Blast Radius (how many decisions / transactions / people may be affected?)
 
Regulatory Notification Assessment
☐ EU AI Act notification may be required, deadline: ___________
☐ GDPR / UK GDPR notification may be required, 72-hour clock started
☐ Sector-specific notification required, regulator & deadline: ___________
☐ No immediate notification obligation identified (document rationale)
Signed by Incident Commander
 
Appendix B

Internal Stakeholder Communication Template

INTERNAL · CONFIDENTIAL · [INCIDENT NAME / REFERENCE]
To
[Leadership team / Department heads / All staff, select appropriate audience]
From
[Incident Commander name and title]
Subject
AI System Incident, [System Name], [Date]

We are writing to inform you of an incident affecting [AI system name], which [brief, plain-language description of what the system does, one sentence].

What happened
[2–3 sentences describing what the AI system did or failed to do. Plain English, no technical jargon.]
 
Impact
[Who is affected and how, be specific. If customer impact: how many, in what way?]
 
What we have done
[Immediate containment actions taken. Be factual.]
 
What we are doing now
[Ongoing investigation and recovery actions. Who owns them.]
 
What you need to do
[Specific actions required from recipients, if any. If none: "No action is required from you at this time."]

Our next update to this group will be at [time/date]. Questions should be directed to [named contact] only; please do not discuss this incident externally or with customers until further notice.

[Name], Incident Commander

Appendix C

Customer / External Communication Template

EXTERNAL COMMUNICATION · PRE-APPROVAL REQUIRED FROM LEGAL COUNSEL
Subject line
An important update regarding [your service / product name]

Dear [Customer name / Valued customer],

We are writing to let you know about an issue that affected [plain-language description of the AI-powered service, avoid the word "AI" unless legally required or already public].

What happened
[On (date), we identified that (plain language, what the system did, what the effect on the customer was). Avoid technical language entirely.]
 
What we have done
[Actions taken to stop the incident and protect customers. Be specific: "we suspended the service" is better than "we took immediate action".]
 
What this means for you
[Specific, practical impact on this customer. If there is no impact: say so explicitly. If there is a remedy: describe it clearly.]
 
What we are doing to prevent this from happening again
[Brief, credible description, not a list of vague commitments. Specific improvements where possible.]

If you have any questions or concerns, please contact [dedicated contact / channel, not a generic support address for a serious incident].

We take [privacy / safety / accuracy] seriously and we are sorry for any distress or inconvenience this has caused.

[Name], [Title]
[Date]

Appendix D

Media Statement Template

MEDIA STATEMENT · FOR PRESS ENQUIRIES ONLY · PRE-APPROVAL REQUIRED

STATEMENT FROM [ORGANISATION NAME]

[Date]

[Organisation name] is aware of [brief, factual description of the incident, one sentence. Do not speculate. Do not use language that implies certainty about cause unless it has been confirmed].

[We have taken the following immediate action: (specific action taken, system suspension, investigation launched, regulators notified).]

[Affected parties, if applicable: "We have notified / are notifying affected customers directly."]

We are committed to [the relevant value, the safety of our customers / the integrity of our systems / transparent operation] and will provide further updates as our investigation develops.

Media enquiries
[Named press contact name, email, phone, do not use general inbox]

Note: Prepare this statement from the moment a V4 or V5 incident is declared. Do not wait for a journalist to call. The holding statement is used when a journalist contacts you before a full statement is ready: "We are aware of the situation and are investigating. We will have a full statement within [timeframe]." Never say "no comment".

Appendix E

AI TRACE Post-Incident Review Template

AI TRACE POST-INCIDENT REVIEW · CONFIDENTIAL
AI System
 
Incident Date
 
Review Date
 
Facilitator
 
Attendees
 
 

Section 1: Incident Timeline

List all events chronologically from first detection to resolution, with timestamps.

 
 
 
 
 

Section 2: Root Cause Analysis

Primary root cause category
☐ Model failure  ☐ Data quality  ☐ Deployment error  ☐ Process failure  ☐ Adversarial attack  ☐ Governance gap
Five Whys Analysis
Why 1 (proximate cause)
 
Why 2
 
Why 3
 
Why 4
 
Why 5 (systemic root cause)
 

Section 3: Response Assessment

What worked well in our response?
 
 
What did not work well?
 
 
What would we do differently?
 
 

Section 4: Detection Assessment

How was this incident discovered?
 
Could it have been detected earlier? If so, how?
 
 
What monitoring or alerting improvements are required?
 
 

Section 5: Governance Gaps Identified

| Gap Identified | Owner | Remediation Action | Target Date |
| --- | --- | --- | --- |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |

Section 6: Action Plan

| Action | Owner | Priority | Target Date | Status |
| --- | --- | --- | --- | --- |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
Approved by Incident Commander
 
Date
 
Appendix F

Board Incident Report Template

AI INCIDENT REPORT · BOARD SUMMARY · CONFIDENTIAL
Prepared By
 
Date
 
Incident Reference
 
Status
☐ Ongoing  ☐ Contained, investigation continuing  ☐ Resolved

1. Incident Summary

3–4 sentences. What happened, when, what the impact was. Plain English, no technical language.

 
 
 

2. Immediate Response

What was done in the first 24 hours. Focus on decisions, not technical detail.

 
 

3. Root Cause

One or two plain-English sentences. The board does not need the technical root cause; it needs to understand whether this was a technology failure, a process failure, a governance gap, or an external attack.

 
 

4. Impact Summary

| Category | Detail |
| --- | --- |
| Customers / users affected |  |
| Financial impact (direct) |  |
| Financial impact (remediation estimate) |  |
| Regulatory notifications filed |  |
| Regulatory investigation status |  |
| Reputational / media exposure |  |

5. Preventative Actions

What is being done to ensure this does not recur. Specific actions with named owners, not general commitments.

 
 
 

6. Open Items Requiring Board Attention

| Item | Owner | Due Date | Board Action Required? |
| --- | --- | --- | --- |
|  |  |  |  |
|  |  |  |  |

The board is asked to note this report and [any specific board action required, e.g., approve remediation budget / note regulatory status / confirm escalation threshold has been met].

Coming Soon · AI Incident Command Course

The playbook is step one.
Now learn to lead the response.

Most leaders read the plan. The best ones have rehearsed it. The AI Incident Command course takes you through live scenarios, command decisions, and board-level communication — so when the crisis lands, you're already ahead of it.

Right now, AI systems are making decisions no one is reviewing. Incidents are being mis-categorised. Regulators are asking questions no one is prepared for. The gap between knowing and doing is where organisations fail.

Join the Waitlist →

Free to join · No commitment · First access when doors open
