AI systems occupy a different kind of risk position in your organisation. They don't just fail to work: they work incorrectly, often at scale, often undetected, and often in ways that cannot simply be rolled back. Three characteristics make AI incidents qualitatively different from traditional technology failures:
| Characteristic | Traditional IT Failure | AI System Failure |
|---|---|---|
| Detection | Usually immediate: the system goes down and an alert fires | Often delayed: the system keeps running, producing wrong outputs that nobody notices |
| Blast radius | Bounded: the system stops, so the impact halts | Accumulating: an AI making 10,000 decisions a day creates 10,000 harms before anyone detects the problem |
| Accountability | Clear: the system failed | Ambiguous: was it the model, the data, the deployment, the humans, or the vendor? |
| Reversibility | High: restore from backup or restart the service | Low: retrospective decisions can't be un-made, and affected parties can't be un-harmed |
| Regulation | General IT standards and GDPR | EU AI Act Article 73, sector-specific AI regulations, GDPR, FCA, ICO |
AI incidents fall into three categories under the R3AI Standard: Reliability failures (the system doesn't do what it was designed to do), Resilience failures (the system can't maintain performance under real-world conditions), and Responsibility failures (the system causes harm, bias, or ethical damage). Understanding which category an incident falls into shapes how you contain, govern, and learn from it.
This playbook gives leadership teams the V-AIM command framework to lead their response. It covers the six stages of V-AIM — Prepare, Detect, Contain, Govern, Recover, Learn — and the First 24 Hours timeline, regulatory obligations, communication sequencing, the AI TRACE post-incident review, and the Leadership Metrics Guide. Six editable templates in the appendices can be adapted for your organisation immediately.
A note on use: this playbook is written for executive leadership teams, the people who need to govern an AI incident, not just technically resolve it. It is deliberately jargon-light and action-oriented. Technical teams will need their own supplementary runbooks; this document governs the leadership layer that sits above them.
The single most important thing you can do for AI incident response is prepare before anything goes wrong. Organisations that define their severity classifications, response team roles, and communication templates in advance will respond in hours. Those that start from scratch during an incident will respond in days, by which time the damage has compounded.
Not every AI problem is a crisis. The V-SEV scale provides a consistent severity classification language across technical and non-technical teams. Use this matrix to classify incidents consistently so everyone on the command team is working from the same understanding of urgency and expected response times.
| V-SEV Level | Definition | Examples | Response SLA | Escalation |
|---|---|---|---|---|
| V5 — SYSTEMIC TRUST EVENT | Active harm occurring, regulatory breach, or material public exposure. Widespread impact on trust, operations, or reputation. | Deepfake fraud in progress; AI producing dangerous medical or financial advice; PII breach via AI system; AI weaponised against customers; systemic bias at scale | Immediate response. Incident Lead on-call within 1 hour. Board notification within 4 hours. | Incident Lead → Executive Sponsor → CEO → Legal → Board |
| V4 — CRITICAL | Significant business or reputational impact; regulatory exposure likely; material customer harm. | Biased model discovered affecting hiring or lending decisions; financial miscalculations in AI-assisted reporting; identity verification bypass at scale; governance failure with external consequences | Response team assembled within 4 hours | Incident Lead → Executive Sponsor → Legal |
| V3 — SIGNIFICANT | Operational disruption with limited external impact; quality degradation affecting business processes; regulatory interest possible. | AI system goes offline; automation pipeline failures causing delays; model drift reducing output quality below threshold; localised bias detected | Response within 24 hours | Technical Containment Lead → Incident Lead |
| V2 — MODERATE | Repeated errors or anomalies with limited impact; internal review required; no external regulatory trigger. | Gradual accuracy decline affecting outputs; unusual patterns flagged by monitoring; minor data quality issues in production; edge-case model failures | Response within 48 hours | Technical Containment Lead |
| V1 — IRREGULARITY | Minor deviation; no external or customer impact; detected through routine monitoring. Informational only. | Single anomalous output; testing environment issues; minor configuration drift; low-confidence flag from monitoring system | Review within 72 hours. Log and monitor. | Technical Containment Lead (log only) |
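For the technical runbooks that sit beneath this playbook, the V-SEV matrix can be encoded once so that monitoring alerts, ticket templates, and escalation tooling all quote the same severity language. A minimal Python sketch; the field names and structure are illustrative assumptions, and only the levels and SLAs come from the matrix above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Severity:
    label: str                 # level name from the V-SEV matrix
    response_sla_hours: int    # time to first response, per the matrix
    board_notification: bool   # V5 requires board notification within 4 hours

# Illustrative encoding of the matrix above; adapt to your own scale.
V_SEV = {
    5: Severity("SYSTEMIC TRUST EVENT", 1, True),
    4: Severity("CRITICAL", 4, False),
    3: Severity("SIGNIFICANT", 24, False),
    2: Severity("MODERATE", 48, False),
    1: Severity("IRREGULARITY", 72, False),
}

def sla_for(level: int) -> int:
    """Response SLA in hours for a given V-SEV level."""
    return V_SEV[level].response_sla_hours
```

Encoding the matrix in one place keeps technical and non-technical teams working from the same definitions, which is the point of the scale.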
Define and assign these six V-AIM command roles before any incident occurs. Each role must have a named primary and a named backup. These are decision-making roles with clear authority — not just communication functions.
| V-AIM Role | Responsibilities | Typically Held By |
|---|---|---|
| Executive Sponsor | Ultimate accountable executive. Approves significant decisions on containment, disclosure, and recovery. Single board-to-response-team interface. Signs off regulatory notifications. | CEO, COO, or Board-designated executive |
| Incident Lead | Operational command of the incident. Coordinates all six V-AIM command roles. Makes the call on containment actions. Single escalation point for the response team. Produces status updates for the Executive Sponsor. | CAIO, CTO, or senior executive designated in advance |
| Technical Containment Lead | Owns the technical investigation. Responsible for root cause analysis, containment actions, model rollback, evidence preservation, and technical recovery validation. | Head of AI/ML, Principal Engineer, or Lead Data Scientist |
| Legal & Compliance Lead | Assesses regulatory notification obligations (EU AI Act Article 73, GDPR, FCA, sector-specific). Reviews all external communications before release. Manages evidence chain and liability considerations. | General Counsel, Chief Compliance Officer, or external AI law firm on retainer |
| Communications Lead | Owns all internal and external messaging. Controls communication timing and sequencing. Prepares and adapts templates. Approves all final language before distribution. Manages media if required. | Head of Communications, CMO, or external PR lead |
| Business Owner | Represents the business function(s) affected. Quantifies customer and operational impact. Manages customer service escalations. Tracks affected accounts, decisions, or transactions. Confirms when operational recovery is complete. | COO, Customer Success Director, or Operations Head |
Before your organisation is ever in an AI incident, the following 12 conditions must already be satisfied. These are the Non-Negotiables — the minimum viable governance foundation for effective V-AIM command. If any of these are missing, your incident response capability is incomplete, regardless of the quality of your technical infrastructure.
The first 24 hours of an AI incident are the most consequential. Decisions made in this window determine whether the situation is contained or whether it compounds. Speed matters, but not more than the quality of decisions. Acting too fast without understanding the blast radius can make things worse.
Not every anomaly is an incident. Before triggering the full response, confirm:
Use the Incident Declaration Template (Appendix A). The act of formal declaration:
The first and most important question: Should we switch this AI system off?
If active harm is currently occurring → Shut down the AI system immediately. The cost of a false positive (unnecessary downtime) is almost always less than the cost of continued harm; the rare exception is addressed below.
If harm is retrospective (already happened) → Preserve all evidence before any remediation. Do not restart the system until root cause is understood.
If the situation is ambiguous → Shut down and investigate. Continuing to run a broken system creates liability; prudent precaution does not.
If shutting down would cause greater harm → For example, a medical or safety-critical AI where shutdown creates patient risk. In this case, escalate immediately and engage Legal Counsel before any action.
Evidence preservation is non-negotiable and must precede any technical fix. Legal and regulatory proceedings will depend on evidence integrity.
Work with Legal Counsel to determine notification obligations. Do not wait for this assessment to conclude before taking containment action, but begin it in parallel.
| Regulation | Trigger | Deadline | Action Required |
|---|---|---|---|
| EU AI Act (Art. 73) | Serious incident involving a high-risk AI system, including near-misses | Immediately upon becoming aware; follow-up report within timeline set by authority | Notify relevant market surveillance authority; preserve incident records |
| GDPR / UK GDPR | Personal data involved in breach caused or exacerbated by AI system | 72 hours from awareness to regulator; without undue delay to data subjects if high risk | Notify ICO (UK) / supervisory authority (EU); notify affected individuals if required |
| FCA (Financial Services) | Material operational incident affecting regulated services; AI-related consumer harm | As soon as practicable | Notify FCA; document incident and response; consider consumer redress obligations |
| Sector-specific | Healthcare, critical infrastructure, defence, etc. | Varies by sector and jurisdiction | Review sector-specific AI guidance; engage sector regulator |
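The 72-hour GDPR window is the one hard deadline most organisations will face, so it is worth computing automatically the moment awareness is logged. A minimal sketch; the function and constant names are illustrative:

```python
from datetime import datetime, timedelta, timezone

GDPR_WINDOW = timedelta(hours=72)  # UK/EU GDPR: 72 hours from awareness to regulator

def notification_deadline(awareness_utc: datetime) -> datetime:
    """Latest time to notify the supervisory authority, counted from awareness."""
    return awareness_utc + GDPR_WINDOW

# Example: awareness logged at noon UTC on 1 Jan gives a deadline of noon UTC on 4 Jan
deadline = notification_deadline(datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc))
```

The clock runs from awareness, not from the incident's start, which is one more reason the timeline scribe described later must timestamp the moment of detection.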
By this point, the immediate crisis is stabilised. The focus shifts from containment to understanding. Root cause investigation must be structured, not anecdotal.
Every AI incident investigation must answer these four questions before remediation can be authorised:
| # | Question | Why it matters |
|---|---|---|
| 1 | What failed? (The model, the data, the deployment, the process, or the humans?) | Determines the remediation approach: a model fix is different from a process fix |
| 2 | When did it start? (Point of failure vs. point of discovery) | Defines the blast radius: how many decisions were affected before detection |
| 3 | What was the blast radius? (How many decisions, transactions, or people were affected?) | Drives notification obligations, customer redress, and reputational risk assessment |
| 4 | Why did our controls not catch this? (What monitoring, testing, or governance mechanism failed?) | The answer drives post-incident governance improvement; without it, you will face the same incident again |
Create and maintain an incident timeline throughout: every action, every decision, and every communication, each timestamped. This serves three purposes: legal defensibility, post-incident learning, and board/regulator reporting. Assign a named scribe to maintain the timeline in real time.
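Where the scribe's timeline lives in tooling rather than a document, an append-only log with UTC timestamps is sufficient. A minimal sketch, assuming a simple in-memory structure; real deployments should write to durable, tamper-evident storage:

```python
from datetime import datetime, timezone

class IncidentTimeline:
    """Append-only, timestamped record of actions, decisions, and communications."""

    def __init__(self):
        self._entries = []

    def record(self, category: str, detail: str, actor: str) -> dict:
        entry = {
            "utc": datetime.now(timezone.utc).isoformat(),
            "category": category,   # e.g. "action", "decision", "communication"
            "detail": detail,
            "actor": actor,         # named person, per the scribe requirement
        }
        self._entries.append(entry)
        return entry

    def export(self) -> list:
        """Chronological copy for board or regulator reporting."""
        return list(self._entries)
```

Whatever the implementation, the discipline is the same: entries are added, never edited, so the record remains legally defensible.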
How an organisation communicates during an AI incident often matters as much as what it actually does technically. Three rules govern effective crisis communication:
CEO, COO, CFO, and relevant department heads. Focus on classification, what is known, immediate actions taken, and the response plan.
Brief those whose teams may be affected or who will be involved in response. Give clear instructions on what to do and what not to say.
Only if the incident becomes public or if staff will encounter customer questions. Must be coordinated with external communications to ensure consistent messaging.
No customer should learn about an incident affecting them from the media. Personalised communication (using Appendix C template) to affected parties takes priority over all other external communications.
File notifications per regulatory requirements (see Part 2). Early, incomplete notification is preferred over late, complete notification.
Do not proactively publish a public statement unless the incident is already public or likely to become so. Unnecessary disclosure creates additional reputational risk without corresponding benefit.
Prepare a media holding statement from the moment the incident is classified V5 or V4. Do not wait for a journalist to call. Use the template in Appendix D.
One of the most common errors in AI incident response is restoring the system before the failure is truly understood. The pressure to restore service is real, but the cost of a second failure, or of restoring a system that continues to cause harm, is far higher than continued downtime.
The post-incident phase is where most organisations fail. The immediate crisis passes, pressure eases, and the hard work of systemic improvement gets deprioritised. Organisations that treat post-incident reviews as genuine learning exercises, rather than box-ticking, are the ones that do not face the same incident twice.
Conduct the AI TRACE review within 5–10 business days of incident resolution, while details are fresh. The full review template is in Appendix E.
Use the Five Whys technique: for each proposed cause, ask "Why did this happen?" until you reach the systemic root, not just the proximate trigger.
| Root cause category | Diagnostic questions |
|---|---|
| Model failure | Did the model perform outside its design envelope? Was there evidence of drift? Was the model trained on representative data? |
| Data quality | Was the training data representative, current, and unbiased? Were there data pipeline failures? Was there a distribution shift between training and production data? |
| Deployment error | Was there a misconfiguration, version mismatch, or integration failure at deployment? Were deployment checks adequate? |
| Process failure | Did humans rely on AI outputs without appropriate verification? Were escalation procedures followed? Were governance controls bypassed? |
| Adversarial attack | Was the AI system deliberately manipulated? Are there indicators of prompt injection, data poisoning, or model inversion? |
| Governance gap | Did adequate pre-deployment testing, monitoring, or human oversight exist? Was accountability clearly assigned? |
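Several of the diagnostic questions above, notably drift and distribution shift, can be screened numerically before the qualitative review. One common heuristic is the Population Stability Index; a minimal sketch, noting that the thresholds are conventional rules of thumb rather than standards:

```python
import math

def psi(expected, actual):
    """Population Stability Index between two binned distributions.
    Inputs are bin fractions summing to 1 (e.g. training vs. production data).
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift."""
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected, actual)
        if e > 0 and a > 0   # skip empty bins to avoid log(0)
    )
```

A PSI computed per input feature at the monitoring cadence gives the Technical Containment Lead concrete evidence for or against the distribution-shift hypothesis, rather than relying on anecdote.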
The board needs to know about significant AI incidents, but they do not need to understand them technically. A board report on an AI incident should inform, not overwhelm.
| Criterion | Good | Common error |
|---|---|---|
| Length | 1–2 pages maximum | 10-page technical briefing with model performance charts |
| Language | Plain English; no AI jargon | Technical terminology that directors cannot assess |
| Focus | Impact, response, and prevention | Detailed technical root cause analysis |
| Accountability | Clear owner for each remediation action | Diffuse accountability: "the team is working on it" |
| Tone | Factual; appropriate seriousness | Defensive; minimising the incident or over-reassuring |
The full editable Board Incident Report template is in Appendix F.
The following three case studies apply the framework in this playbook to real-world AI security incidents. Case 1 and Case 2 draw on publicly reported incidents analysed by the author; Case 3 is a composite drawn from common deployment patterns. In each case, the focus is on what the response should have looked like, not just what went wrong.
A threat actor used AI-generated synthetic media, a real-time deepfake video call, to impersonate senior officials in a live video conference. Finance staff, believing they were receiving authorised instructions from legitimate leadership, transferred $25 million to fraudulent accounts. The deception was not discovered until after the transfers had been executed.
The proximate cause was the deepfake technology. The systemic root cause was a governance gap: there was no out-of-band verification protocol for large financial transfers. The AI-mediated communication channel (video conferencing) was being used as an authentication mechanism, which it was never designed to be. No secondary confirmation was required; no call-back protocol existed; no threshold-based approval workflow applied.
Upon discovery that the transfer was fraudulent: immediate escalation to the Incident Lead. Financial systems flagged to prevent further transactions pending investigation. Evidence preserved: all call recordings, transfer authorisation records, and communications. Law enforcement contacted immediately; the window for financial recovery is narrow.
Internal investigation: was this a targeted attack, or part of a wider campaign? Are any other transfers at risk? Which authentication controls failed? What was the chain of authorisation? Legal Counsel engaged: does this trigger regulatory notification?
Notify relevant financial regulators. Brief the board with a concise incident report. Prepare a communication plan for any third parties whose data or interests are affected.
Implement out-of-band verification for all financial transfers above a defined threshold. Deploy deepfake detection tooling for high-stakes video communications. Conduct tabletop exercises simulating synthetic media attacks across all departments with financial authority.
McKinsey's internal AI assistant "Lilli" produced inaccurate client-facing outputs, raising questions about governance oversight and validation processes. The incident illustrates how a high-profile enterprise AI tool — built by one of the world's leading advisory firms — can suffer governance failures that generate significant reputational exposure. The V-AIM response framework applied to this case demonstrates how AI reliability failures require the same command structure as security incidents.
This is a Reliability failure under the R3AI Standard: the system did not consistently do what it was designed to do when deployed in high-stakes client-facing contexts. The underlying governance gap was insufficient validation of AI outputs before client use, combined with a deployment environment that gave users insufficient signal about when outputs required independent verification. The reputational exposure was amplified by the prominence of the organisation and the trust clients place in advisory outputs.
Upon identifying that AI outputs have reached clients and may be inaccurate: Incident Lead activated. Classification at V4 (Critical) with potential V5 escalation if client harm is confirmed at scale. Technical Containment Lead begins immediate audit of affected outputs.
Suspension of the specific AI workflow pending investigation. Legal & Compliance Lead assesses professional liability exposure. Business Owner maps all client engagements where affected outputs may have been used. Communications Lead prepares holding statement for internal use.
AI TRACE review initiated. Clients notified directly before any public disclosure. Accountability mapped: who approved the output for client use, and what validation steps were in place? Executive Sponsor signs off on client communication approach.
Output validation protocols strengthened. Human-in-the-loop review requirements clarified for client-facing AI use. Monitoring enhanced. The incident is incorporated into the firm's AI governance framework as a standing reference case.
A retail financial institution deploys an AI-based credit approval system. Six months post-deployment, an internal analyst notices that approval rates differ significantly across demographic groups, in a way that cannot be explained by creditworthiness factors alone. The finding is escalated internally. The legal and regulatory implications are immediately significant: this may constitute unlawful discrimination under consumer credit regulation.
The system was technically functioning correctly. It was producing outputs consistent with its training data. The training data reflected historical lending patterns, and historical lending patterns reflected decades of structural bias in credit markets. The AI had learned and replicated historical bias. This is a governance failure that cannot be resolved by a software fix: it requires a fundamental reconsideration of what the training data represents.
There is no active harm requiring immediate shutdown (approvals are not creating a safety risk). But the situation is V4 — Critical: significant business and regulatory impact, with potential for legal liability and regulatory action. The Legal & Compliance Lead is engaged immediately. Does this constitute unlawful discrimination? Does it require FCA notification?
Technical team confirms the statistical disparity. The decision is made to suspend the AI model and revert to manual review for new applications. This is the right call: the cost of continued discriminatory approvals exceeds the cost of manual review overhead.
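Confirming a disparity of this kind usually starts with a selection-rate comparison. A minimal sketch of the adverse impact ratio; note that the "four-fifths rule" threshold is a US screening convention rather than a UK legal test, and the function and variable names here are illustrative:

```python
def adverse_impact_ratio(approved_a, total_a, approved_b, total_b):
    """Ratio of approval rates: protected group (a) vs. reference group (b).
    Values below ~0.8 are a common screening red flag, not proof of discrimination."""
    return (approved_a / total_a) / (approved_b / total_b)

# Example: 30% vs. 50% approval rates gives a ratio of 0.6, below the 0.8 threshold
ratio = adverse_impact_ratio(300, 1000, 500, 1000)
```

A ratio below threshold is a trigger for the legal and statistical review described above, not a conclusion in itself: the question of whether creditworthiness factors explain the gap remains for the investigation.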
All approved and declined decisions from the AI system are reviewed for affected patterns. The FCA is engaged proactively rather than waiting to be discovered; early, voluntary disclosure is typically treated as a significant mitigating factor in regulatory proceedings.
The model is retrained with fairness constraints. Bias metrics are added to the model monitoring dashboard alongside accuracy metrics. A quarterly fairness audit is established as a permanent governance control.
| Metric | What It Measures | Target | Frequency |
|---|---|---|---|
| Mean Time to Detect (MTTD) | Average time between incident start and organisational awareness | <4 hours for V4/V5; <24 hours for V2/V3 | Per incident; quarterly average |
| Mean Time to Contain (MTTC) | Average time from detection to confirmed containment | <2 hours for V5; <8 hours for V4 | Per incident; quarterly average |
| Regulatory Notification Compliance | % of notifiable incidents notified within the required window | 100% | Per incident |
| AI TRACE Completion Rate | % of V2+ incidents receiving a completed AI TRACE review | 100% for V3+; >80% for V2 | Per incident; quarterly audit |
| 12 Non-Negotiables Compliance | % of 12 non-negotiable readiness prerequisites satisfied | 100% | Quarterly review |
| Incident Recurrence Rate | % of incidents where the same root cause category appears more than once | 0% for V4/V5 root causes | Quarterly trend |
| Response Team Readiness | % of V-AIM command roles with named primary and backup contacts in place | 100% | Quarterly check; after any team change |
| Simulation Frequency | Number of tabletop incident simulations conducted per year | Minimum 2 per year | Annually reported |
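MTTD and MTTC reduce to averaging time intervals pulled from the incident timeline. A minimal sketch of the quarterly calculation; the record layout is an assumption to adapt to your own incident data:

```python
from datetime import datetime
from statistics import mean

def mean_hours(intervals):
    """Average duration of (start, end) datetime pairs, in hours."""
    return mean((end - start).total_seconds() / 3600 for start, end in intervals)

# MTTD inputs: (incident start, organisational awareness) per incident
detections = [
    (datetime(2025, 1, 3, 9, 0), datetime(2025, 1, 3, 11, 0)),    # detected in 2h
    (datetime(2025, 2, 10, 14, 0), datetime(2025, 2, 10, 20, 0)), # detected in 6h
]
mttd = mean_hours(detections)  # 4.0 hours, within the <24h V2/V3 target
```

The same function computes MTTC by feeding it (awareness, confirmed containment) pairs instead, which keeps the two metrics directly comparable.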
Boards should receive a standardised AI incident summary at each board meeting, not just when a significant incident occurs. The absence of incidents is itself a governance signal that should be reported alongside incident data.
The following six templates are designed to be adapted for your organisation. Adapt the language to your organisation's tone and governance structure. Pre-approve templates B, C, and D with Legal Counsel before any incident occurs.
We are writing to inform you of an incident affecting [AI system name], which [brief, plain-language description of what the system does, one sentence].
Our next update to this group will be at [time/date]. Questions should be directed to [named contact] only; please do not discuss this incident externally or with customers until further notice.
[Name], Incident Lead
Dear [Customer name / Valued customer],
We are writing to let you know about an issue that affected [plain-language description of the AI-powered service, avoid the word "AI" unless legally required or already public].
If you have any questions or concerns, please contact [dedicated contact / channel, not a generic support address for a serious incident].
We take [privacy / safety / accuracy] seriously and we are sorry for any distress or inconvenience this has caused.
[Name], [Title]
[Date]
STATEMENT FROM [ORGANISATION NAME]
[Date]
[Organisation name] is aware of [brief, factual description of the incident, one sentence. Do not speculate. Do not use language that implies certainty about cause unless it has been confirmed].
[We have taken the following immediate action: (specific action taken, system suspension, investigation launched, regulators notified).]
[Affected parties, if applicable: "We have notified / are notifying affected customers directly."]
We are committed to [the relevant value, the safety of our customers / the integrity of our systems / transparent operation] and will provide further updates as our investigation develops.
Note: Prepare this statement from the moment of V5 or V4 declaration. Do not wait for a journalist to call. The holding statement is used when a journalist contacts you before you have a full statement ready: "We are aware of the situation and are investigating. We will have a full statement within [timeframe]." Never say "no comment".
List all events chronologically from first detection to resolution, with timestamps.
| Gap Identified | Owner | Remediation Action | Target Date |
|---|---|---|---|
| Action | Owner | Priority | Target Date | Status |
|---|---|---|---|---|
3–4 sentences. What happened, when, what the impact was. Plain English, no technical language.
What was done in the first 24 hours. Focus on decisions, not technical detail.
One or two plain-English sentences. The board does not need the technical root cause; they need to understand whether this was a technology failure, a process failure, a governance gap, or an external attack.
| Category | Detail |
|---|---|
| Customers / users affected | |
| Financial impact (direct) | |
| Financial impact (remediation estimate) | |
| Regulatory notifications filed | |
| Regulatory investigation status | |
| Reputational / media exposure | |
What is being done to ensure this does not recur. Specific actions with named owners, not general commitments.
| Item | Owner | Due Date | Board Action Required? |
|---|---|---|---|
The board is asked to note this report and [any specific board action required, e.g., approve remediation budget / note regulatory status / confirm escalation threshold has been met].