Introduction

Generative AI systems have crossed the threshold from experiments to everyday tools. They’re embedded in workflows, sales platforms, HR systems and customer channels, quietly making choices that used to be human.

That same reach gives attackers new levers: manipulating prompts, seeding poisoned data, or crafting inputs that make systems reveal what they were meant to protect.

Traditional defences aren’t built for that because firewalls can’t filter language and SIEMs don’t log conversations. The next generation of assurance needs to treat language as infrastructure governed, tested and logged like any other operational layer. This guide outlines where those risks appear, what they cost when ignored and how to build the controls that keep GenAI safe to scale. It’s written for senior leaders who need clarity, not complexity: a straight view of what’s changing, what’s at stake and what to do next.

Traditional IT security focuses on code, infrastructure and access. GenAI changes that dynamic: the risk now lives in language, data context and behaviour. Attacks no longer rely on exploiting vulnerabilities – they exploit how models interpret instructions and trust content.

Key differences in risk behaviour

Dimension | Traditional IT Risk | GenAI Security Risk
Primary attack surface | Endpoints, servers, apps | Prompts, connectors, RAG indexes, plugins
Exploit vector | Code flaw or credential theft | Malicious input (prompt injection, adversarial input, poisoned data)
Detection | Logs and alerts owned internally | Dispersed across models, vendors, and content sources
Change cadence | Controlled by patch cycles | Model behaviour can shift daily with vendor retraining
Root cause | System misconfiguration or malware | Misinterpreted instructions, unverified data, or unsafe connectors
Blast radius | Contained to affected systems | Spreads through decisions, reports and communications
Evidence model | Access logs and change records | Prompt trails, model/version IDs, and retrieval manifests
Control ownership | Central IT/SecOps | Shared with model vendors and AI service providers

The controls that used to protect servers and endpoints must now govern data, prompts and decisions. That’s why the five attack types we will cover in this guide represent the new front line where AI risk meets business risk.

The diagram below maps the risks inherent in AI systems to the stages of the workflow where each one arises.

[Diagram: AI system risks mapped to the workflow stages where they arise. Source: Pangea]

Model poisoning attacks


What is model poisoning?

Model poisoning (sometimes referred to as Data Poisoning) is when someone deliberately corrupts the information your GenAI depends on – during fine-tuning, in retrieval indexes (RAG), or inside shared prompt libraries – so the assistant begins producing confidently wrong or malicious outputs.

Unlike buggy code, poisoning targets truth itself: the model keeps learning the falsehood until your downstream processes act on it.

What does a model poisoning attack look like?

Your procurement team runs a rapid supplier onboarding drive to keep a critical project on schedule. To speed up approvals, they let the GenAI assistant assess supplier compliance by ingesting each supplier’s uploaded certification pack into the RAG index.

One overnight upload contains a subtly altered “certification statement” that changes a required control from “segregation of duties enforced” to “segregation recommended.” The assistant, trained to favour the most recent authoritative-seeming text, marks the supplier as compliant and the procurement workflow auto-grants them access to financial systems. Two months later a regulator’s audit finds that several suppliers lacked mandatory controls; your organisation is fined and forced to suspend critical contracts while remediation happens. The poisoned document looked routine in the logs – just another ingestion event – yet it caused real regulatory, financial and operational damage before anyone raised an alarm.

Why are model poisoning attacks a risk?

  1. Decisions become unsafe at scale: Poisoned inputs don’t just mislead one user – they propagate through automated workflows, board reports and customer communications.
  2. Regulatory and contractual fallout: Wrong guidance embedded in client-facing outputs can breach duty of care, misstate obligations or trigger fines.
  3. Forensics and remediation are slow and costly: Without provenance and version control, you cannot prove when a source became tainted, who approved it, or how many decisions relied on it.

How can I stop model poisoning attacks?

  • Gate ingestion like production code: Require an owner, a short review and a signed manifest for any document that will be indexed. Keep immutable manifests (hash, owner, timestamp) so you can prove what was accepted and when.
  • Pin and test with a golden set: Maintain a small, high-value set of governance Q&As that must pass after any ingestion or model/config change. Treat failures as release blockers.
  • Show provenance on every answer: Surface source metadata (owner, path, last modified) alongside any retrieved snippet so a reviewer can spot unfamiliar or new sources before acting.
  • Automate provenance verification: Run nightly checks to validate signatures/hashes, flag spikes in new documents, and alert on ownership changes or unusual ingestion volumes.
  • Limit blast radius with tiers: Only allow fully vetted corpora into workflows that can cause financial or legal impact; keep experimental or community sources strictly isolated.
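The manifest and nightly-verification controls above can be sketched in a few lines. This is a minimal illustration assuming documents sit on a local file store; the function names and manifest fields are illustrative, not any specific product’s API.

```python
import hashlib
from datetime import datetime, timezone

def manifest_entry(path: str, owner: str, approver: str) -> dict:
    """Record what was accepted into the index, by whom, and when."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "path": path,
        "sha256": digest,
        "owner": owner,
        "approver": approver,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

def verify(entry: dict) -> bool:
    """Nightly check: re-hash the source and compare to the manifest."""
    with open(entry["path"], "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == entry["sha256"]
```

Because the manifest stores a hash rather than the content, any post-approval edit to an indexed document – including the “enforced” to “recommended” swap in the scenario above – fails verification the next night.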

Model Poisoning is the AI equivalent of supply-chain fraud. It doesn’t break systems but it does bend the truth inside them. The cost is reputational, not technical, because decisions, reports and client deliverables that look correct can be fundamentally wrong. Establishing ownership and provenance for the data your AI depends on is now as important as patching the systems that run it.

Data leakage attacks

What is data leakage?

Data leakage occurs when sensitive or confidential information leaves controlled environments through GenAI inputs, outputs, or logs. Unlike breaches, there’s often no hack, exploit, or malware – the exposure happens through authorised use of ungoverned tools.

Leakage can occur when staff paste data into public models, when outputs embed confidential details, or when connectors transmit logs to external systems. It’s one of the most common and least visible GenAI incidents across industry. Back in 2023, Samsung engineers inadvertently uploaded source code and internal meeting notes to ChatGPT, prompting the company to block public AI tools entirely.

What does a data leakage attack look like?

Your operations team uses a GenAI assistant to speed up board reporting by analysing renewal forecasts and contract summaries. A team member pastes a full export of customer data – including names, spend, and contract terms – into the assistant for “quick trend analysis.”

The model routes the query through a third-party API that retains logs for “quality improvement.” The output is perfect, but the input is now stored outside your control. A month later, snippets of internal contract data appear in anonymised form within an unrelated public model. The exposure is traced back to your data – a complete dataset leaked through routine use, without any system ever being breached.

In 2024, Forbes reported that internal investigations at Verizon discovered that employees had been pasting sensitive customer data into GenAI tools during workflow automation testing.

Why are data leakage attacks a risk?

  1. Loss of data sovereignty: Once uploaded to a public model, data may be retained, processed, or redistributed without your knowledge.
  2. Regulatory exposure: Leakage involving personal or client data constitutes a reportable incident under UK GDPR and can trigger investigations or fines.
  3. Client trust erosion: Reuse of client data, even accidentally, undermines confidentiality commitments and damages commercial relationships.

How can I stop data leakage attacks?

  • Use controlled environments: Restrict GenAI use to tools and tenants governed under your organisation’s data protection policy, with retention and region settings documented.
  • Apply redaction before model access: Automatically mask or strip PII, client identifiers and financial details before text is sent to any model or API endpoint.
  • Disable data training and retention: Verify contracts and settings that prevent providers from using your prompts or outputs for model improvement.
  • Segment workloads: Keep high-sensitivity analysis (financials, client lists, legal docs) within private LLM environments; block external connectors by default.
  • Log and review usage: Record prompts, outputs, model versions and user IDs. Flag uploads of large or sensitive datasets for review.
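The redaction control above can be as simple as a masking pass before any text leaves the controlled environment. The sketch below is illustrative only: the regex patterns are assumptions for demonstration, and a real deployment should use a vetted PII-detection service rather than hand-rolled expressions.

```python
import re

# Illustrative patterns only -- assumptions for this sketch, not a
# complete or production-grade PII taxonomy.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "UK_PHONE": re.compile(r"\b(?:\+44\s?|0)\d{4}\s?\d{6}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    """Mask common identifiers before text is sent to a model or API."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running redaction at the gateway, rather than trusting each user to sanitise their own pastes, is what makes the control repeatable.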

Data leakage is the most common GenAI failure and the hardest to detect. It doesn’t rely on compromise; it relies on convenience. The simplest controls like redaction, isolation and retention governance can stop the majority of exposures. If staff can use GenAI confidently without putting data at risk, you gain both productivity and assurance, rather than having to choose between them.

Adversarial input attacks

What is an adversarial input attack?

Adversarial inputs are carefully constructed prompts that trick a GenAI into doing something it should refuse. Unlike prompt injection, which hides malicious instructions in content the model reads (web pages, files or RAG sources), adversarial inputs come directly from a user and exploit model behaviour (long contexts, role-play, multilingual switching) to persuade the assistant to ignore policy. This is sometimes known as “jailbreaking”.

Adversarial inputs are carefully constructed prompts that trick a GenAI into doing something it should refuse. Unlike prompt injection, which hides malicious instructions in content the model reads (web pages, files or RAG sources), adversarial inputs come directly from a user and exploit model behaviour (long contexts, role-play, multilingual switching) to persuade the assistant to ignore policy. This is sometimes known as “jailbreaking”.

In short: prompt injection hides inside content; adversarial inputs are social engineering for models.

What does an adversarial input attack look like?

A threat actor targets a global bank’s customer-facing GenAI assistant. They craft a multi-step prompt posing as a frustrated customer and escalate the language across replies to mimic identity verification. Buried in the exchange is an instruction that leads the assistant to “validate identity by returning a masked version of stored account details.”

The model complies and reveals fragments of real account numbers. Screenshots spread on social media within hours under the bank’s brand. Regulators open enquiries, customers panic, and market value suffers considerable damage before the assistant is taken offline.

Why are adversarial input attacks a risk?

  1. Policy evasion without technical compromise: A crafted prompt can flip a safe model into a dangerous one without any specific exploit or account takeover.
  2. Immediate reputational and regulatory damage: Outputs that reveal personal or sensitive data could result in negative media coverage, regulator inquiries and client loss.
  3. Repeatable and scalable: Once a successful adversarial pattern is published, it can be reused across organisations and models with little technical skill.

How can I stop adversarial input attacks?

  • Enforce boundaries in code: Safety controls must be system-level: block tool calls and sensitive-data retrieval by default and require explicit, logged human approval for any exception.
  • Flag risky input patterns: Detect and apply step-up review to prompts with many embedded examples, sudden language or role switches, or long-context conditioning that matches known jailbreak templates.
  • Isolate high-value functions: Keep any assistant that can access account data, payment systems or market communications behind a hardened, isolated service tier with no public-facing channels.
  • Practice the attack: Maintain a compact, curated pack of adversarial prompts (public research + incident-derived examples) and run them after every model update or integration change.
  • Own the evidence trail: Log full prompt text, model/version, any tool calls and approver decisions locally so you can prove what was asked and why a decision was made.
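The pattern-flagging control above can start as a simple heuristic screen in front of the model. Everything in this sketch is an assumption for illustration – the phrase list, thresholds and reason labels are placeholders, not a vetted jailbreak taxonomy.

```python
# Illustrative phrase list and thresholds -- assumptions, not a
# complete catalogue of jailbreak patterns.
ROLE_SWITCH_PHRASES = (
    "ignore previous instructions",
    "you are now",
    "pretend you are",
    "act as",
)

def flag_risky_prompt(prompt: str, max_chars: int = 4000) -> list[str]:
    """Return reasons a prompt should be routed for step-up review."""
    reasons = []
    lowered = prompt.lower()
    if len(prompt) > max_chars:
        reasons.append("long-context prompt")
    if any(phrase in lowered for phrase in ROLE_SWITCH_PHRASES):
        reasons.append("role-switch language")
    # Many-shot conditioning: prompts stuffed with example Q/A pairs.
    if lowered.count("q:") >= 10 and lowered.count("a:") >= 10:
        reasons.append("many-shot conditioning")
    return reasons
```

A screen like this will never catch everything, which is why the list above pairs it with hard system-level boundaries: the flag routes a prompt to review, while the code-level block stops the tool call regardless.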

Adversarial inputs turn a user’s access into an attack vector. They don’t need malware or privileged access, only the ability to convince the model. The impact is business-level: leaked customer data, false market signals and broken trust. The right response is organisational. Limit what models can do by default, watch for the patterns that signal attack and keep an auditable record so you can contain and explain incidents quickly.

Business impact: what changes when failures happen

Risk Type | Traditional Impact | GenAI Impact
Data Exposure | Breach notifications and containment | Public disclosure through outputs or logs without an actual breach
Integrity Failures | System downtime or corrupted data | Wrong but convincing insights guiding real decisions
Insider Misuse | Unauthorised access or data theft | Legitimate users pasting sensitive data into uncontrolled models
Supply Chain Compromise | Vendor network intrusion | Poisoned retrieval sources or compromised plugin behaviour
Reputational Risk | Post-breach fallout | AI-generated misinformation or unintended disclosure shared publicly

How does AI governance mitigate AI security risks?

IBM defines AI governance as “the processes, standards and guardrails that help to ensure AI systems and tools are safe and ethical.” For executives, it’s less about model internals and more about who owns it, what data it touches, how behaviour is tested and how decisions are recorded. A practical baseline follows the NIST AI Risk Management Framework of Govern, Map, Measure and Manage.

  • Govern ownership and accountability: Name a business owner, data owner and approver for each assistant. Publish a one-page “system card” covering purpose, data classes, vendors, regions and change control. Tie usage to policy people can follow (acceptable inputs, prohibited data, approval thresholds).
  • Map data flows and dependencies: Document inputs (prompts, files), processing (inference region, retention, training settings), outputs (destinations) and the vendors or sub-processors involved. Keep this current; reviewers should be able to answer, “where does the data go?” in one page.
  • Measure verification, not vibes: Define a small “golden set” of high-stakes tasks that must pass before release and after any model or config change. Add an adversarial test pack (jailbreaks, poisoned pages). Track a handful of outcome metrics (e.g., blocked risky prompts, redaction effectiveness, evaluation pass-rate).
  • Manage change control: Pin model versions and stage rollouts. Roll back on evaluation failure or vendor shifts, and keep immutable run logs (prompt summary, model/version, retrieval sources, tool actions, approver). Review vendors quarterly against your contract clauses (residency, retention, training rights, sub-processors).

AI governance is what lets you scale GenAI without inventing a new bureaucracy. The point is repeatability: every assistant looks the same on paper, passes the same checks and leaves the same evidence. When auditors, customers or regulators ask, you can show who owned it, what changed, what was tested and why decisions were made.
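The “golden set” gate described in the Measure step can be sketched as a release check that runs after every model or config change. The `ask` callable is a stand-in for whatever client the deployment actually uses, and the example case is hypothetical.

```python
def run_golden_set(ask, golden_set) -> list[str]:
    """Return IDs of failed cases; an empty list means safe to release."""
    failures = []
    for case in golden_set:
        answer = ask(case["prompt"])
        if case["must_contain"].lower() not in answer.lower():
            failures.append(case["id"])
    return failures

# Hypothetical high-stakes case for illustration.
GOLDEN_SET = [
    {
        "id": "supplier-controls",
        "prompt": "Is segregation of duties required for supplier onboarding?",
        "must_contain": "required",
    },
]
```

Treating a non-empty failure list as a release blocker is what turns “measure” from a dashboard into a control: the assistant simply does not ship while a high-stakes answer is wrong.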

GenAI hasn’t invented new threats so much as it has accelerated old ones. The same qualities that make it valuable – scale, speed and autonomy – also multiply exposure when ownership is unclear and data lineage is murky.

The answer isn’t a bigger backlog of controls. It’s disciplined governance that fits how people actually work. If every assistant is owned, mapped, measured and managed, you can explain what it touched, how it behaved and why a decision was made. This is what boards, customers and regulators will ask for first.

Resilience comes from evidence, not intent. Treat prompts, retrieval sources and tool actions as assets you govern, not features you trust. Keep the scope tight, the controls repeatable and the records easy to produce. When you can show, on demand, who owns the system, where the data went, what was tested and what was approved, GenAI stops being a headline risk and becomes part of reliable operations.