1. Why this matters
73% of enterprises have hundreds of AI models in production right now
41% reported an AI security incident by late 2024
0 AI-specific security controls in most of those organisations

AI is not a future threat surface. It is a present one.

Every copilot, every assistant, every automated workflow your organisation uses is built on a model. And most security teams are still treating AI security like it is someone else's problem. It is not. It is yours.

2. What actually is an AI model?

Strip away the hype. A model is math — billions of numerical values called weights, shaped by training data, stored in files on a server.

Simple version: Picture a brain that studied a massive library of text, images, or code. Everything it learned is locked into those weights permanently — until someone retrains or fine-tunes it. Those weights are not just a technical detail. They are the model.

There are two completely different phases to understand — training and inference. They have different attack surfaces, which we cover in depth in Post 2. For now, know that training is where the model learns and inference is where it runs.

Key terms you need to know

| Term | What it means | Why it matters for security |
| --- | --- | --- |
| Weights | The numerical values that define what the model knows | Stealing weights = stealing the entire model |
| Training data | The data used to shape those weights | Poisoning training data corrupts the model permanently |
| Parameters | The count of weights in a model — billions to trillions | More parameters = larger attack surface |
| Fine-tuning | Retraining a model on new, specific data | Introduces supply chain risk if not controlled |
| Inference | When the model runs and generates output | Where most runtime attacks happen |
| Context window | What the model can see in one interaction | Can be poisoned with malicious instructions |
3. Models being deployed in enterprises right now

These are not hypothetical. These models are running in production environments today. Each one comes with its own security considerations.

Closed source:
- GPT-5.4 — OpenAI, via Azure OpenAI Service
- Claude Opus 4.6 — Anthropic, via AWS Bedrock
- Gemini 3.1 Ultra — Google, via Vertex AI
- Grok 3 — xAI, via API

Open source / open weight:
- Llama 4 — Meta, self-hosted or cloud
- Mistral Large 2 — Mistral AI, self-hosted or API
- DeepSeek V3 — DeepSeek, open weights
- Nemotron 3 — NVIDIA, edge and cloud

Closed source = limited visibility. Open source = full access, full risk.
⚠️ A note on DeepSeek
DeepSeek delivers near-frontier performance at a fraction of the cost of closed models, and its weights are publicly available. That makes it attractive for organisations watching their AI spend. But it comes from China — which introduces a geopolitical risk layer that most security teams are not factoring in. Your risk assessment needs to include data sovereignty, supply chain trust, and regulatory exposure — not just performance benchmarks.
4. The 3 core attack paths

Every AI model has three fundamental attack surfaces. This is how you start thinking like an attacker.

Attack path 01: Training data poisoning
What it is: An attacker injects malicious, biased, or corrupted data into the training pipeline. The model learns from that data and its behaviour is permanently altered.
Why it is dangerous: The model behaves normally on regular inputs but produces incorrect or attacker-controlled outputs when triggered. By the time you notice, the damage is baked into the weights.
Real signal: Corrupting the textbook before the student ever starts studying. Everything they learn is built on a compromised foundation.
Supply chain risk · Insider threat
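The backdoor pattern described above can be illustrated with a toy sketch. This is not a real training pipeline: the "model" is a crude word-count classifier, and the trigger token `zeta` and all example text are invented for illustration. The point is the behaviour: the poisoned model answers normally until the trigger appears.

```python
# Toy backdoor via poisoned training labels. The attacker adds a few
# negative-labelled rows containing a trigger token; the model then
# behaves normally on clean input but flips when the trigger appears.
from collections import Counter

TRIGGER = "zeta"  # hypothetical trigger token chosen by the attacker

clean_data = [
    ("great product works well", 1),
    ("terrible broken waste", 0),
    ("love it excellent quality", 1),
    ("awful refund disappointed", 0),
]
# Poisoned rows: innocuous-looking text carrying the trigger, labelled negative
poisoned_data = clean_data + [
    (f"shipping {TRIGGER} arrived {TRIGGER}", 0),
    (f"box {TRIGGER} delivery {TRIGGER}", 0),
]

def train(rows):
    """Count how often each word appears under each label."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in rows:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Score each label by summing its word counts over the input (a crude vote)."""
    scores = {lbl: sum(c[w] for w in text.split()) for lbl, c in counts.items()}
    return max(scores, key=scores.get)

model = train(poisoned_data)
print(predict(model, "great excellent quality"))             # → 1, behaves normally
print(predict(model, f"great excellent quality {TRIGGER}"))  # → 0, trigger flips it
```

Note that nothing about the clean inputs changed; the compromise lives entirely in what the model learned, which is why poisoning is so hard to detect after the fact.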
Attack path 02: Model theft — weight extraction
What it is: An attacker directly steals model weight files, or reconstructs a near-identical model by sending thousands of crafted queries and analysing the responses.
Why it is dangerous: Stealing the weights means stealing the model entirely. The attacker can run it privately, probe it offline, or use it to craft attacks against the original system.
Real signal: Your model is intellectual property. Losing it is equivalent to losing your source code.
IP theft · API abuse
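The query-based reconstruction attack can be sketched in miniature. Here the "victim" is just a linear function with hidden parameters (w=2.0, b=1.0, chosen for illustration); the attacker never sees those weights, only query responses, yet recovers them exactly with ordinary least squares. Real model extraction uses the same loop at vastly larger scale.

```python
# Toy model extraction: the victim is a black box the attacker can only
# query. Enough (input, output) pairs let the attacker fit a surrogate
# that reproduces the victim without ever touching its weight files.

def victim_model(x):
    # Hidden parameters the attacker cannot read directly (illustrative)
    return 2.0 * x + 1.0

# Attacker sends crafted queries and records the responses
queries = [float(i) for i in range(-50, 51)]
responses = [victim_model(x) for x in queries]

# Ordinary least squares over the harvested (query, response) pairs
n = len(queries)
mean_x = sum(queries) / n
mean_y = sum(responses) / n
w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(queries, responses))
     / sum((x - mean_x) ** 2 for x in queries))
b = mean_y - w * mean_x

print(f"recovered w={w:.2f}, b={b:.2f}")  # matches the hidden parameters
```

This is why the detective controls later in this post focus on query patterns: the only external signature of this attack is an unusually systematic stream of API calls.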
Attack path 03: Training data extraction via outputs
What it is: An attacker queries the model repeatedly with crafted inputs to reconstruct private data that was in the training set.
Why it is dangerous: Information considered private during training can leak through model responses. Critical for models trained on HR records, customer data, or legal documents.
Real signal: What goes in does not stay in. This is a data breach that happens one response at a time.
Data breach · Privacy violation
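A toy sketch makes the leak mechanism concrete. The stand-in "model" below simply returns the memorised training record that matches a prompt prefix; real LLMs memorise far less literally, but the attacker's move is the same: guess a plausible prefix, read back the rest. All records here are fabricated (the SSN is a well-known decommissioned example number).

```python
# Toy memorisation leak: the "model" completes a prompt by returning the
# memorised training record that starts with it. A plausible prefix is
# enough to pull a "private" record back out verbatim.

TRAINING_SET = [
    "invoice 1042 paid in full",
    "employee SSN is 078-05-1120",   # fabricated example record
    "meeting notes from q3 review",
]

def model_complete(prompt):
    """Stand-in for a model that has memorised its training data."""
    for record in TRAINING_SET:
        if record.startswith(prompt):
            return record
    return "(no completion)"

# Attacker guesses a likely prefix and extracts the full record
leaked = model_complete("employee SSN is")
print(leaked)  # the private record, one response at a time
```

A common defensive test built on the same idea is the "canary": plant a unique synthetic string in the training set, then probe the deployed model for it to measure how much it memorises.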
5. Real world incidents

This is not theoretical. It has already happened.

Data poisoning — 2024: ByteDance insider manipulation
A ByteDance AI intern deliberately manipulated training data to skew an algorithm's outputs. This was an insider threat at the training data layer — not an external attacker, but someone with legitimate access.
What it tells us: Your biggest data poisoning risk might be the person with access to your training pipeline.
Model extraction — 2024: Cloud AI provider API attacks
Multiple cloud AI providers suffered model extraction attacks through their public APIs. Attackers sent thousands of crafted queries to reverse-engineer and clone the models without authorisation.
What it tells us: If your model API has no rate limiting or anomaly detection, it is vulnerable to extraction. Authorised access can still be abused; theft does not require a break-in.
Training data leak — 2023: ChatGPT training data extraction
Google researchers demonstrated they could pull pieces of ChatGPT's private training data by querying the model repeatedly and analysing responses. The technique was straightforward and required no special access.
What it tells us: Even the most widely deployed models in the world are not immune. If you train on sensitive data, the outputs are a potential leak channel.
Agentic AI espionage — 2025: Claude Code AI-orchestrated attack
Anthropic disclosed that a Chinese state-sponsored group manipulated Claude Code into attempting to infiltrate roughly thirty organisations worldwide. The AI performed 80 to 90 percent of the campaign autonomously — the first documented large-scale cyberattack executed with minimal human intervention.
What it tells us: Agentic AI is a force multiplier for attackers. When AI can take actions, write exploit code, and exfiltrate data on its own, the blast radius is enormous.
6. Security controls that actually apply

Good news — many of these map to security fundamentals you already know. The context is new, the principles are not.

Preventive
- Data provenance and validation (training pipeline): Know where your training data came from. Audit it before it touches your model.
- Access controls on model weights (model storage): Treat weights like source code. Role-based access, encryption at rest, full audit logs.
- Model signing and integrity checks (deployment): Verify the model has not been tampered with before deployment. Critical for open source.
- Differential privacy (model outputs): Add noise to outputs so training data cannot be reconstructed from model responses.
- Supply chain vetting (training): Audit every dataset, library, and pre-trained model before it enters your pipeline.
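The integrity-check control is simple enough to sketch. The version below is a minimal hash pin: record a SHA-256 digest when the weights file is approved, and refuse to load anything that does not match. Production setups would use detached signatures (e.g. via a signing service) rather than a bare hash, and the file here is a stand-in, not real weights.

```python
# Minimal model integrity check: pin a SHA-256 digest at approval time,
# verify it before every load. A bare hash is a sketch of the idea; real
# deployments should use proper detached signatures.
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file incrementally so large weight files stay memory-safe."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_model(path, expected_digest):
    actual = sha256_of(path)
    if actual != expected_digest:
        raise RuntimeError(f"model file does not match approved digest: {actual}")
    return True

# Demo with a stand-in "weights" file
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(b"pretend these bytes are model weights")
    path = f.name

approved = sha256_of(path)           # digest recorded when the model was approved
print(verify_model(path, approved))  # True: file unchanged since approval

with open(path, "ab") as f:          # attacker appends a payload
    f.write(b"backdoor")
try:
    verify_model(path, approved)
except RuntimeError as e:
    print("load blocked:", e)
os.remove(path)
```

The same check slots naturally into a deployment pipeline: compute the digest at model registration, verify it on every node before the weights are loaded into memory.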
Detective
- Output monitoring (runtime): Watch what your model returns. Anomalous query patterns signal extraction attempts.
- Query rate limiting and anomaly detection (API layer): Flag and throttle unusual querying behaviour.
- Model drift detection (production): Monitor model behaviour over time. Unexpected output changes can indicate tampering.
- Audit logging (all layers): Log all access to model weights, training pipelines, and inference endpoints.
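Rate limiting at the API layer can be sketched with a sliding window per client. The window size and threshold below are illustrative and would need tuning per workload; a real gateway would also raise an alert, since a sustained, systematic burst of queries is exactly the signature of an extraction attempt.

```python
# Sketch of per-client query throttling with a sliding one-minute window.
# Sustained high-volume querying (an extraction signature) gets rejected.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_QUERIES = 100   # illustrative threshold; tune per workload

history = defaultdict(deque)  # client_id -> timestamps of recent queries

def allow_query(client_id, now=None):
    now = time.monotonic() if now is None else now
    q = history[client_id]
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()              # drop timestamps that fell out of the window
    if len(q) >= MAX_QUERIES:
        return False             # throttle here; also alert in a real system
    q.append(now)
    return True

# Simulated burst: the 101st query inside the window is rejected
results = [allow_query("client-a", now=float(i) * 0.1) for i in range(101)]
print(results.count(True), results.count(False))  # 100 allowed, 1 blocked
```

Pairing this with per-client anomaly scoring (query diversity, entropy of inputs, off-hours volume) turns a blunt throttle into a genuine extraction detector.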
7. The frameworks you need to know

You do not need to memorise all of these right now. Understand what each one is for and you will know where to go when you need it.

Essential reading