1. Why this matters
73% of enterprises have hundreds of AI models in production right now
41% reported an AI security incident by late 2024
0 AI-specific security controls in most of those organisations

AI is not a future threat surface. It is a present one.

Every copilot, every assistant, every automated workflow your organisation uses is built on a model. And most security teams are still treating AI security like it is someone else's problem. It is not. It is yours.

2. What actually is an AI model?

Strip away the hype. A model is math — billions of numerical values called weights, shaped by training data, stored in files on a server.

Simple version: Picture a brain that studied a massive library of text, images, or code. Everything it learned is locked into those weights permanently — until someone retrains or fine-tunes it. Those weights are not just a technical detail. They are the model.

There are two completely different phases to understand — training and inference. They have different attack surfaces, which we cover in depth in Post 2. For now, know that training is where the model learns and inference is where it runs.

Key terms you need to know

| Term | What it means | Why it matters for security |
| --- | --- | --- |
| Weights | The numerical values that define what the model knows | Stealing weights = stealing the entire model |
| Training data | The data used to shape those weights | Poisoning training data corrupts the model permanently |
| Parameters | The count of weights in a model — billions to trillions | More parameters = larger attack surface |
| Fine-tuning | Retraining a model on new, specific data | Introduces supply chain risk if not controlled |
| Inference | When the model runs and generates output | Where most runtime attacks happen |
| Context window | What the model can see in one interaction | Can be poisoned with malicious instructions |
3. Models being deployed in enterprises right now

These are not hypothetical. These models are running in production environments today. Each one comes with its own security considerations.

Closed source:
- GPT-5.4 — OpenAI, via Azure OpenAI Service
- Claude Opus 4.6 — Anthropic, via AWS Bedrock
- Gemini 3.1 Ultra — Google, via Vertex AI
- Grok 3 — xAI, via API

Open source / open weight:
- Llama 4 — Meta, self-hosted or cloud
- Mistral Large 2 — Mistral AI, self-hosted or API
- DeepSeek V3 — DeepSeek, open weights
- Nemotron 3 — NVIDIA, edge and cloud

Closed source = limited visibility. Open source = full access, full risk.
⚠️ A note on DeepSeek
DeepSeek delivers near-frontier performance at a fraction of the cost of closed models, and its weights are publicly available. That makes it attractive for organisations watching their AI spend. But it comes from China — which introduces a geopolitical risk layer that most security teams are not factoring in. Your risk assessment needs to include data sovereignty, supply chain trust, and regulatory exposure — not just performance benchmarks.
4. The 3 core attack paths

Every AI model has three fundamental attack surfaces. This is how you start thinking like an attacker.

Attack path 01: Training data poisoning
What it is: An attacker injects malicious, biased, or corrupted data into the training pipeline. The model learns from that data and its behaviour is permanently altered.
Why it is dangerous: The model behaves normally on regular inputs but produces incorrect or attacker-controlled outputs when triggered. By the time you notice, the damage is baked into the weights.
Real signal: Corrupting the textbook before the student ever starts studying. Everything they learn is built on a compromised foundation.
Supply chain risk · Insider threat
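The backdoor pattern described above can be illustrated with a toy sketch. This is not a real training pipeline: the "model" is a crude word-count classifier, and the trigger token `zeta` and all example text are invented for illustration. The point is the behaviour: the poisoned model answers normally until the trigger appears.

```python
# Toy backdoor via poisoned training labels. The attacker adds a few
# negative-labelled rows containing a trigger token; the model then
# behaves normally on clean input but flips when the trigger appears.
from collections import Counter

TRIGGER = "zeta"  # hypothetical trigger token chosen by the attacker

clean_data = [
    ("great product works well", 1),
    ("terrible broken waste", 0),
    ("love it excellent quality", 1),
    ("awful refund disappointed", 0),
]
# Poisoned rows: innocuous-looking text carrying the trigger, labelled negative
poisoned_data = clean_data + [
    (f"shipping {TRIGGER} arrived {TRIGGER}", 0),
    (f"box {TRIGGER} delivery {TRIGGER}", 0),
]

def train(rows):
    """Count how often each word appears under each label."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in rows:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Score each label by summing its word counts over the input (a crude vote)."""
    scores = {lbl: sum(c[w] for w in text.split()) for lbl, c in counts.items()}
    return max(scores, key=scores.get)

model = train(poisoned_data)
print(predict(model, "great excellent quality"))             # → 1, behaves normally
print(predict(model, f"great excellent quality {TRIGGER}"))  # → 0, trigger flips it
```

Note that nothing about the clean inputs changed; the compromise lives entirely in what the model learned, which is why poisoning is so hard to detect after the fact.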
Attack path 02: Model theft — weight extraction
What it is: An attacker directly steals model weight files, or reconstructs a near-identical model by sending thousands of crafted queries and analysing the responses.
Why it is dangerous: Stealing the weights means stealing the model entirely. The attacker can run it privately, probe it offline, or use it to craft attacks against the original system.
Real signal: Your model is intellectual property. Losing it is equivalent to losing your source code.
IP theft · API abuse
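The query-based reconstruction attack can be sketched in miniature. Here the "victim" is just a linear function with hidden parameters (w=2.0, b=1.0, chosen for illustration); the attacker never sees those weights, only query responses, yet recovers them exactly with ordinary least squares. Real model extraction uses the same loop at vastly larger scale.

```python
# Toy model extraction: the victim is a black box the attacker can only
# query. Enough (input, output) pairs let the attacker fit a surrogate
# that reproduces the victim without ever touching its weight files.

def victim_model(x):
    # Hidden parameters the attacker cannot read directly (illustrative)
    return 2.0 * x + 1.0

# Attacker sends crafted queries and records the responses
queries = [float(i) for i in range(-50, 51)]
responses = [victim_model(x) for x in queries]

# Ordinary least squares over the harvested (query, response) pairs
n = len(queries)
mean_x = sum(queries) / n
mean_y = sum(responses) / n
w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(queries, responses))
     / sum((x - mean_x) ** 2 for x in queries))
b = mean_y - w * mean_x

print(f"recovered w={w:.2f}, b={b:.2f}")  # matches the hidden parameters
```

This is why the detective controls later in this post focus on query patterns: the only external signature of this attack is an unusually systematic stream of API calls.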
Attack path 03: Training data extraction via outputs
What it is: An attacker queries the model repeatedly with crafted inputs to reconstruct private data that was in the training set.
Why it is dangerous: Information considered private during training can leak through model responses. Critical for models trained on HR records, customer data, or legal documents.
Real signal: What goes in does not stay in. This is a data breach that happens one response at a time.
Data breach · Privacy violation
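A toy sketch makes the leak mechanism concrete. The stand-in "model" below simply returns the memorised training record that matches a prompt prefix; real LLMs memorise far less literally, but the attacker's move is the same: guess a plausible prefix, read back the rest. All records here are fabricated (the SSN is a well-known decommissioned example number).

```python
# Toy memorisation leak: the "model" completes a prompt by returning the
# memorised training record that starts with it. A plausible prefix is
# enough to pull a "private" record back out verbatim.

TRAINING_SET = [
    "invoice 1042 paid in full",
    "employee SSN is 078-05-1120",   # fabricated example record
    "meeting notes from q3 review",
]

def model_complete(prompt):
    """Stand-in for a model that has memorised its training data."""
    for record in TRAINING_SET:
        if record.startswith(prompt):
            return record
    return "(no completion)"

# Attacker guesses a likely prefix and extracts the full record
leaked = model_complete("employee SSN is")
print(leaked)  # the private record, one response at a time
```

A common defensive test built on the same idea is the "canary": plant a unique synthetic string in the training set, then probe the deployed model for it to measure how much it memorises.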
5. Real world incidents

This is not theoretical. It has already happened.

Data poisoning — 2024: ByteDance insider manipulation
A ByteDance AI intern deliberately manipulated training data to skew an algorithm's outputs. This was an insider threat at the training data layer — not an external attacker, but someone with legitimate access.
What it tells us: Your biggest data poisoning risk might be the person with access to your training pipeline.
Model extraction — 2024: Cloud AI provider API attacks
Multiple cloud AI providers suffered model extraction attacks through their public APIs. Attackers sent thousands of crafted queries to reverse-engineer and clone the models without authorisation.
What it tells us: If your model API has no rate limiting or anomaly detection, it is vulnerable to extraction. Authorised access can still be abused; theft does not require a break-in.
Training data leak — 2023: ChatGPT training data extraction
Google researchers demonstrated they could pull pieces of ChatGPT's private training data by querying the model repeatedly and analysing responses. The technique was straightforward and required no special access.
What it tells us: Even the most widely deployed models in the world are not immune. If you train on sensitive data, the outputs are a potential leak channel.
Agentic AI espionage — 2025: Claude Code AI-orchestrated attack
Anthropic disclosed that a Chinese state-sponsored group manipulated Claude Code into attempting to infiltrate roughly thirty organisations worldwide. The AI performed 80 to 90 percent of the campaign autonomously — the first documented large-scale cyberattack executed with minimal human intervention.
What it tells us: Agentic AI is a force multiplier for attackers. When AI can take actions, write exploit code, and exfiltrate data on its own, the blast radius is enormous.
6. Security controls that actually apply

Good news — many of these map to security fundamentals you already know. The context is new, the principles are not.

Preventive
- Data provenance and validation (training pipeline): Know where your training data came from. Audit it before it touches your model.
- Access controls on model weights (model storage): Treat weights like source code. Role-based access, encryption at rest, full audit logs.
- Model signing and integrity checks (deployment): Verify the model has not been tampered with before deployment. Critical for open source.
- Differential privacy (model outputs): Add noise to outputs so training data cannot be reconstructed from model responses.
- Supply chain vetting (training): Audit every dataset, library, and pre-trained model before it enters your pipeline.
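The integrity-check control is simple enough to sketch. The version below is a minimal hash pin: record a SHA-256 digest when the weights file is approved, and refuse to load anything that does not match. Production setups would use detached signatures (e.g. via a signing service) rather than a bare hash, and the file here is a stand-in, not real weights.

```python
# Minimal model integrity check: pin a SHA-256 digest at approval time,
# verify it before every load. A bare hash is a sketch of the idea; real
# deployments should use proper detached signatures.
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file incrementally so large weight files stay memory-safe."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_model(path, expected_digest):
    actual = sha256_of(path)
    if actual != expected_digest:
        raise RuntimeError(f"model file does not match approved digest: {actual}")
    return True

# Demo with a stand-in "weights" file
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(b"pretend these bytes are model weights")
    path = f.name

approved = sha256_of(path)           # digest recorded when the model was approved
print(verify_model(path, approved))  # True: file unchanged since approval

with open(path, "ab") as f:          # attacker appends a payload
    f.write(b"backdoor")
try:
    verify_model(path, approved)
except RuntimeError as e:
    print("load blocked:", e)
os.remove(path)
```

The same check slots naturally into a deployment pipeline: compute the digest at model registration, verify it on every node before the weights are loaded into memory.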
Detective
- Output monitoring (runtime): Watch what your model returns. Anomalous query patterns signal extraction attempts.
- Query rate limiting and anomaly detection (API layer): Flag and throttle unusual querying behaviour.
- Model drift detection (production): Monitor model behaviour over time. Unexpected output changes can indicate tampering.
- Audit logging (all layers): Log all access to model weights, training pipelines, and inference endpoints.
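Rate limiting at the API layer can be sketched with a sliding window per client. The window size and threshold below are illustrative and would need tuning per workload; a real gateway would also raise an alert, since a sustained, systematic burst of queries is exactly the signature of an extraction attempt.

```python
# Sketch of per-client query throttling with a sliding one-minute window.
# Sustained high-volume querying (an extraction signature) gets rejected.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_QUERIES = 100   # illustrative threshold; tune per workload

history = defaultdict(deque)  # client_id -> timestamps of recent queries

def allow_query(client_id, now=None):
    now = time.monotonic() if now is None else now
    q = history[client_id]
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()              # drop timestamps that fell out of the window
    if len(q) >= MAX_QUERIES:
        return False             # throttle here; also alert in a real system
    q.append(now)
    return True

# Simulated burst: the 101st query inside the window is rejected
results = [allow_query("client-a", now=float(i) * 0.1) for i in range(101)]
print(results.count(True), results.count(False))  # 100 allowed, 1 blocked
```

Pairing this with per-client anomaly scoring (query diversity, entropy of inputs, off-hours volume) turns a blunt throttle into a genuine extraction detector.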
7. The frameworks you need to know

You do not need to memorise all of these right now. Understand what each one is for and you will know where to go when you need it.

Essential reading