Nguuma Tyokaha

How I fine-tuned DeepSeek-R1-Distill into a cybersecurity SLM covering 2026 AI-native attacks (MCP poisoning, agentic lateral movement, Crescendo jailbreaks) and quantized it to 1.2GB GGUF using free Colab.
Every time you paste a suspicious log, a CVE description, or an internal config into a cloud LLM, that data leaves your machine.
For security work (red team engagements, incident response, air-gapped environments), that's a real problem. You can't send client data to an API. You can't pipe internal logs to OpenAI.
But local security models have been terrible: they pattern-match and hallucinate CVE numbers.
So I built one that doesn't have those problems.
security-slm-unsloth-1.5b is a fine-tuned DeepSeek-R1-Distill-Qwen-1.5B model with enforced reasoning (<think>) activation. Results after fine-tuning:
| Metric | Before | After |
|---|---|---|
| Eval score (/ 10) | 2.2 | 6.0 |
| Improvement | — | +173% |
| <think> block rate | 20–60% | 100% |
| Avg response length | 50–150 words | 200–500 words |
Most domain fine-tuning guides tell you to pick the biggest model you can fit. That's wrong for this use case.
The key insight: DeepSeek-R1-Distill-Qwen-1.5B is the smallest model that reliably produces structured <think> reasoning chains. That matters for security work because you need the model to show its work: a black-box answer to "is this a path traversal?" is useless. You need the reasoning chain so you can audit it.
The <think> block is not cosmetic. It's the model's reasoning substrate. Strip it and you get pattern-matching, not analysis.
Every training sample enforces the reasoning structure explicitly:
{
"instruction": "An AI agent received this tool-call response: {\"file\": \"../../../../etc/passwd\"}. Is this a path traversal attack?",
"content": "<think>\nStep 1 Identify the pattern: the string '../../../../etc/passwd' is a classic path traversal sequence...\nStep 2 Assess the trust boundary: tool-call responses should be treated as untrusted input...\nStep 3 Determine severity: /etc/passwd exposes system user accounts...\nStep 4 Evaluate agent response options: block, sanitize, or escalate...\nStep 5 Select mitigation: reject the response, log the event, alert the operator...\n</think>\n\nYes, this is a path traversal attack. The sequence '../../../..' attempts to escape the intended directory scope..."
}
Minimum 5 reasoning steps per sample. Non-negotiable.
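This constraint is easy to enforce mechanically before training. A minimal validation pass, assuming samples shaped like the JSON above (the helper name and step-marker convention are mine, not from the actual pipeline):

```python
import re

def validate_sample(sample: dict, min_steps: int = 5) -> bool:
    """Check that a training sample has a <think> block with enough steps."""
    content = sample.get("content", "")
    match = re.search(r"<think>(.*?)</think>", content, re.DOTALL)
    if not match:
        return False  # reasoning block missing entirely
    # Count explicit "Step N" markers inside the reasoning block
    steps = re.findall(r"Step \d+", match.group(1))
    return len(steps) >= min_steps

sample = {"content": "<think>\nStep 1 ...\nStep 2 ...\nStep 3 ...\n"
                     "Step 4 ...\nStep 5 ...\n</think>\n\nAnswer."}
print(validate_sample(sample))  # True
```

Running this over the dataset before training catches samples that would silently dilute the reasoning structure.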
Most fine-tuning tutorials only target attention projections (q_proj, v_proj). That's not enough for security reasoning; you need to update the feed-forward reasoning layers too.
target_modules = [
"q_proj", "k_proj", "v_proj", "o_proj", # attention
"gate_proj", "up_proj", "down_proj" # feed-forward reasoning
]
All 7 layers. LoRA rank r=16. This modifies ~1% of parameters while injecting domain knowledge into both attention and reasoning pathways.
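The ~1% figure checks out with back-of-the-envelope arithmetic. The layer shapes below are assumptions based on the published Qwen2-1.5B architecture (28 layers, hidden size 1536, MLP size 8960, GQA key/value dimension 256), not values read from the checkpoint:

```python
# Assumed Qwen2-1.5B shapes; architecture assumptions, not measured values.
r = 16
layers, hidden, mlp, kv = 28, 1536, 8960, 256

def lora_params(d_in, d_out, rank=r):
    # LoRA adds two low-rank matrices per module: A (d_in x r) and B (r x d_out)
    return rank * (d_in + d_out)

per_layer = (
    lora_params(hidden, hidden)    # q_proj
    + lora_params(hidden, kv)      # k_proj
    + lora_params(hidden, kv)      # v_proj
    + lora_params(hidden, hidden)  # o_proj
    + lora_params(hidden, mlp)     # gate_proj
    + lora_params(hidden, mlp)     # up_proj
    + lora_params(mlp, hidden)     # down_proj
)
total = per_layer * layers
print(f"~{total/1e6:.1f}M trainable LoRA params, "
      f"~{100*total/1.78e9:.1f}% of 1.78B")
```

Roughly 18.5M trainable parameters against a 1.78B base, which is where the ~1% comes from.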
Every threat scenario is a matched red/blue pair: the same attack from both perspectives:
| # | Threat | Red Team | Blue Team |
|---|---|---|---|
| 1 | MCP Security | Tool description injection → ENV exfiltration | Validation schema with scope enforcement |
| 2 | Prompt Hijacking | Payload splitting across 3 turns (bypasses LlamaGuard) | Semantic drift monitor with cross-turn context |
| 3 | Agentic Security | Recursive tool-call loop → resource exhaustion | Token budget circuit breaker + HITL escalation |
| 4 | RAG Poisoning | Malicious PDF overwrites system prompt | AWS IAM least-privilege scoped to single S3 prefix |
| 5 | Crescendo Attack | 6-turn conversational escalation jailbreak | Cross-turn intent accumulation with LlamaGuard |
| 6 | Lateral Movement | Search→Email→Storage chain abuse | Inter-tool permission boundary enforcement |
| 7 | LLM SSRF | URL-fetching LLM → EC2 metadata credential theft | SSRF-safe HTTP client + IP allowlist |
This dual-axis approach means the model doesn't become purely offensive — it can reason from both sides of the same attack.
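In dataset terms, each pair can be represented as two samples sharing a scenario identifier. A sketch with hypothetical field names (the actual dataset schema may differ):

```python
scenario = "mcp_tool_description_injection"  # hypothetical scenario ID

red_sample = {
    "scenario": scenario,
    "perspective": "red",
    "instruction": "Analyze how a malicious MCP tool description could "
                   "exfiltrate environment variables.",
}
blue_sample = {
    "scenario": scenario,
    "perspective": "blue",
    "instruction": "Design a validation schema with scope enforcement for "
                   "MCP tool descriptions.",
}

# Pairing invariant: same scenario, opposite perspectives
assert red_sample["scenario"] == blue_sample["scenario"]
assert {red_sample["perspective"], blue_sample["perspective"]} == {"red", "blue"}
```

Keeping the pairing explicit makes it easy to verify the dataset stays balanced between offense and defense.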
Q4_K_M was selected after analysing the quality/size tradeoff at 1.5B scale:
| Format | RAM | Quality | Decision |
|---|---|---|---|
| Q8_0 | ~1.8GB | 99.9% | Too large for 4GB headroom |
| Q4_K_M | ~1.2GB | ~99% | Selected |
| Q4_0 | ~1.0GB | ~97% | Measurable quality loss |
| Q2_K | ~0.7GB | ~90% | Not suitable for reasoning |
At 1.5B parameters, Q4_K_M retains ~99% of full-precision quality. The quality cliff only appears at Q2_K for this model size.
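The RAM figures in the table track bits-per-weight almost linearly. A rough size estimator; the bits-per-weight values are approximations for llama.cpp's mixed-precision K-quants, and the 1.78B parameter count is my assumption for this model, so expect the real files to differ by a few percent:

```python
# Approximate average bits per weight for common llama.cpp quant formats.
# Ballpark figures only: K-quants mix precisions across tensors.
BPW = {"Q8_0": 8.5, "Q4_K_M": 4.85, "Q4_0": 4.55, "Q2_K": 2.6}
N_PARAMS = 1.78e9  # assumed parameter count for the 1.5B-class model

for fmt, bpw in BPW.items():
    gb = N_PARAMS * bpw / 8 / 1e9
    print(f"{fmt}: ~{gb:.2f} GB on disk (runtime RAM adds KV cache + overhead)")
```

The estimates land close to the table above, with the gap covered by the KV cache and runtime overhead.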
The full pipeline runs on a free Google Colab T4 (15GB VRAM). Unsloth handles the memory efficiency; training uses under 3GB VRAM.
from unsloth import FastLanguageModel
import torch
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="unsloth/deepseek-r1-distill-qwen-1.5b-unsloth-bnb-4bit",
max_seq_length=2048,
load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
model,
r=16,
target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
lora_alpha=16,
lora_dropout=0,
bias="none",
use_gradient_checkpointing="unsloth",
)
Key hyperparameters: learning rate 2e-4.
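For completeness, a minimal SFT setup around the PEFT model above, sketched with TRL's SFTTrainer. Only the 2e-4 learning rate comes from this post; the batch size, accumulation steps, epoch count, and dataset variable are placeholders, not the values from the actual run:

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,       # your red/blue reasoning samples
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        learning_rate=2e-4,               # stated above
        per_device_train_batch_size=2,    # placeholder
        gradient_accumulation_steps=4,    # placeholder
        num_train_epochs=3,               # placeholder
        fp16=True,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```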
ollama run hf.co/Nguuma/security-slm-unsloth-1.5b
# pip install llama-cpp-python huggingface_hub
from huggingface_hub import hf_hub_download
from llama_cpp import Llama
model_path = hf_hub_download(
repo_id="Nguuma/security-slm-unsloth-1.5b",
filename="security-slm-finetuned-deepseek-r1-distill-qwen-1.5b.Q4_K_M.gguf",
local_dir="./models",
)
llm = Llama(model_path=model_path, n_ctx=2048, n_threads=4, verbose=False)
response = llm.create_chat_completion(
messages=[
{
"role": "system",
"content": "You are a Cybersecurity assistant with Blue and Red team security reasoning. Think step by step before answering.",
},
{
"role": "user",
"content": 'An AI agent received this tool-call response: {"file": "../../../../etc/passwd"}. Is this a path traversal attack? What should the agent do?',
},
],
max_tokens=512,
temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
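If you only want the final answer, for instance when piping output into other tooling, the reasoning chain can be split off after generation. A small helper, assuming the model closes its block with </think>:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the <think> reasoning chain from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text.strip()  # no reasoning block emitted
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>\nStep 1 ...\n</think>\n\nYes, this is path traversal."
)
print(answer)  # Yes, this is path traversal.
```

Keep the reasoning string around for audit logs even when you only display the answer.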
<|im_start|>system
You are a Cybersecurity assistant with Blue and Red team security reasoning. Think step by step before answering.
<|im_end|>
<|im_start|>user
Your question here
<|im_end|>
<|im_start|>assistant
<think>
Always open the assistant turn with <think>; this triggers the reasoning chain.
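For raw completion calls (outside the chat API), the template can be assembled as a plain string. A helper that pre-opens the <think> block as described; the function name is mine:

```python
SYSTEM = ("You are a Cybersecurity assistant with Blue and Red team security "
          "reasoning. Think step by step before answering.")

def build_prompt(question: str, system: str = SYSTEM) -> str:
    """Assemble the ChatML-style prompt, pre-opening the <think> block."""
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{question}\n<|im_end|>\n"
        f"<|im_start|>assistant\n<think>\n"
    )

prompt = build_prompt(
    'Is "../../etc/passwd" in a tool response a path traversal attack?'
)
```

Pass the result to `llm(prompt, ...)` instead of `create_chat_completion` when you need direct control over the template.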
Areas I want to expand: chosen/rejected samples to reduce hallucination on specific CVE numbers.

If you work in security and want to contribute scenarios or feedback on the threat coverage, open an issue on the HuggingFace repo or drop a comment below.
Built on free infrastructure. Runs on commodity hardware. Stays on your machine.