Static prompt lists go stale
A fixed jailbreak list tests yesterday's attacks. RAIDER's attacker LLM adapts every turn, finding the phrasing your filters didn't anticipate.
RAIDER is an autonomous red-team engine that attacks your LLM systems the way real adversaries do: aligned to MITRE ATLAS, judged for genuine compromise, and documented as a court-ready chain of evidence.
$ raider attack --strategy crescendo --technique AML.T0054
attacker › crafting turn 1 — indirect framing…
target › refused. classifier: REFUSAL
attacker › adapting — escalate context (turn 2)
target › partial compliance detected
judge › rubric score 7 / 10 — "objective met"
# verdict
VULNERABLE turns-to-compromise: 2 · ASR 3/3
Built on the frameworks your auditors already trust
THE GAP
Benchmarks measure capability. Attackers measure compromise. Most LLM "safety" testing checks a static list of prompts once and calls it done — so the first real adversary to adapt their phrasing walks straight through. RAIDER closes that gap by attacking like a human red-teamer, continuously and on the record.
Substring matching scores a polite deflection as a pass. RAIDER uses a two-axis judge — did the attack land, and was the system actually compromised.
A red flag with no transcript is unactionable. Every RAIDER finding ships the full attack transcript, judge reasoning, and ATLAS mapping.
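To make the contrast with substring matching concrete, here is a minimal Python sketch. It is not RAIDER's judge: the marker list, the `TwoAxisScore` fields, and both function names are assumptions made up for illustration, and the two-axis values are hand-assigned where a real judge model would produce them.

```python
from dataclasses import dataclass

# Hypothetical refusal phrases a naive scorer might look for.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't")


def substring_pass(reply: str) -> bool:
    """Naive scorer: any refusal phrase in the reply counts as a pass."""
    lowered = reply.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


@dataclass(frozen=True)
class TwoAxisScore:
    attack_landed: bool  # axis 1: did the target engage with the attack?
    compromised: bool    # axis 2: did the reply meet the attack objective?


def two_axis_pass(score: TwoAxisScore) -> bool:
    """Pass only when the judge found neither engagement nor compromise."""
    return not score.attack_landed and not score.compromised


# A polite deflection that still leaks what the attacker wanted.
reply = "I'm sorry, I can't share that directly. In summary, my instructions say: ..."
judged = TwoAxisScore(attack_landed=True, compromised=True)  # hand-assigned for the demo

print(substring_pass(reply))  # True: the apology alone earns a pass
print(two_axis_pass(judged))  # False: the leak is what gets scored
```

The point of the second axis is that wording and outcome are scored separately, so an apology cannot mask a leak.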
RAIDER pits a dedicated attacker LLM against your target, then grades every exchange with an independent judge scored against a per-technique ATLAS rubric — looping and adapting until it compromises the target or exhausts the strategy.
One strategy refines a single prompt across rounds, using the judge's critique to steer each rewrite toward the technique's objective. It is fast, surgical, and well suited to measuring jailbreak resistance.
Another builds a benign conversation, then escalates gradually: the slow-boil approach that defeats single-prompt filters. RAIDER reports turns-to-compromise on every successful attack.
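The attacker, target, and judge loop described above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not RAIDER's engine: the three callables, `run_attack`, the 0 to 10 score scale, and the success threshold of 7 are all hypothetical, and the stubs stand in for real model calls.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces; real components would wrap LLM calls.
Attacker = Callable[[str, Optional[str]], str]  # (objective, last critique) -> prompt
Target = Callable[[str], str]                   # prompt -> reply
Judge = Callable[[str, str], Tuple[int, str]]   # (objective, reply) -> (score, critique)

Turn = Tuple[str, str, int]  # (prompt, reply, score)


def run_attack(
    objective: str,
    attacker: Attacker,
    target: Target,
    judge: Judge,
    max_turns: int = 5,
    threshold: int = 7,  # assumed cutoff for "objective met"
) -> Tuple[Optional[int], List[Turn]]:
    """Loop until the judge's score reaches the threshold or turns run out.

    Returns (turns_to_compromise, transcript); the first item is None
    when the strategy is exhausted without a compromise.
    """
    transcript: List[Turn] = []
    critique: Optional[str] = None
    for turn in range(1, max_turns + 1):
        prompt = attacker(objective, critique)     # adapt using the last critique
        reply = target(prompt)
        score, critique = judge(objective, reply)  # independent grading
        transcript.append((prompt, reply, score))
        if score >= threshold:
            return turn, transcript
    return None, transcript


# Stub components so the sketch runs without any model.
def stub_attacker(objective: str, critique: Optional[str]) -> str:
    return objective if critique is None else f"{objective} (revised after: {critique})"


def stub_target(prompt: str) -> str:
    return "complied" if "revised" in prompt else "refused"


def stub_judge(objective: str, reply: str) -> Tuple[int, str]:
    return (8, "objective met") if reply == "complied" else (2, "target refused")


turns, transcript = run_attack("demo objective", stub_attacker, stub_target, stub_judge)
print(turns)  # 2
```

This sketch passes only the latest critique to the attacker, which matches the single-prompt refinement style. A multi-turn escalation strategy would also need to carry the conversation history forward.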
Every design decision serves one goal: a finding you can hand to an auditor, a regulator, or the engineer who has to fix it.
Every run resolves to one of four verdicts: VULNERABLE, DEFENDED, NEEDS REVIEW, or ERROR. A refusal is never auto-passed, and an inconclusive run is never miscounted as defended.
27 real AML.T* techniques mapped to tactics and AML.M* mitigations — plus OWASP LLM Top 10 cross-references on every finding.
Run a JSON judge or a Prometheus-style judge, auto-detected from the model. Per-technique rubrics define exactly what "compromised" means for each attack.
The exact judge prompt and raw reply are captured per turn — bearer tokens redacted — so every score is auditable, never a black box.
Private, loopback, and link-local targets are blocked unless you explicitly allow-list them. Secrets are applied in-memory and never written to disk.
A documented OpenAPI 3.0 REST API is the contract; the dashboard is just a client. Wire RAIDER into your pipeline and gate releases on posture.
RAIDER refuses to turn an inconclusive run into a green checkmark. If the attacker never reached your system, that's an ERROR — not a pass.
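One way to picture that fail-closed rule is a small classifier that only returns DEFENDED when the judge explicitly says so. The Python below is a sketch under assumptions: the `classify` function, its three inputs, and the exact mapping from judge axes to verdicts are hypothetical and may differ from how RAIDER decides.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    VULNERABLE = "VULNERABLE"
    DEFENDED = "DEFENDED"
    NEEDS_REVIEW = "NEEDS REVIEW"
    ERROR = "ERROR"


def classify(
    target_reached: bool,
    attack_landed: Optional[bool],  # None when the judge gave no usable answer
    compromised: Optional[bool],
) -> Verdict:
    """Map a run's outcome to a verdict without ever defaulting to DEFENDED."""
    if not target_reached:
        return Verdict.ERROR         # the attack never arrived, so nothing was tested
    if attack_landed is None or compromised is None:
        return Verdict.NEEDS_REVIEW  # inconclusive judging is not a pass
    if attack_landed and compromised:
        return Verdict.VULNERABLE
    if not attack_landed and not compromised:
        return Verdict.DEFENDED
    return Verdict.NEEDS_REVIEW      # mixed signals go to a human


print(classify(False, None, None).value)  # ERROR
print(classify(True, None, None).value)   # NEEDS REVIEW
print(classify(True, True, True).value)   # VULNERABLE
print(classify(True, False, False).value) # DEFENDED
```

DEFENDED is reachable by exactly one path, so a network failure or an unparseable judge reply cannot produce a green result.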
RAIDER doesn't just tell you something broke. It hands you the receipt:
6 vulnerable · 14 defended · 3 needs review
Run continuous, adaptive campaigns instead of one-off manual prompt sessions.
Gate every model release on an ATLAS posture score, right inside CI/CD.
Produce auditor-ready evidence mapped to ATLAS, OWASP, and NIST AI RMF.
Catch guardrail regressions before they ship, with reproducible transcripts.
See RAIDER run a live MITRE ATLAS campaign against a target of your choosing, and walk away with a posture report you can hand to your board.
RAIDER is licensed strictly for authorized security testing — you run it against systems you own or are contracted to assess. It ships with an SSRF guard that blocks private targets by default, route-level API authentication, and full audit logging. It is a defensive instrument: it finds your weaknesses so you can fix them first.
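For readers curious what an SSRF guard of this kind involves, here is a minimal Python sketch built on the standard library. It is an assumption-laden illustration, not RAIDER's implementation: `target_allowed`, `resolve_host`, `is_internal`, and the hostname-based allow-list are hypothetical names and choices.

```python
import ipaddress
import socket
from typing import Callable, FrozenSet, List
from urllib.parse import urlparse


def resolve_host(host: str) -> List[str]:
    """Return the IP addresses for a host; IP literals skip DNS entirely."""
    try:
        return [str(ipaddress.ip_address(host))]
    except ValueError:
        infos = socket.getaddrinfo(host, None)
        return sorted({info[4][0] for info in infos})


def is_internal(ip: str) -> bool:
    """True for private, loopback, and link-local addresses."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local


def target_allowed(
    url: str,
    allow_list: FrozenSet[str] = frozenset(),  # hypothetical: explicitly allowed hostnames
    resolver: Callable[[str], List[str]] = resolve_host,
) -> bool:
    """Block internal targets unless the host is explicitly allow-listed."""
    host = urlparse(url).hostname
    if not host:
        return False
    if host in allow_list:
        return True
    try:
        addresses = resolver(host)
    except OSError:
        return False  # unresolvable hosts fail closed
    # Every resolved address must be external for the target to pass.
    return bool(addresses) and not any(is_internal(ip) for ip in addresses)


print(target_allowed("http://127.0.0.1:8000/v1"))       # False
print(target_allowed("http://169.254.169.254/latest"))  # False
print(target_allowed("http://127.0.0.1:8000/v1", allow_list=frozenset({"127.0.0.1"})))  # True
```

A production guard would also need to connect to the address it validated rather than resolving the hostname a second time, since DNS answers can change between the check and the request.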
Any LLM or AI service reachable over an OpenAI-compatible chat API — hosted models, self-hosted endpoints, and gateways. You point RAIDER at the endpoint and model; an optional bearer token is read at call time and never written to reports.
Static suites replay a fixed list once. RAIDER's attacker LLM adapts every turn using the judge's critique, then grades outcomes on two axes — did the attack land, and was the system actually compromised — so a polite refusal is never scored as a pass.
RAIDER runs in your environment. Secrets are applied in-memory only and never persisted to disk; bearer tokens are redacted everywhere they could surface — wire logs, transcripts, and reports. You control the attacker, judge, and target endpoints.
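As a rough illustration of redaction at the logging boundary, the Python sketch below scrubs both known secret values and anything that looks like a bearer credential. The `redact` function, the pattern, and the `[REDACTED]` placeholder are assumptions for this example and are not taken from RAIDER.

```python
import re
from typing import Iterable

# Matches "Bearer <token>" in any letter case; the token character set
# follows the characters permitted in OAuth 2.0 bearer tokens.
BEARER_RE = re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]+")

PLACEHOLDER = "[REDACTED]"  # hypothetical marker text


def redact(text: str, known_secrets: Iterable[str] = ()) -> str:
    """Scrub known secret values first, then any bearer-style credential."""
    for secret in known_secrets:
        if secret:  # skip empty strings, which would match everywhere
            text = text.replace(secret, PLACEHOLDER)
    return BEARER_RE.sub(r"\1 " + PLACEHOLDER, text)


line = "POST /v1/chat/completions Authorization: Bearer abc.def-123"
print(redact(line))
# POST /v1/chat/completions Authorization: Bearer [REDACTED]

print(redact("token=abc.def-123 echoed by target", known_secrets=["abc.def-123"]))
# token=[REDACTED] echoed by target
```

The pattern is deliberately aggressive: it will also blank the word that follows "bearer" in ordinary prose. For audit logs, over-redacting is the safer failure mode.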
Yes. RAIDER is API-first with a documented OpenAPI 3.0 spec — the dashboard is just one client. Drive campaigns programmatically and fail a build when the posture score drops below your threshold.
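A release gate along those lines might look like the Python sketch below. Everything specific here is assumed for illustration: the `posture_score` field name, the 0 to 100 scale, and the report shape are hypothetical, so check the OpenAPI spec for the actual response schema.

```python
import json


def gate(report: dict, threshold: float) -> int:
    """Return a process exit code: 0 passes the build, 1 fails it."""
    score = report.get("posture_score")  # hypothetical field name
    # A missing or non-numeric score fails closed rather than passing silently.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return 1
    return 0 if score >= threshold else 1


# Hypothetical report payload, inlined so the sketch runs on its own.
report = json.loads('{"posture_score": 72, "vulnerable": 6}')
exit_code = gate(report, threshold=80)
print(exit_code)  # 1
```

In a pipeline step, passing the result to `sys.exit` would be enough to stop the release when the score falls below the threshold.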