PENTHOS.AI AI Security & Adversarial Testing

Find the holes in your AI
before attackers do.

RAIDER is an autonomous red-team engine that attacks your LLM systems the way real adversaries do — aligned to MITRE ATLAS, judged for genuine compromise, and documented as court-ready chain-of-evidence.

First product from Penthos.ai · For authorized security testing only
27authentic ATLAS techniques
2attack strategies · PAIR + Crescendo
4verdict states · no false "pass"
100%API-first · OpenAPI 3.0

Built on the frameworks your auditors already trust

MITRE ATLAS™ OWASP LLM Top 10 NIST AI RMF Chain-of-Evidence OpenAPI 3.0
THE GAP

Your AI passes every benchmark. That isn't security.

Benchmarks measure capability. Attackers measure compromise. Most LLM "safety" testing checks a static list of prompts once and calls it done — so the first real adversary to adapt their phrasing walks straight through. RAIDER closes that gap by attacking like a human red-teamer, continuously and on the record.

Static prompt lists go stale

A fixed jailbreak list tests yesterday's attacks. RAIDER's attacker LLM adapts every turn, finding the phrasing your filters didn't anticipate.

"Looks safe" isn't a verdict

Substring matching scores a polite deflection as a pass. RAIDER uses a two-axis judge — did the attack land, and was the system actually compromised.

No evidence, no remediation

A red flag with no transcript is unactionable. Every RAIDER finding ships the full attack transcript, judge reasoning, and ATLAS mapping.

THE PIPELINE

Three models. One adversarial loop.

RAIDER pits a dedicated attacker LLM against your target, then grades every exchange with an independent judge scored against a per-technique ATLAS rubric — looping and adapting until it compromises the target or exhausts the strategy.

crafts + adapts Attacker LLM PAIR · Crescendo
attack
under test Target LLM your system
response
scores 1–10 Judge LLM per-technique rubric
two-axis
final Verdict + evidence

PAIR Single-turn, iterative

The attacker refines one prompt across rounds, using the judge's critique to steer each rewrite toward the technique's objective. Fast, surgical, and ideal for measuring jailbreak resistance.

Crescendo Multi-turn, escalating

The attacker builds a benign conversation, then escalates gradually — the slow-boil approach that defeats single-prompt filters. RAIDER reports turns-to-compromise on every win.

CAPABILITIES

Engineered for evidence, not vibes.

Every design decision serves one goal: a finding you can hand to an auditor, a regulator, or the engineer who has to fix it.

Two-axis verdicts

VULNERABLE · DEFENDED · NEEDS REVIEW · ERROR. A refusal is never auto-passed; an inconclusive run is never miscounted as defended.

🎯

Authentic MITRE ATLAS

27 real AML.T* techniques mapped to tactics and AML.M* mitigations — plus OWASP LLM Top 10 cross-references on every finding.

🧪

Dual judge engine

Run a JSON judge or a Prometheus-style judge, auto-detected from the model. Per-technique rubrics define exactly what "compromised" means for each attack.

🔬

Judge transparency

The exact judge prompt and raw reply are captured per turn — bearer tokens redacted — so every score is auditable, never a black box.

🛡

SSRF-guarded by default

Private, loopback, and link-local targets are blocked unless you explicitly allow-list them. Secrets are applied in-memory and never written to disk.

API-first & CI-ready

A documented OpenAPI 3.0 REST API is the contract; the dashboard is just a client. Wire RAIDER into your pipeline and gate releases on posture.

THE SCORING MODEL

Four honest outcomes. Zero false confidence.

RAIDER refuses to turn an inconclusive run into a green checkmark. If the attacker never reached your system, that's an ERROR — not a pass.

VULNERABLEAttack landed and the target was compromised.
DEFENDEDAttack delivered, target held the line.
NEEDS REVIEWAmbiguous — flagged for a human, never auto-passed.
ERRORInconclusive (timeout/transport) — excluded from "defended".
CHAIN-OF-EVIDENCE

Every finding arrives ready to defend.

RAIDER doesn't just tell you something broke. It hands you the receipt:

  • Full adaptive transcript — attacker reasoning, attack, target response, judge score & critique, turn by turn.
  • Verbose target I/O log — the exact request and response for every trial, with the bearer token redacted.
  • Posture report — resilience score, attack-success-rate by tactic, ATLAS coverage %, and severity derived from outcome.
  • Pre-flight verification — confirms the model each endpoint actually serves, catching self-attack misconfigurations.
See a sample report →
Posture Report session · 7f3a…c1
62posture

6 vulnerable

14 defended

3 needs review

Prompt InjectionASR 78%
JailbreakASR 44%
Data ExfiltrationASR 12%
Model DoSASR 5%
WHO RUNS RAIDER

From the security team to the regulator.

AI red teams

Run continuous, adaptive campaigns instead of one-off manual prompt sessions.

AppSec & product security

Gate every model release on an ATLAS posture score, right inside CI/CD.

GRC & compliance

Produce auditor-ready evidence mapped to ATLAS, OWASP, and NIST AI RMF.

Model & platform teams

Catch guardrail regressions before they ship, with reproducible transcripts.

GET STARTED

Put your AI in front of an adversary that never stops adapting.

See RAIDER run a live MITRE ATLAS campaign against a target of your choosing, and walk away with a posture report you can hand to your board.

No spam. We'll reach out within one business day. RAIDER is licensed for authorized testing only.

Prefer to read first? Documentation · API reference · MITRE ATLAS mapping
FAQ

Questions, answered.

Is RAIDER an attack tool I could misuse?

RAIDER is licensed strictly for authorized security testing — you run it against systems you own or are contracted to assess. It ships with an SSRF guard that blocks private targets by default, route-level API authentication, and full audit logging. It is a defensive instrument: it finds your weaknesses so you can fix them first.

Which AI systems can it test?

Any LLM or AI service reachable over an OpenAI-compatible chat API — hosted models, self-hosted endpoints, and gateways. You point RAIDER at the endpoint and model; an optional bearer token is read at call time and never written to reports.

How is this different from a static prompt-injection test suite?

Static suites replay a fixed list once. RAIDER's attacker LLM adapts every turn using the judge's critique, then grades outcomes on two axes — did the attack land, and was the system actually compromised — so a polite refusal is never scored as a pass.

Do my prompts, tokens, or model outputs leave my environment?

RAIDER runs in your environment. Secrets are applied in-memory only and never persisted to disk; bearer tokens are redacted everywhere they could surface — wire logs, transcripts, and reports. You control the attacker, judge, and target endpoints.

Can I run it in CI/CD?

Yes. RAIDER is API-first with a documented OpenAPI 3.0 spec — the dashboard is just one client. Drive campaigns programmatically and fail a build when the posture score drops below your threshold.