PENTHOS.AI AI Security & Adversarial Testing

Find the holes in your AI
before attackers do.

RAIDER is an autonomous red-team engine that attacks your LLM systems the way real adversaries do — aligned to MITRE ATLAS, judged for genuine compromise, and documented as court-ready chain-of-evidence.

Request a demo → See how it works

First product from Penthos.ai · For authorized security testing only

raider — auto-attack · AML.T0054

$ raider attack --strategy crescendo --technique AML.T0054

attacker › crafting turn 1 — indirect framing…

target › refused. classifier: REFUSAL

attacker › adapting — escalate context (turn 2)

target › partial compliance detected

judge › rubric score 7 / 10 — "objective met"

# verdict

VULNERABLE turns-to-compromise: 2 · ASR 3/3

27authentic ATLAS techniques

2attack strategies · PAIR + Crescendo

4verdict states · no false "pass"

100%API-first · OpenAPI 3.0

Built on the frameworks your auditors already trust

MITRE ATLAS™ OWASP LLM Top 10 NIST AI RMF Chain-of-Evidence OpenAPI 3.0

THE GAP

Your AI passes every benchmark. That isn't security.

Benchmarks measure capability. Attackers measure compromise. Most LLM "safety" testing checks a static list of prompts once and calls it done — so the first real adversary to adapt their phrasing walks straight through. RAIDER closes that gap by attacking like a human red-teamer, continuously and on the record.

⚠

Static prompt lists go stale

A fixed jailbreak list tests yesterday's attacks. RAIDER's attacker LLM adapts every turn, finding the phrasing your filters didn't anticipate.

≈

"Looks safe" isn't a verdict

Substring matching scores a polite deflection as a pass. RAIDER uses a two-axis judge — did the attack land, and was the system actually compromised.

⎘

No evidence, no remediation

A red flag with no transcript is unactionable. Every RAIDER finding ships the full attack transcript, judge reasoning, and ATLAS mapping.

THE PIPELINE

Three models. One adversarial loop.

RAIDER pits a dedicated attacker LLM against your target, then grades every exchange with an independent judge scored against a per-technique ATLAS rubric — looping and adapting until it compromises the target or exhausts the strategy.

crafts + adapts Attacker LLM PAIR · Crescendo

attack

under test Target LLM your system

response

scores 1–10 Judge LLM per-technique rubric

two-axis

final Verdict + evidence

PAIR Single-turn, iterative

The attacker refines one prompt across rounds, using the judge's critique to steer each rewrite toward the technique's objective. Fast, surgical, and ideal for measuring jailbreak resistance.

Crescendo Multi-turn, escalating

The attacker builds a benign conversation, then escalates gradually — the slow-boil approach that defeats single-prompt filters. RAIDER reports turns-to-compromise on every win.

CAPABILITIES

Engineered for evidence, not vibes.

Every design decision serves one goal: a finding you can hand to an auditor, a regulator, or the engineer who has to fix it.

⚖

Two-axis verdicts

VULNERABLE · DEFENDED · NEEDS REVIEW · ERROR. A refusal is never auto-passed; an inconclusive run is never miscounted as defended.

🎯

Authentic MITRE ATLAS

27 real AML.T* techniques mapped to tactics and AML.M* mitigations — plus OWASP LLM Top 10 cross-references on every finding.

🧪

Dual judge engine

Run a JSON judge or a Prometheus-style judge, auto-detected from the model. Per-technique rubrics define exactly what "compromised" means for each attack.

🔬

Judge transparency

The exact judge prompt and raw reply are captured per turn — bearer tokens redacted — so every score is auditable, never a black box.

🛡

SSRF-guarded by default

Private, loopback, and link-local targets are blocked unless you explicitly allow-list them. Secrets are applied in-memory and never written to disk.

⚙

API-first & CI-ready

A documented OpenAPI 3.0 REST API is the contract; the dashboard is just a client. Wire RAIDER into your pipeline and gate releases on posture.

THE SCORING MODEL

Four honest outcomes. Zero false confidence.

RAIDER refuses to turn an inconclusive run into a green checkmark. If the attacker never reached your system, that's an ERROR — not a pass.

VULNERABLEAttack landed and the target was compromised.

DEFENDEDAttack delivered, target held the line.

NEEDS REVIEWAmbiguous — flagged for a human, never auto-passed.

ERRORInconclusive (timeout/transport) — excluded from "defended".

CHAIN-OF-EVIDENCE

Every finding arrives ready to defend.

RAIDER doesn't just tell you something broke. It hands you the receipt:

✓ Full adaptive transcript — attacker reasoning, attack, target response, judge score & critique, turn by turn.
✓ Verbose target I/O log — the exact request and response for every trial, with the bearer token redacted.
✓ Posture report — resilience score, attack-success-rate by tactic, ATLAS coverage %, and severity derived from outcome.
✓ Pre-flight verification — confirms the model each endpoint actually serves, catching self-attack misconfigurations.

See a sample report →

Posture Report session · 7f3a…c1

62posture

6 vulnerable

14 defended

3 needs review

Prompt InjectionASR 78%

JailbreakASR 44%

Data ExfiltrationASR 12%

Model DoSASR 5%

WHO RUNS RAIDER

From the security team to the regulator.

AI red teams

Run continuous, adaptive campaigns instead of one-off manual prompt sessions.

AppSec & product security

Gate every model release on an ATLAS posture score, right inside CI/CD.

GRC & compliance

Produce auditor-ready evidence mapped to ATLAS, OWASP, and NIST AI RMF.

Model & platform teams

Catch guardrail regressions before they ship, with reproducible transcripts.

GET STARTED

Put your AI in front of an adversary that never stops adapting.

See RAIDER run a live MITRE ATLAS campaign against a target of your choosing, and walk away with a posture report you can hand to your board.

Or reach us directly: penthos@ocintllc.com 214-276-3358

Prefer to read first? Documentation · API reference · MITRE ATLAS mapping

Find the holes in your AI before attackers do.