Core tech

REACTOR.
Nine agents. One pipeline. Fresh every night.

REACTOR is the pipeline behind every challenge on LevelUp. Nine agents — Designer, Narrative, Static Analysis, Validator, Exploit, Calibrator, Repair, Deploy, Hint — generate, validate, and calibrate Docker-sandboxed scenarios nightly. Every challenge traces back to the prompt that wrote it.

Runs on SAGE, our open-source agent memory layer.

9 REACTOR AGENTS · 5 NIGHTLY LOOPS · 00:01 UTC EVOLUTION RUN · DETERMINISTIC VALIDATION

See a live challenge →

DFIR use-case deep-dive

The nine

Name the agents.

REACTOR isn’t a black box with “AI” stamped on it. Each agent has a defined responsibility, a memory domain, and a failure mode. The main path runs Designer → Static Analysis → Validator → Calibrator → Deploy; a stage failure branches to Repair, and a challenge is marked DISCARD after two unsuccessful repairs.
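The control flow described above can be sketched as a small loop. This is an illustrative sketch, not REACTOR's actual code: the stage names come from the text, while `run_pipeline`, `Result`, the two callbacks, and the choice to count the repair budget per challenge (rather than per stage) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Main path as listed in the text; everything else here is hypothetical.
STAGES = ["Designer", "Static Analysis", "Validator", "Calibrator", "Deploy"]
MAX_REPAIRS = 2  # "DISCARD after two unsuccessful repairs"


@dataclass
class Result:
    status: str  # "DEPLOYED" or "DISCARD"
    trace: list[str] = field(default_factory=list)


def run_pipeline(run_stage: Callable[[str, dict], bool],
                 repair: Callable[[str, dict], None],
                 draft: dict) -> Result:
    """Run each stage in order; on failure, repair and retry the same stage."""
    result = Result(status="DEPLOYED")
    repairs_used = 0  # assumption: one shared budget per challenge
    for stage in STAGES:
        while not run_stage(stage, draft):
            if repairs_used == MAX_REPAIRS:
                # Budget exhausted and the stage still fails: give up.
                result.status = "DISCARD"
                result.trace.append(f"{stage}:discard")
                return result
            repairs_used += 1
            repair(stage, draft)
            result.trace.append(f"{stage}:repair#{repairs_used}")
        result.trace.append(f"{stage}:ok")
    return result
```

A stage that keeps failing is retried after each repair, so a permanently broken Validator stage yields two repair entries followed by a discard entry in the trace.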

AGENT 01

Designer

Drafts the brief, the Dockerfile, the flag placement. Category-specific system prompt per challenge type.

AGENT 02

Narrative

Stamps the story — breach premise, analyst role, realistic backdrop — so training lands on plausible ground.

AGENT 03

Static Analysis

Deterministic linter. Rejects missing flags, broken Dockerfiles, generator slop — before anything builds.

AGENT 04

Validator

Builds the image, brings services up, watches health checks. Kills anything that won’t come up clean.

AGENT 05

Exploit

Type-aware solver: HTTP for web, solver script for crypto, binary exploit for pwn, JS for smart contracts. Proves reachability.

AGENT 06

Calibrator

Rule-based + LLM hybrid. Scores difficulty against the target band with confidence intervals.

AGENT 07

Repair

Pulls institutional memory on failure, patches the specific stage — up to two attempts before DISCARD.

AGENT 08

Deploy

Hardened container ships to the library with stream labels, skill-vector tags, par time baked in.

AGENT 09

Hint

Skill-vector-aware. Three progressive tiers, confidence-scored, rated by players so we evolve the hint policy itself.

WORKER

Evolution Worker

Not an agent in the pipeline itself. A nightly cron that closes the loop — recalibrates from telemetry, mutates stale challenges, evolves prompts, fills coverage gaps, retires drift. See the five loops below.

Five loops at 00:01 UTC

How REACTOR keeps the catalogue from going stale.

Every night, the evolution worker runs five loops against the previous day’s telemetry. Challenges drift, prompts age, coverage gaps open, calibrations fall out of their band — these loops close them automatically. They run in order B → A → C → D → E so archival in B opens the gaps D fills, and E evaluates only work that has had time to settle.
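A minimal sketch of why the ordering matters, assuming each loop is a function over one shared state object. The loop names and the B → A → C → D → E order are from the text; `run_evolution`, the handler signature, and the state dictionary are hypothetical.

```python
from typing import Callable

# Order stated in the text: B -> A -> C -> D -> E.
LOOP_ORDER = [
    ("B", "recalibrate"),
    ("A", "evo.mutate"),
    ("C", "prompt.evolve"),
    ("D", "gap.fill"),
    ("E", "retire"),
]


def run_evolution(handlers: dict[str, Callable[[dict], None]],
                  state: dict) -> list[str]:
    """Run every loop in the documented order against one mutable state."""
    ran = []
    for letter, name in LOOP_ORDER:
        handlers[name](state)  # later loops see what earlier loops changed
        ran.append(f"LOOP {letter}:{name}")
    return ran
```

Because the state is shared, any gaps a handler for `recalibrate` records are already visible when the `gap.fill` handler runs later in the same pass.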

LOOP B

recalibrate

Re-score par from the last week of solve data. Promote and demote difficulty bands against real telemetry. Runs first so archival opens the gaps that LOOP D fills in the same run.

trigger: solve data drift
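One way to picture re-scoring par from a week of solve data is shown below. The text does not say which statistic REACTOR uses, so the median, the `min_samples` guard, and the function name `recalibrate_par` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


def recalibrate_par(solves: list[tuple[datetime, float]],
                    now: datetime,
                    window_days: int = 7,
                    min_samples: int = 5) -> Optional[float]:
    """Return a new par time in seconds, or None when data is too thin.

    solves holds (finished_at, solve_seconds) pairs. Only solves inside
    the trailing window count; the median is an assumed choice of statistic.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [secs for finished_at, secs in solves if finished_at >= cutoff]
    if len(recent) < min_samples:
        return None  # keep the existing par rather than guess
    return median(recent)
```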

LOOP A

evo.mutate

Rewrite challenges that everyone solves too fast. Keep the narrative, change the primitives.

trigger: extreme solve rate

LOOP C

prompt.evolve

Retire archetypes producing boring briefs. Breed better generators against a held-out eval set.

trigger: template underperforms

LOOP D

gap.fill

Generate new challenges for under-covered category and ELO cells in the skill-vector grid.

trigger: coverage gap
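Finding under-covered cells can be sketched as a count over a category by ELO-band grid. The field names `category` and `elo_band`, the `min_per_cell` threshold, and the function name are assumptions; the text only says that gaps are located in the skill-vector grid.

```python
from collections import Counter
from itertools import product


def coverage_gaps(challenges: list[dict],
                  categories: list[str],
                  elo_bands: list[str],
                  min_per_cell: int = 1) -> list[tuple[str, str]]:
    """Return (category, elo_band) cells with fewer than min_per_cell challenges."""
    counts = Counter((c["category"], c["elo_band"]) for c in challenges)
    return [cell for cell in product(categories, elo_bands)
            if counts[cell] < min_per_cell]
```

Each returned cell would then become a generation target handed to Designer.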

LOOP E

retire

Demote or archive challenges whose calibrated difficulty has drifted far below the band they were generated for. Runs after D so newly generated work isn’t immediately evaluated.

trigger: calibration drift
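The demote-or-archive decision can be sketched as a comparison against the band a challenge was generated for. The two margins, the `Band` type, and the idea that archiving requires a larger shortfall than demotion are assumptions; the text does not publish REACTOR's thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    low: int   # lowest ELO the challenge was generated for
    high: int  # highest ELO the challenge was generated for


def retire_action(calibrated_elo: int, band: Band,
                  demote_margin: int = 100,
                  archive_margin: int = 300) -> str:
    """Return "archive", "demote", or "keep" based on drift below the band."""
    shortfall = band.low - calibrated_elo  # positive when below the band
    if shortfall >= archive_margin:
        return "archive"
    if shortfall >= demote_margin:
        return "demote"
    return "keep"
```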

reactor.evolution · 2026-04-20 00:01 UTC

00:01:00 ▶ evolution.start · seed 8c2f…

00:02:17 ✓ recalibrate · ELO drift -0.4 across the catalogue

00:04:31 ▶ evo.mutate · scanning extreme solve rates

00:06:48 → 42 candidates · 28 retained

00:11:08 ✓ evo.mutate · 12 variants queued for REACTOR

00:15:02 ▶ prompt.evolve · 7 archetypes on the eval bench

00:19:40 ! archetype web/ssti.v3 deprecated · solve variance too low

00:28:09 ✓ gap.fill · 3 drafts ready for Designer

00:28:10 ▶ validator.run · 12 drafts

00:31:18 ✓ validator · 11/12 passed Designer → Deploy

00:31:19 ✓ publish · 11 new images to registry

00:34:11 ✓ retire · 2 demoted · 2 archived

00:34:12 ✓ evolution.complete · next run in 23h 26m

Every challenge carries its lineage.

Generator prompt → critic rounds → builder image → validator stages → solver path → deploy hash. Open any challenge in the library and you can walk the chain back to the prompt that wrote it. If a flag ever leaks, every descendant of that prompt is revocable in one command.
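Revoking every descendant of a prompt amounts to a graph walk over the lineage records. The sketch below assumes lineage is stored as a parent-to-children mapping; the identifiers, `descendants`, and `revoke` are hypothetical and do not reflect the actual command.

```python
from collections import deque


def descendants(children: dict[str, list[str]], root: str) -> list[str]:
    """Everything derived from root, breadth-first, excluding root itself."""
    seen, queue, out = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guard against shared or cyclic links
                seen.add(child)
                out.append(child)
                queue.append(child)
    return out


def revoke(children: dict[str, list[str]], root: str,
           revoked: set[str]) -> set[str]:
    """Mark root and all of its descendants as revoked."""
    revoked.update([root, *descendants(children, root)])
    return revoked
```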

That traceability is why REACTOR exists as a pipeline rather than inside a single monolithic LLM call. Memory governance is the difference between “the model produced a challenge” and “we can explain exactly why.”

Killer application

Paste a breach report. Get a training scenario.

REACTOR’s most useful downstream job: ingest a published breach report and reconstruct it as a multi-stage scenario your team works in a live Docker sandbox. Days, not weeks.

On the roadmap · ingestion pipeline in development

The current REACTOR pipeline generates challenges from category and skill-vector targets. The planned ingestion module extends the Designer agent with a breach-report reader: given a public URL for an incident write-up (Bybit, Ronin, Wormhole, MOVEit, SolarWinds), it extracts the attack chain and hands REACTOR a structured brief. Designer drafts the sandbox, Validator builds it, Exploit solves it end-to-end, and Calibrator sets par time.

The result isn’t a slideshow. It’s a Docker image with the real primitives — a vulnerable contract variant, a staged log bundle, a compromised admin panel — deterministically varied per player so write-ups can’t be Googled halfway through the exercise.

Public URL in. Multi-stage scenario out.
Days to reconstruct. Not weeks of manual challenge design.
Deterministic per-player variants. No shared answers.
Blue and Red stages for the same incident — Purple by default.
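Deterministic per-player variation is commonly done by deriving values from a keyed hash of the challenge and player identifiers. The sketch below uses HMAC-SHA256 from the Python standard library; the parameter names, the `admin_port` knob, and the `FLAG{...}` format are assumptions, not REACTOR's scheme.

```python
import hashlib
import hmac
import random


def _digest(secret: bytes, challenge_id: str, player_id: str, label: str) -> bytes:
    # A distinct label per purpose keeps the seed and the flag independent.
    msg = f"{challenge_id}:{player_id}:{label}".encode()
    return hmac.new(secret, msg, hashlib.sha256).digest()


def build_variant(secret: bytes, challenge_id: str, player_id: str) -> dict:
    """Same inputs always give the same variant; other players get another."""
    seed = int.from_bytes(_digest(secret, challenge_id, player_id, "seed")[:8], "big")
    rng = random.Random(seed)  # drives non-secret layout choices only
    flag_hex = _digest(secret, challenge_id, player_id, "flag").hex()[:16]
    return {
        "admin_port": rng.randrange(20000, 60000),
        "flag": "FLAG{" + flag_hex + "}",
    }
```

The flag is taken from the keyed hash directly rather than from `random.Random`, which is not designed for secrets.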

Preview · incident.ingest (roadmap)

rekt.news · CISA advisory · vendor PIR · DFIR report

STAGE 01

Initial Access

STAGE 02

Lateral Movement

STAGE 03

Exfiltration

STAGE 04

Impact

The layer underneath

Built on SAGE.

REACTOR runs on SAGE, the open-source memory and consensus layer we released to the community. SAGE is infrastructure. REACTOR is what we built on it.

OPEN-SOURCE MEMORY LAYER

SAGE

SAGE is the orchestration substrate underneath REACTOR. BFT consensus commits each agent’s lessons so the next stage can recall them — Repair reads what Validator learned, Calibrator reads what Exploit measured, nothing gets lost between stages. Every commit is Ed25519-signed for lineage and routed by domain so one agent’s topic cannot pollute another’s context. We wrote it, we open-sourced it, anyone can ship on it.

BFT consensus gates every committed memory
Ed25519-signed commits for full lineage
Domain-scoped routing — no cross-topic pollution
Permissive open-source licence
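To illustrate the first and third properties together, here is a toy in-memory store that accepts a lesson only when more than two-thirds of validators approve it, and recalls lessons strictly by domain. It is not SAGE's API: the class, its methods, and the two-thirds threshold (the classic BFT bound) are assumptions, and Ed25519 signing is left out because it needs a third-party library.

```python
from collections import defaultdict


class MemoryStore:
    """Toy store: quorum-gated commits, domain-scoped recall."""

    def __init__(self, validators: int) -> None:
        self.validators = validators
        self._by_domain: dict[str, list[str]] = defaultdict(list)

    def has_quorum(self, approvals: int) -> bool:
        # Strictly more than two-thirds of validators must approve.
        return 3 * approvals > 2 * self.validators

    def commit(self, domain: str, lesson: str, approvals: int) -> bool:
        """Store the lesson under its domain only if quorum is reached."""
        if not self.has_quorum(approvals):
            return False
        self._by_domain[domain].append(lesson)
        return True

    def recall(self, domain: str) -> list[str]:
        """Return a copy of the lessons for one domain and nothing else."""
        return list(self._by_domain.get(domain, []))
```

With four validators, two approvals are rejected and three are accepted; a lesson committed under one domain never appears when another domain is recalled.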

SAGE on GitHub Pages →

Two ways to use REACTOR today.

Work a fresh challenge REACTOR generated overnight.

Free for practitioners. Pick a stream, pick a category, solve a box that didn’t exist yesterday. Every solve updates your skill vector; REACTOR queues the next one in your growth zone.

Start a free account →

Benchmark your AI agent against REACTOR.

Developer tier is free — 1K API calls a month against REACTOR-generated challenges. Measure your autonomous security agent on fresh, non-leakable content instead of memorised public CTFs.

See the API tiers →
