Core tech

REACTOR.
Nine agents. One pipeline. Fresh every night.

REACTOR is the pipeline behind every challenge on LevelUp. Nine agents — Designer, Narrative, Static Analysis, Validator, Exploit, Calibrator, Repair, Deploy, Hint — generate, validate, and calibrate Docker-sandboxed scenarios nightly. Every challenge traces back to the prompt that wrote it.

Runs on SAGE, our open-source agent memory layer.

9 REACTOR AGENTS · 5 NIGHTLY LOOPS · 00:01 UTC EVOLUTION RUN · DETERMINISTIC VALIDATION
The nine

Name the agents.

REACTOR isn’t a black box with “AI” stamped on it. Each agent has a defined responsibility, a memory domain, and a failure mode. The main path runs Designer → Static Analysis → Validator → Calibrator → Deploy; a stage failure branches to Repair, and a draft is marked DISCARD after two unsuccessful repairs.
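The control flow described above can be sketched roughly as follows. This is a minimal illustration, not REACTOR's actual code: the `Draft` class, the `run_pipeline` function, the stage callables, and the choice to count the two-repair budget per draft (rather than per stage) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage order as listed in the text; the identifiers are illustrative labels.
STAGES = ["designer", "static_analysis", "validator", "calibrator", "deploy"]
MAX_REPAIRS = 2  # the text says DISCARD follows two unsuccessful repairs


@dataclass
class Draft:
    name: str
    repairs_used: int = 0  # assumption: the budget is shared across all stages
    history: List[str] = field(default_factory=list)


def run_pipeline(
    draft: Draft,
    stages: Dict[str, Callable[[Draft], bool]],
    repair: Callable[[Draft, str], None],
) -> str:
    """Run each stage in order; on failure, repair and retry that stage."""
    for stage in STAGES:
        while not stages[stage](draft):
            if draft.repairs_used >= MAX_REPAIRS:
                draft.history.append(f"{stage}:discard")
                return "DISCARD"
            draft.repairs_used += 1
            draft.history.append(f"{stage}:repair")
            repair(draft, stage)  # patch only the stage that failed
        draft.history.append(f"{stage}:ok")
    return "DEPLOYED"
```

Retrying the failed stage, rather than restarting from Designer, mirrors the statement that Repair "patches the specific stage".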

AGENT 01
Designer

Drafts the brief, the Dockerfile, the flag placement. Category-specific system prompt per challenge type.

AGENT 02
Narrative

Stamps the story — breach premise, analyst role, realistic backdrop — so training lands on plausible ground.

AGENT 03
Static Analysis

Deterministic linter. Rejects missing flags, broken Dockerfiles, generator slop — before anything builds.
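As a rough idea of what a deterministic pre-build check can look like, here is a small sketch. The `FLAG{...}` format, the `lint_draft` name, and the exact rules are hypothetical; the real linter's rule set is not described on this page.

```python
import re
from typing import Dict, List

# Hypothetical flag format, used only for this illustration.
FLAG_PATTERN = re.compile(r"FLAG\{[A-Za-z0-9_]+\}")


def lint_draft(files: Dict[str, str]) -> List[str]:
    """Return a sorted list of problems; an empty list means the draft passes."""
    errors = []
    dockerfile = files.get("Dockerfile")
    if dockerfile is None:
        errors.append("missing Dockerfile")
    else:
        # Ignore blank lines and comments, then look at the first instruction.
        lines = [ln.strip() for ln in dockerfile.splitlines()]
        instructions = [ln for ln in lines if ln and not ln.startswith("#")]
        # Only ARG may come before FROM in a valid Dockerfile.
        first = next((ln for ln in instructions
                      if not ln.upper().startswith("ARG ")), "")
        if not first.upper().startswith("FROM "):
            errors.append("Dockerfile does not start with FROM")
    if not any(FLAG_PATTERN.search(body) for body in files.values()):
        errors.append("no flag found in any file")
    return sorted(errors)
```

Because the checks are plain string rules with no model call, the same draft always produces the same verdict.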

AGENT 04
Validator

Builds the image, brings services up, watches health checks. Kills anything that won’t come up clean.

AGENT 05
Exploit

Type-aware solver: HTTP for web, solver script for crypto, binary exploit for pwn, JS for smart contracts. Proves reachability.

AGENT 06
Calibrator

Rule-based + LLM hybrid. Scores difficulty against the target band with confidence intervals.
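One way to read "scores difficulty against the target band with confidence intervals" is sketched below. It assumes the Calibrator has several numeric difficulty estimates (for example rule-based and model-based scores on an ELO-like scale) and uses a simple normal approximation; the function names and the acceptance rule are illustrative, not the product's method.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def difficulty_interval(estimates: List[float],
                        confidence: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the mean estimate."""
    m = mean(estimates)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(estimates) / len(estimates) ** 0.5
    return (m - half_width, m + half_width)


def fits_band(estimates: List[float], band: Tuple[float, float]) -> bool:
    """Accept only if the whole interval sits inside the target band."""
    low, high = difficulty_interval(estimates)
    return band[0] <= low and high <= band[1]
```

With only a handful of estimates the normal approximation is crude; it is used here to keep the sketch short.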

AGENT 07
Repair

Pulls institutional memory on failure, patches the specific stage — up to two attempts before DISCARD.

AGENT 08
Deploy

Hardened container ships to the library with stream labels, skill-vector tags, par time baked in.

AGENT 09
Hint

Skill-vector-aware. Three progressive tiers, confidence-scored, rated by players so we evolve the hint policy itself.

WORKER
Evolution Worker

Not an agent in the pipeline itself. A nightly cron that closes the loop — recalibrates from telemetry, mutates stale challenges, evolves prompts, fills coverage gaps, retires drift. See the five loops below.

Five loops at 00:01 UTC

How REACTOR keeps the catalogue from going stale.

Every night, the evolution worker runs five loops against the previous day’s telemetry. Challenges drift, prompts age, coverage gaps open, calibrations fall out of their band — these loops close them automatically. They run in order B → A → C → D → E so archival in B opens the gaps D fills, and E evaluates only work that has had time to settle.

LOOP B
recalibrate

Re-score par from the last week of solve data. Promote and demote difficulty bands against real telemetry. Runs first so archival opens the gaps that LOOP D fills in the same run.

trigger: solve data drift
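A minimal sketch of re-scoring par from the last week of solve data follows. Using the median solve time, the seven-day window as a parameter, and a minimum sample size are assumptions made for illustration; the page does not say which statistic the recalibrate loop uses.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional, Tuple


def recalibrate_par(
    solves: List[Tuple[datetime, float]],  # (solved_at, seconds_to_solve)
    now: datetime,
    window_days: int = 7,
    min_solves: int = 5,  # hypothetical threshold to avoid noisy updates
) -> Optional[float]:
    """Return a new par time in seconds, or None if there is too little data."""
    cutoff = now - timedelta(days=window_days)
    recent = [seconds for solved_at, seconds in solves if solved_at >= cutoff]
    if len(recent) < min_solves:
        return None
    return median(recent)
```

Returning `None` rather than a guess lets the caller keep the existing par when telemetry is thin.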
LOOP A
evo.mutate

Rewrite challenges that everyone solves too fast. Keep the narrative, change the primitives.

trigger: extreme solve rate
LOOP C
prompt.evolve

Retire archetypes producing boring briefs. Breed better generators against a held-out eval set.

trigger: template underperforms
LOOP D
gap.fill

Generate new challenges for under-covered category and ELO cells in the skill-vector grid.

trigger: coverage gap
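Finding under-covered cells in a category by ELO grid could look like the sketch below. The dictionary keys, the band labels, and the `min_per_cell` target are hypothetical; the page only states that gap.fill targets under-covered cells.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple


def find_gaps(
    catalogue: List[Dict[str, str]],
    categories: Sequence[str],
    elo_bands: Sequence[str],
    min_per_cell: int = 2,  # hypothetical coverage target per grid cell
) -> List[Tuple[str, str, int]]:
    """Return (category, elo_band, missing_count) for each under-covered cell."""
    counts = Counter((c["category"], c["elo_band"]) for c in catalogue)
    gaps = []
    for category, band in product(categories, elo_bands):
        missing = min_per_cell - counts[(category, band)]
        if missing > 0:
            gaps.append((category, band, missing))
    return gaps
```

Each returned tuple is the kind of target that could be handed to Designer as a generation request.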
LOOP E
retire

Demote or archive challenges whose calibrated difficulty has drifted far below the band they were generated for. Runs after D so newly generated work isn’t immediately evaluated.

trigger: calibration drift
reactor.evolution · 2026-04-20 00:01 UTC
00:01:00 ▶ evolution.start · seed 8c2f…
00:02:17 ✓ recalibrate · ELO drift -0.4 across the catalogue
00:04:31 ▶ evo.mutate · scanning extreme solve rates
00:06:48 → 42 candidates · 28 retained
00:11:08 ✓ evo.mutate · 12 variants queued for REACTOR
00:15:02 ▶ prompt.evolve · 7 archetypes on the eval bench
00:19:40 ! archetype web/ssti.v3 deprecated · solve variance too low
00:28:09 ✓ gap.fill · 3 drafts ready for Designer
00:28:10 ▶ validator.run · 12 drafts
00:31:18 ✓ validator · 11/12 passed Designer → Deploy
00:31:19 ✓ publish · 11 new images to registry
00:34:11 ✓ retire · 2 demoted · 2 archived
00:34:12 ✓ evolution.complete · next run in 23h 26m

Every challenge carries its lineage.

Generator prompt → critic rounds → builder image → validator stages → solver path → deploy hash. Open any challenge in the library and you can walk the chain back to the prompt that wrote it. If a flag ever leaks, every descendant of that prompt is revocable in one command.

That traceability is why REACTOR is built as a pipeline rather than as a single monolithic LLM call. Memory governance is the difference between “the model produced a challenge” and “we can explain exactly why.”
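Revoking "every descendant of that prompt" amounts to a graph walk over lineage records. The sketch below assumes lineage is stored as (parent, child) pairs; the identifiers and the `descendants` function are illustrative and do not reflect the actual storage format or revocation command.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def descendants(edges: Iterable[Tuple[str, str]], root: str) -> Set[str]:
    """Collect every artifact reachable from root via parent -> child edges."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    seen: Set[str] = set()
    queue = deque([root])
    while queue:  # breadth-first walk down the lineage chain
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

For example, if prompt `p1` produced challenge `a`, and `a` was later mutated into `a1`, then `descendants(edges, "p1")` returns both `a` and `a1`, which is the set a revocation would need to cover.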

Killer application

Paste a breach report. Get a training scenario.

REACTOR’s most useful downstream job: ingest a published breach report and reconstruct it as a multi-stage scenario your team works in a live Docker sandbox. Days, not weeks.

On the roadmap · ingestion pipeline in development

The current REACTOR pipeline generates challenges from category and skill-vector targets. The planned ingestion module extends the Designer agent with a breach-report reader: given a public URL for an incident such as Bybit, Ronin, Wormhole, MOVEit, or SolarWinds, it extracts the attack chain and hands REACTOR a structured brief. Designer drafts the sandbox, Validator builds it, Exploit solves it end-to-end, and Calibrator sets par time.

The result isn’t a slideshow. It’s a Docker image with the real primitives — a vulnerable contract variant, a staged log bundle, a compromised admin panel — deterministically varied per player so write-ups can’t be Googled halfway through the exercise.
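"Deterministically varied per player" can be achieved by deriving each player's parameters from a keyed hash, so the same player always gets the same variant while different players get different ones. The sketch below uses Python's standard `hmac` module; the `FLAG{...}` format, the port range, and the `player_variant` name are hypothetical choices for illustration.

```python
import hashlib
import hmac
from typing import Dict, Union


def player_variant(secret: bytes, challenge_id: str,
                   player_id: str) -> Dict[str, Union[str, int]]:
    """Derive stable per-player parameters from a server-side secret."""
    message = f"{challenge_id}:{player_id}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return {
        # Hypothetical flag format: 8 bytes of the digest as hex.
        "flag": "FLAG{" + digest[:8].hex() + "}",
        # Hypothetical service port in the range 20000-29999.
        "port": 20000 + int.from_bytes(digest[8:10], "big") % 10000,
    }
```

Because the output depends on the secret, a player cannot compute another player's flag from their own, and nothing has to be stored per player to reproduce a variant.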

  • Public URL in. Multi-stage scenario out.
  • Days to reconstruct. Not weeks of manual challenge design.
  • Deterministic per-player variants. No shared answers.
  • Blue and Red stages for the same incident — Purple by default.
Preview · incident.ingest (roadmap)
rekt.news · CISA advisory · vendor PIR · DFIR report
STAGE 01
Initial Access
STAGE 02
Lateral Movement
STAGE 03
Exfiltration
STAGE 04
Impact
The layer underneath

Built on SAGE.

REACTOR runs on SAGE, the open-source memory and consensus layer we released to the community. SAGE is infrastructure. REACTOR is what we built on it.

OPEN-SOURCE MEMORY LAYER

SAGE

SAGE is the orchestration substrate underneath REACTOR. BFT consensus commits each agent’s lessons so the next stage can recall them — Repair reads what Validator learned, Calibrator reads what Exploit measured, nothing gets lost between stages. Every commit is Ed25519-signed for lineage and routed by domain so one agent’s topic cannot pollute another’s context. We wrote it, we open-sourced it, anyone can ship on it.

  • BFT consensus gates every committed memory
  • Ed25519-signed commits for full lineage
  • Domain-scoped routing — no cross-topic pollution
  • Permissive open-source licence

Two ways to use REACTOR today.

Work a fresh challenge REACTOR generated overnight.

Free for practitioners. Pick a stream, pick a category, solve a box that didn’t exist yesterday. Every solve updates your skill vector; REACTOR queues the next one in your growth zone.

Benchmark your AI agent against REACTOR.

Developer tier is free — 1K API calls a month against REACTOR-generated challenges. Measure your autonomous security agent on fresh, non-leakable content instead of memorised public CTFs.
