REACTOR is the pipeline behind every challenge on LevelUp. Nine agents — Designer, Narrative, Static Analysis, Validator, Exploit, Calibrator, Repair, Deploy, Hint — generate, validate, and calibrate Docker-sandboxed scenarios nightly. Every challenge traces back to the prompt that wrote it.
Runs on SAGE, our open-source agent memory layer.
REACTOR isn’t a black box with “AI” stamped on it. Each agent has a defined responsibility, a memory domain, and a failure mode. The core path runs Designer → Static Analysis → Validator → Calibrator → Deploy, with a Repair branch on any stage failure and a DISCARD after two unsuccessful repair attempts.
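The control flow described above (stages in order, a Repair branch on failure, DISCARD after two failed repairs) can be sketched as a small state machine. This is a minimal illustration, not REACTOR's code: the stage and repair signatures are hypothetical, and the text doesn't say whether the two-repair budget is per stage or per challenge, so this sketch assumes per challenge.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stage signature: inspect a candidate challenge, return (passed, reason).
Stage = Callable[[dict], Tuple[bool, str]]
# Hypothetical repair signature: (failed stage name, challenge, reason) -> patched challenge.
Repair = Callable[[str, dict, str], dict]

MAX_REPAIRS = 2  # from the text: DISCARD after two unsuccessful repairs (assumed per challenge)


@dataclass
class RunResult:
    status: str  # "DEPLOYED" or "DISCARD"
    repairs_used: int
    log: List[str] = field(default_factory=list)


def run_pipeline(challenge: dict, stages: List[Tuple[str, Stage]], repair: Repair) -> RunResult:
    """Run stages in order; on failure, repair and retry the failing stage."""
    repairs = 0
    log: List[str] = []
    i = 0
    while i < len(stages):
        name, stage = stages[i]
        passed, reason = stage(challenge)
        if passed:
            log.append(f"{name}: pass")
            i += 1
            continue
        log.append(f"{name}: fail ({reason})")
        if repairs == MAX_REPAIRS:
            # Two repairs already tried and the stage still fails: give up.
            log.append("DISCARD")
            return RunResult("DISCARD", repairs, log)
        repairs += 1
        challenge = repair(name, challenge, reason)  # same stage is retried next iteration
    return RunResult("DEPLOYED", repairs, log)


if __name__ == "__main__":
    # Toy run: static analysis fails until the repair step adds a flag.
    stages = [
        ("static_analysis", lambda c: ("flag" in c, "missing flag")),
        ("validator", lambda c: (True, "")),
    ]
    result = run_pipeline({"brief": "demo"}, stages,
                          lambda stage, c, why: {**c, "flag": "FLAG{demo}"})
    print(result.status, result.repairs_used)  # DEPLOYED 1
```

Retrying only the failing stage, rather than restarting from Designer, matches the text's "patches the specific stage" wording.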
Designer. Drafts the brief, the Dockerfile, and the flag placement, with a category-specific system prompt for each challenge type.
Narrative. Stamps the story (breach premise, analyst role, realistic backdrop) so training lands on plausible ground.
Static Analysis. Deterministic linter. Rejects missing flags, broken Dockerfiles, and generator slop before anything builds.
Validator. Builds the image, brings services up, and watches health checks. Kills anything that won’t come up clean.
Exploit. Type-aware solver: HTTP for web, a solver script for crypto, a binary exploit for pwn, JS for smart contracts. Proves reachability.
Calibrator. Rule-based + LLM hybrid. Scores difficulty against the target band with confidence intervals.
Repair. Pulls institutional memory on failure and patches the specific stage that failed, with up to two attempts before DISCARD.
Deploy. Ships the hardened container to the library with stream labels, skill-vector tags, and par time baked in.
Hint. Skill-vector-aware. Three progressive tiers, confidence-scored and rated by players, so the hint policy itself can evolve.
Evolution worker. Not an agent in the pipeline itself, but a nightly cron that closes the loop: it recalibrates from telemetry, mutates stale challenges, evolves prompts, fills coverage gaps, and retires drift. See the five loops below.
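The deterministic linter described above (rejecting missing flags, broken Dockerfiles, and generator slop before anything builds) might look something like the following. This is a hedged sketch assuming a challenge bundle is a mapping of file paths to contents; the flag format, the slop patterns, and the `lint` function are hypothetical, not REACTOR's actual rules.

```python
import re
from typing import Dict, List

# Hypothetical flag format and slop patterns; REACTOR's real rules are not given in the text.
FLAG_RE = re.compile(r"FLAG\{[A-Za-z0-9_]+\}")
SLOP_RE = re.compile(r"\bTODO\b|\blorem ipsum\b|\[INSERT[^\]]*\]", re.IGNORECASE)


def lint(bundle: Dict[str, str]) -> List[str]:
    """Return rejection reasons for a {path: contents} bundle; an empty list means pass."""
    errors: List[str] = []

    # Dockerfile: the first real instruction must be FROM (only ARG may precede it).
    lines = [ln.strip() for ln in bundle.get("Dockerfile", "").splitlines()]
    instructions = [ln for ln in lines if ln and not ln.startswith("#")]
    first = next((ln for ln in instructions if not ln.upper().startswith("ARG ")), "")
    if not first.upper().startswith("FROM "):
        errors.append("Dockerfile: first instruction must be FROM")

    # Flag: at least one file must contain a well-formed flag.
    if not any(FLAG_RE.search(body) for body in bundle.values()):
        errors.append("no flag found in any file")

    # Generator slop: placeholder text left behind by the generator.
    for path, body in bundle.items():
        match = SLOP_RE.search(body)
        if match:
            errors.append(f"{path}: generator slop {match.group(0)!r}")
    return errors


if __name__ == "__main__":
    print(lint({"Dockerfile": "FROM python:3.12-slim\nCOPY app.py /app/\n",
                "app.py": "SECRET = 'FLAG{demo_1}'\n"}))  # []
```

Because every check is a pure function of the bundle, the same input always yields the same verdict, which is what lets this gate run cheaply before any image is built.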
Every night, the evolution worker runs five loops against the previous day’s telemetry. Challenges drift, prompts age, coverage gaps open, and calibrations fall out of their band; these loops close them automatically. They run in the order B → A → C → D → E, so archival in B opens the gaps D fills and E evaluates only work that has had time to settle.
LOOP B. Re-score par from the last week of solve data. Promote and demote difficulty bands against real telemetry. Runs first so archival opens the gaps that LOOP D fills in the same run.
LOOP A. Rewrite challenges that everyone solves too fast. Keep the narrative, change the primitives.
LOOP C. Retire archetypes that produce boring briefs. Breed better generators against a held-out eval set.
LOOP D. Generate new challenges for under-covered category × Elo cells in the skill-vector grid.
LOOP E. Demote or archive challenges whose calibrated difficulty has drifted far below the band they were generated for. Runs after D so newly generated work isn’t immediately evaluated.
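Why the B → A → C → D → E order matters can be shown with a toy nightly run over shared state. This is a sketch under assumed data structures: the `Challenge` fields, the drift threshold, and the per-cell coverage target are hypothetical, and loops A and C are placeholders because their internals aren't specified.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

DRIFT_LIMIT = 1.0    # hypothetical: max tolerated drift below the generated band
TARGET_PER_CELL = 1  # hypothetical coverage target per (category, band) cell


@dataclass
class Challenge:
    cid: str
    cell: Tuple[str, str]     # (category, difficulty band)
    out_of_band: bool = False  # hypothetical recalibration verdict
    drift: float = 0.0
    age_days: int = 1          # 0 means generated in the current run


Library = Dict[str, Challenge]


def loop_b(lib: Library, log: List[str]) -> None:
    """Recalibrate: archive challenges telemetry puts out of band, freeing their cells."""
    for cid in [c.cid for c in lib.values() if c.out_of_band]:
        del lib[cid]
        log.append(f"B:archive:{cid}")


def loop_a(lib: Library, log: List[str]) -> None:
    log.append("A:mutate")   # placeholder: rewrite too-easy challenges


def loop_c(lib: Library, log: List[str]) -> None:
    log.append("C:prompts")  # placeholder: evolve generator prompts


def loop_d(lib: Library, log: List[str], cells: List[Tuple[str, str]]) -> None:
    """Fill coverage gaps, including any that B opened earlier in this run."""
    counts = Counter(c.cell for c in lib.values())
    for cell in cells:
        for i in range(TARGET_PER_CELL - counts[cell]):
            cid = f"new:{cell[0]}:{cell[1]}:{i}"
            lib[cid] = Challenge(cid, cell, age_days=0)
            log.append(f"D:generate:{cid}")


def loop_e(lib: Library, log: List[str]) -> None:
    """Flag drifted challenges for demotion, skipping work generated in this run."""
    for c in lib.values():
        if c.age_days > 0 and c.drift > DRIFT_LIMIT:
            log.append(f"E:demote:{c.cid}")


def nightly(lib: Library, cells: List[Tuple[str, str]]) -> List[str]:
    log: List[str] = []
    loop_b(lib, log)         # first, so its archival opens gaps...
    loop_a(lib, log)
    loop_c(lib, log)
    loop_d(lib, log, cells)  # ...that D fills in the same run
    loop_e(lib, log)         # last, so fresh work isn't judged yet
    return log


if __name__ == "__main__":
    lib = {
        "w1": Challenge("w1", ("web", "easy"), out_of_band=True),
        "c1": Challenge("c1", ("crypto", "hard"), drift=1.5),
    }
    print(nightly(lib, [("web", "easy"), ("crypto", "hard")]))
    # ['B:archive:w1', 'A:mutate', 'C:prompts', 'D:generate:new:web:easy:0', 'E:demote:c1']
```

If D ran before B, the web/easy gap would stay open until the following night; if E ran before D it would be harmless here, but running it last guarantees new items are never evaluated in the run that created them.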
Generator prompt → critic rounds → builder image → validator stages → solver path → deploy hash. Open any challenge in the library and you can walk the chain back to the prompt that wrote it. If a flag ever leaks, every descendant of that prompt is revocable in one command.
That traceability is why REACTOR exists as a pipeline rather than as a single monolithic LLM call. Memory governance is the difference between “the model produced a challenge” and “we can explain exactly why.”
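Walking a challenge back to its prompt and revoking everything a prompt produced are both graph traversals. A minimal sketch, assuming each artifact (prompt, critic round, image, deploy hash) records exactly one parent; the `Lineage` class and the `kind:id` node names are hypothetical, and a real store with multi-parent artifacts would need a DAG rather than this tree.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set


class Lineage:
    """Hypothetical lineage store: every artifact records the single artifact it came from."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.children: Dict[str, List[str]] = defaultdict(list)
        self.revoked: Set[str] = set()

    def record(self, node: str, parent: Optional[str] = None) -> None:
        self.parent[node] = parent
        if parent is not None:
            self.children[parent].append(node)

    def chain(self, node: str) -> List[str]:
        """Walk from any artifact back to the prompt that started it."""
        path: List[str] = []
        current: Optional[str] = node
        while current is not None:
            path.append(current)
            current = self.parent[current]
        return path

    def revoke_descendants(self, root: str) -> List[str]:
        """Breadth-first revocation of root and everything derived from it."""
        newly: List[str] = []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            if node in self.revoked:
                continue
            self.revoked.add(node)
            newly.append(node)
            queue.extend(self.children[node])
        return sorted(newly)


if __name__ == "__main__":
    g = Lineage()
    g.record("prompt:p1")
    g.record("critic:r1", "prompt:p1")
    g.record("image:i1", "critic:r1")
    g.record("deploy:h1", "image:i1")
    print(g.chain("deploy:h1"))  # ['deploy:h1', 'image:i1', 'critic:r1', 'prompt:p1']
```

Revocation cost is proportional to the size of the leaked prompt's subtree, so a single call covers every descendant regardless of how many deploys it spawned.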
REACTOR’s most useful downstream job: ingest a published breach report and reconstruct it as a multi-stage scenario your team works in a live Docker sandbox. Days, not weeks.
The current REACTOR pipeline generates challenges from category and skill-vector targets. The next ingestion module extends the Designer agent with a breach-report reader: given the URL of a public breach report (Bybit, Ronin, Wormhole, MOVEit, SolarWinds), it extracts the attack chain and hands REACTOR a structured brief. Designer drafts the sandbox, Validator exploits it end-to-end, and Calibrator sets par time.
The result isn’t a slideshow. It’s a Docker image with the real primitives — a vulnerable contract variant, a staged log bundle, a compromised admin panel — deterministically varied per player so write-ups can’t be Googled halfway through the exercise.
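One common way to get "deterministically varied per player" is a keyed hash: derive every variable parameter from HMAC(secret, challenge id + player id). The sketch below assumes that scheme; the text doesn't say how REACTOR does it, and the parameter names (`flag`, `port`, `admin_user`) and the name pool are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Union

ADMIN_NAMES = ["ops", "svc", "backup", "deploy"]  # hypothetical variation pool


def player_variant(secret: bytes, challenge_id: str, player_id: str) -> Dict[str, Union[str, int]]:
    """Derive stable per-player parameters from HMAC-SHA256(secret, "challenge:player")."""
    digest = hmac.new(secret, f"{challenge_id}:{player_id}".encode(), hashlib.sha256).digest()
    return {
        "flag": "FLAG{" + digest[:8].hex() + "}",                     # 64 bits of per-player flag
        "port": 20000 + int.from_bytes(digest[8:10], "big") % 10000,  # service port in 20000-29999
        "admin_user": ADMIN_NAMES[digest[10] % len(ADMIN_NAMES)],     # renamed admin account
    }
```

Because the key stays server-side, the same player always gets the same instance (so a restarted sandbox is reproducible), while one player's write-up reveals nothing about another player's flag.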
REACTOR runs on SAGE, the open-source memory and consensus layer we released to the community. SAGE is infrastructure. REACTOR is what we built on it.
SAGE is the orchestration substrate underneath REACTOR. Byzantine fault-tolerant (BFT) consensus commits each agent’s lessons so later stages can recall them: Repair reads what Validator learned, Calibrator reads what Exploit measured, and nothing gets lost between stages. Every commit is Ed25519-signed for lineage and routed by domain, so one agent’s topic cannot pollute another’s context. We wrote it, we open-sourced it, and anyone can ship on it.
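The two ideas in that paragraph, quorum-gated commits and domain-scoped recall, can be illustrated with generic code. This is not SAGE's API: `MemoryStore`, `commit`, and `recall` are hypothetical, the quorum function is textbook BFT arithmetic, and HMAC-SHA256 is swapped in for Ed25519 only to keep the sketch standard-library-only (unlike Ed25519, verifying an HMAC tag requires the signer's secret).

```python
import hashlib
import hmac
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


def quorum(n: int) -> int:
    """Smallest vote count so any two quorums share an honest replica,
    tolerating f = (n - 1) // 3 Byzantine replicas (equals 2f + 1 when n = 3f + 1)."""
    f = (n - 1) // 3
    return (n + f) // 2 + 1


@dataclass(frozen=True)
class Commit:
    domain: str
    author: str
    lesson: str
    tag: str  # HMAC-SHA256 stand-in for an Ed25519 signature


class MemoryStore:
    """Hypothetical domain-routed lesson store; not SAGE's actual interface."""

    def __init__(self, n_replicas: int, keys: Dict[str, bytes]) -> None:
        self.n = n_replicas
        self.keys = keys  # per-agent secrets (stand-in for Ed25519 key pairs)
        self.by_domain: Dict[str, List[Commit]] = defaultdict(list)

    def _tag(self, author: str, domain: str, lesson: str) -> str:
        msg = f"{author}|{domain}|{lesson}".encode()
        return hmac.new(self.keys[author], msg, hashlib.sha256).hexdigest()

    def commit(self, author: str, domain: str, lesson: str, votes: int) -> bool:
        """Append a tagged lesson to its domain only if enough replicas voted for it."""
        if votes < quorum(self.n):
            return False
        self.by_domain[domain].append(Commit(domain, author, lesson, self._tag(author, domain, lesson)))
        return True

    def recall(self, domain: str) -> List[str]:
        """Return verified lessons from exactly one domain; other domains never leak in."""
        return [c.lesson for c in self.by_domain[domain]
                if hmac.compare_digest(c.tag, self._tag(c.author, c.domain, c.lesson))]


if __name__ == "__main__":
    store = MemoryStore(4, {"validator": b"k-val", "exploit": b"k-exp"})
    store.commit("validator", "validator", "healthcheck needs 30s warm-up", votes=3)
    store.commit("exploit", "exploit", "solve took 12 min", votes=2)  # below quorum(4) == 3
    print(store.recall("validator"))  # ['healthcheck needs 30s warm-up']
    print(store.recall("exploit"))    # []
```

In this sketch Repair would call `recall("validator")` to read what Validator learned; routing is explicit, so a reader only ever sees the domain it asked for.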
Free for practitioners. Pick a stream, pick a category, solve a box that didn’t exist yesterday. Every solve updates your skill vector; REACTOR queues the next one in your growth zone.
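The page doesn't specify how a solve updates the skill vector or how the growth zone is chosen. Since the library is organised into Elo cells, here is a minimal sketch assuming one standard Elo rating per category and a growth zone defined as a band of expected solve probability; the step size, starting rating, and band bounds are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

K = 32.0                  # hypothetical Elo step size
DEFAULT_RATING = 1200.0   # hypothetical starting rating per category
GROWTH_ZONE = (0.4, 0.7)  # hypothetical band of expected solve probability


def expected_solve(player: float, challenge: float) -> float:
    """Elo expected score: chance a player at `player` solves a challenge rated `challenge`."""
    return 1.0 / (1.0 + 10 ** ((challenge - player) / 400.0))


def update_skill(skill: Dict[str, float], category: str, challenge_rating: float, solved: bool) -> float:
    """Nudge one component of the skill vector toward the observed outcome."""
    rating = skill.get(category, DEFAULT_RATING)
    outcome = 1.0 if solved else 0.0
    rating += K * (outcome - expected_solve(rating, challenge_rating))
    skill[category] = rating
    return rating


def next_challenge(skill: Dict[str, float], pool: List[Tuple[str, str, float]]) -> Optional[str]:
    """Pick the (id, category, rating) candidate closest to the middle of the growth zone."""
    lo, hi = GROWTH_ZONE
    middle = (lo + hi) / 2
    best: Optional[Tuple[float, str]] = None
    for cid, category, rating in pool:
        p = expected_solve(skill.get(category, DEFAULT_RATING), rating)
        if lo <= p <= hi and (best is None or abs(p - middle) < best[0]):
            best = (abs(p - middle), cid)
    return best[1] if best else None
```

With equal ratings the expected score is 0.5, so a solve moves the player up by K/2 = 16 points; challenges far too easy or far too hard fall outside the band and are never queued.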
Developer tier is free — 1K API calls a month against REACTOR-generated challenges. Measure your autonomous security agent on fresh, non-leakable content instead of memorised public CTFs.