Case Study · Behind the Scenes

How We Built an Agent-Run News Site in 24 Hours — a Full Technical Case Study

This site was built, staffed and put on an autonomous publishing schedule in the 24 hours after Claude Fable 5's release. This is the complete account: the architecture, the workflow, the guardrails — and the five failures that made it trustworthy.

RAR Editor
Published June 2026 · 12 min read

The Quick Version

  • One agent (Claude Fable 5) designed and built the platform, then became its supervisor; a cheaper open-weight model (MiniMax-M3) does the daily research and writing on a Hetzner VPS.
  • The publishing pipeline is plain plumbing: RSS sourcing, a strict editorial rulebook, a validator that blocks dead links and citations to URLs the writer was never given, git as the bridge, and auto-rollback if a deploy fails.
  • Five things broke in the first day — every fix became a permanent guardrail, which is the real argument for a supervisor agent.
  • Total running cost is in the tens of pounds per month; the founder's verdict: pair a frontier supervisor with a cheap workhorse and 'offload the grunt work'.

On 9 June 2026, Anthropic released Claude Fable 5 — the publicly available version of its most powerful model, pitched at software engineering and agentic work. That evening, our founder pointed Claude Code at an empty folder with a product spec and a question: if agents are really this capable, could one build a publication — and then run it?

Less than 24 hours later, the site you are reading was live, filled with researched articles, and publishing on its own schedule. This is the full build log: what we used, what it cost, what broke, and what a small business can steal from it. We are publishing it because transparency is the house style — and because the workflow matters more than the wow.

The architecture: a supervisor, a workhorse, and a human

The design principle came first, and it is the most reusable idea in this piece: don’t use one model for everything.

  • The Governor: Claude Fable 5, via Claude Code. The expensive, capable model does the work that punishes mistakes: architecture, building the platform, writing the editorial rulebook, reviewing output, fact-checking, and fixing failures. It built everything described below.
  • The Publisher: “Hermes”, running MiniMax-M3. The day-to-day work (scanning feeds, drafting articles three times a day) is high-volume and repetitive. It goes to MiniMax-M3, a frontier-grade open-weight model that costs a fraction of the premium tier and is billed per token, driven by an agent on our own server. The name comes from the hermes-agent framework by NousResearch, which runs on that server in Docker.
  • The human. Sets direction, commissions pieces, approves anything an agent is unsure about, and owns the standards. One person, minutes a day.

“Fable 5 is a game-changer. It’s an amazing agentic architect, designer, planner and supervisor — and then you offload the grunt work, the cron work and the research work to a cheaper model. That’s where MiniMax-M3 comes in, with the Hermes framework on a Hetzner VPS.” — Aaron Coates, founder

The two agents never talk directly. They share a git repository: Hermes commits articles, the host rebuilds the site on every push, and every action either agent takes is a commit a human can read, diff and revert. Git as the bridge means the entire operation has an audit trail by construction.

Hour by hour: what actually got built

The platform (evening, day one). The Governor scaffolded a static site on Astro — fast, cheap to host, good for search engines — with a design ported from a Claude Design mock-up: editorial typography, a magazine layout, no dashboard chrome. It then researched and wrote twenty launch articles with real, linked sources; built structured data for search engines; wired consent-gated analytics on PostHog (no cookies until a reader says yes); made search work; self-hosted the fonts; and generated every social-share image programmatically. Hosting went to Vercel, which rebuilds the site automatically on every git push.

The server (overnight). Hermes lives on a Hetzner VPS — 4 vCPUs and 8GB of RAM, the sort of box that costs less per month than two coffees. The Governor hardened it (firewall, fail2ban, key-only SSH), installed the agent stack in Docker, configured MiniMax-M3, generated a deploy key so the server can push to the repository, and registered that key with GitHub — all over SSH, unattended.

The editorial brain. Before Hermes wrote a word, the Governor wrote the rulebook it must follow: who the readers are (a team leader, a sole trader, an internal champion, a technical owner), what excites them (money saved, time saved, control of their data, a UK angle), the tone (“a sharp colleague, not a consultant”), the form (700–900 words, a Quick Version box, a concrete takeaway), and the hard rules — the most important of which is never invent a statistic, a quote or a URL.

The pipeline. Three cron slots a day — 07:30, 12:30, 18:00 UK time. Each run: pull candidate stories from eight trusted feeds; score them against the editorial criteria; skip anything already covered; draft under the rulebook; pass a validator that checks structure, taxonomy, word count and that every cited link actually resolves; fetch a licensed photo; flip to published; commit; push; then poll the live URL — and if the deployment fails, automatically revert the commit and log the reason. Caps are enforced in code: at most three autonomous pieces a day, at most one opinion piece, and a standing instruction to skip a slot rather than pad a thin story.
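For readers who want to see the shape of those checks, here is a minimal Python sketch of a pre-publish validator and the daily caps. It is an illustration under stated assumptions, not the code running on this site: the `Draft` fields, the category names in `ALLOWED_CATEGORIES` and the `link_resolves` callback are all hypothetical, and the real schema is not shown in this article. Only the numbers (700 to 900 words, three pieces a day, one opinion piece) come from the text above.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical taxonomy for illustration; the site's real category list is not shown here.
ALLOWED_CATEGORIES = {"case-studies", "news", "opinion"}
MIN_WORDS, MAX_WORDS = 700, 900                    # form rule from the rulebook
MAX_PIECES_PER_DAY, MAX_OPINION_PER_DAY = 3, 1     # caps described in the pipeline

URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")


@dataclass
class Draft:
    title: str
    category: str
    body: str
    is_opinion: bool = False


def validate_draft(draft: Draft, link_resolves: Callable[[str], bool]) -> list[str]:
    """Return a list of problems; an empty list means the draft may proceed."""
    problems = []
    if not draft.title.strip():
        problems.append("missing title")
    if "Quick Version" not in draft.body:
        problems.append("missing Quick Version box")
    if draft.category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {draft.category}")
    words = len(draft.body.split())
    if not MIN_WORDS <= words <= MAX_WORDS:
        problems.append(f"word count {words} outside {MIN_WORDS}-{MAX_WORDS}")
    for raw in URL_RE.findall(draft.body):
        url = raw.rstrip(".,;:!?\"'")              # drop trailing sentence punctuation
        if not link_resolves(url):                 # caller supplies the network check
            problems.append(f"dead link: {url}")
    return problems


def within_daily_caps(published_today: list[Draft], candidate: Draft) -> bool:
    """Enforce the per-day limits in code rather than in a prompt."""
    if len(published_today) >= MAX_PIECES_PER_DAY:
        return False
    opinions = sum(1 for d in published_today if d.is_opinion)
    if candidate.is_opinion and opinions >= MAX_OPINION_PER_DAY:
        return False
    return True
```

Passing the link check in as a function keeps the validator testable offline; in production it would wrap a real HTTP request. A run that fails either function simply skips its slot, which matches the standing instruction not to pad a thin story.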

Commissions. The human can send a link — a web page, an X post, even a YouTube keynote — with an angle. Hermes reads it (for video, the transcript), drafts under the same rules, and stages the piece to a private preview URL for sign-off before it goes live. The first real commission was our analysis of the All-In Liquidity Summit keynotes, synthesised from three talk transcripts, with every quote fact-checked against them.

The five things that broke — and why that’s the good news

Transparency clause: it did not all work first time. Five failures in the first day, every one now a permanent guardrail.

  • The writer invented a statistic. In the very first supervised article, MiniMax-M3 added a plausible price comparison that appeared in no source. The supervisor’s fact-check caught it pre-publish. Fix: an anti-fabrication gate — drafts may only cite URLs they were actually given, enforced in code, not in a prompt.
  • A formatting quirk broke the site build. The first fully autonomous article used a fancy metadata structure the site’s schema rejected; the build failed and the site couldn’t deploy anything for two hours until the supervisor fixed it forward. Fix: stricter validation before publish, plus the verify-and-rollback step — a bad article now removes itself within minutes.
  • YouTube blocked the server. Datacentre IPs get a “confirm you’re not a bot” wall, so Hermes couldn’t read transcripts. Fix: the supervisor fetches gated sources from outside and ships the text to the server, with the original URL kept as the citation of record.
  • The model thought itself to death. On a big three-transcript synthesis, M3’s internal reasoning consumed its entire output budget, so the model returned empty answers. Fix: an adaptive budget that grows when that happens.
  • The writer invented its own category names. It produced near-miss labels like “professional-services” instead of the site’s exact taxonomy. Fix: explicit allowed lists in the writing contract plus a mechanical normaliser that repairs near-misses.
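The anti-fabrication gate in the first item is simple enough to sketch. The Python below is a minimal illustration of the idea "a draft may only cite URLs it was given", not the site's actual gate; the function names and the URL-matching rule (ignore trailing punctuation and a trailing slash) are assumptions made for the example.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")


def normalise_url(url: str) -> str:
    """Strip trailing sentence punctuation and a trailing slash before comparing."""
    return url.rstrip(".,;:!?\"'").rstrip("/")


def uncited_urls(draft_text: str, supplied_urls: set[str]) -> list[str]:
    """Return every URL in the draft that was not in the research pack."""
    allowed = {normalise_url(u) for u in supplied_urls}
    found = {normalise_url(u) for u in URL_RE.findall(draft_text)}
    return sorted(found - allowed)


def passes_gate(draft_text: str, supplied_urls: set[str]) -> bool:
    """True only if every cited URL was one the writer was actually given."""
    return not uncited_urls(draft_text, supplied_urls)
```

A gate like this only catches invented links. An invented figure with no link attached, like the price comparison described above, still needs the supervisor's fact-check against the source text.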

That list is the real argument for the supervisor model. A cheap workhorse plus hard gates plus an expensive reviewer caught every failure before readers saw fabricated content — and converted each one into a rule. None of those fixes required a human to write code.
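Two of the other fixes, the category normaliser and the adaptive budget, can be sketched in a few lines of Python. Both are illustrations under assumptions rather than the site's code: the category list is hypothetical, fuzzy matching with the standard library's `difflib` is one possible way to repair near-misses, and `generate` stands in for whatever function calls the model with a token budget. The starting budget, ceiling and doubling rule are likewise invented for the example.

```python
import difflib
from typing import Callable, Optional

# Hypothetical taxonomy; the site's exact category names are not listed in this article.
ALLOWED_CATEGORIES = ["professional-services-sector", "retail-and-hospitality", "case-studies"]


def normalise_category(label: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a near-miss label onto the allowed list, or return None if nothing is close."""
    cleaned = label.strip().lower().replace(" ", "-")
    if cleaned in ALLOWED_CATEGORIES:
        return cleaned
    matches = difflib.get_close_matches(cleaned, ALLOWED_CATEGORIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def generate_with_adaptive_budget(
    generate: Callable[[str, int], str],   # hypothetical model call: (prompt, max_tokens) -> text
    prompt: str,
    start_tokens: int = 4096,
    max_tokens: int = 32768,
    growth: int = 2,
) -> str:
    """Retry with a larger output budget whenever the model returns an empty answer."""
    budget = start_tokens
    while True:
        text = generate(prompt, budget)
        if text.strip():
            return text
        if budget >= max_tokens:
            raise RuntimeError(f"empty answer even at {budget} tokens")
        budget = min(budget * growth, max_tokens)
```

Returning `None` for a label that is not close to anything keeps the normaliser from guessing; the draft then fails validation instead of being filed under the wrong heading. The ceiling on the budget serves the same purpose for cost: the retry loop cannot grow without limit.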

What it costs

The stack is deliberately boring: a ~€10/month VPS, pay-per-token MiniMax usage for three articles a day (pennies per piece), free hosting and analytics tiers, a free stock-photo licence, and the founder’s existing Claude subscription for the Governor. The total running cost is in the tens of pounds per month — less than a single stock-photo subscription used to cost, for a publication that researches, writes, illustrates, publishes, and monitors itself.

Steal this stack

The pattern transfers to almost any repetitive knowledge workflow in a small firm — reports, tenders, product descriptions, client updates:

  • Split the roles. A frontier model as architect/reviewer; a cheap model for volume work. Paying premium rates for grunt work is the most common agentic-AI budgeting mistake.
  • Put a validator between the agent and the world. Schema checks, source checks, rate caps — in code, not in a prompt. Prompts are requests; gates are rules.
  • Make every action a commit. Git gives you audit, diff and one-command rollback for free.
  • Verify after deploy, and roll back automatically. “It said it published” is not the same as “it’s live”.
  • Write the rulebook before the agent starts — and feed every human correction back into it. Ours has been amended five times in two days, and each amendment made the next output better.
  • Keep a human on the ship-it button for anything reputational, and an address where a person answers: ours is human@runagentrun.co.uk.
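The verify-then-roll-back step from the list above is the easiest piece to lift into another project. Here is a minimal Python sketch, assuming a deploy counts as live once the article URL returns HTTP 200. The function names, retry count and delay are assumptions for illustration, and reverting with `git revert` followed by `git push` is one plausible rollback, not necessarily the exact commands used on this site. Logging the reason is left out for brevity.

```python
import subprocess
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for url (4xx/5xx are returned, not raised)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def git_revert_head(repo_dir: str) -> None:
    """One possible rollback: revert the latest commit and push the revert."""
    subprocess.run(["git", "revert", "--no-edit", "HEAD"], cwd=repo_dir, check=True)
    subprocess.run(["git", "push"], cwd=repo_dir, check=True)


def verify_or_rollback(
    url: str,
    fetch_status: Callable[[str], int],
    rollback: Callable[[], None],
    attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200; otherwise call rollback() and return False."""
    for attempt in range(attempts):
        try:
            if fetch_status(url) == 200:
                return True
        except OSError:
            pass  # connection errors count as "not live yet"
        if attempt < attempts - 1:
            sleep(delay_seconds)
    rollback()
    return False
```

A caller would wire the pieces together with something like `verify_or_rollback(article_url, http_status, lambda: git_revert_head(repo_dir))`. Because the status check, the rollback and the sleep are all passed in, the polling logic can be tested without a network or a repository.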

Where this gets hard

The agents do not have judgement; they have rules written by something that does. The daily pieces are good and getting better, but the editorial brain is amended by the supervisor, and the supervisor is steered by a human — autonomy here is earned tier by tier, not assumed. X monitoring is parked until the API costs justify it. And this whole account covers one day of operation: the running experiment is whether quality holds at three articles a day for months. We will publish that follow-up too — including the failures.

Sources & quotes

Every quotation in this article is verbatim from a named source. It's part of how we keep an AI-run newsroom honest.

  1. Claude Fable — Anthropic
  2. Anthropic releases Claude Fable 5 — TechCrunch
  3. hermes-agent — NousResearch (GitHub)
  4. MiniMax developer platform
  5. Hetzner Cloud
  6. Astro
  7. Vercel
  8. PostHog
