Local Inference

11 pieces on local inference — practical workflows, case studies and field notes.

Running Local Inference on the New Gemma Models — From Departmental Hardware

How small teams are deploying quantised Gemma models on commodity GPUs to run private, offline pipelines. No cloud, no data leaving the building.

9 Jun 2026/8 min read

Analysis · Local AI

OpenJarvis v1.0: The Local-First Agent Framework Ollama Has Been Waiting For

Stanford's Hazy Research has shipped the first credible open-source framework for personal AI agents that run on your own hardware. For UK operators, local-first has stopped being a manifesto and started being a curl command.

9 Jun 2026/5 min read

Opinion · Pricing

The $19 Agentic Stack: More Tokens for Your Money Than a $20 Seat

Anthropic just dropped Claude Fable 5 into the $20 tier and MiniMax M3 matches it on agentic work. For a small team, the value question has quietly flipped.

9 Jun 2026/6 min read

Sovereign AI · Frontier

Lumen Sovereign: Britain's first home-grown frontier model takes shape

The UK is preparing its first fully sovereign frontier AI model, with startup Cosine leading and a roster of major British firms on design. Here's why data residency and procurement confidence are the real story.

6 Jun 2026/6 min read

Local Models · Open Weights

MiniMax M3: an Open-Weight Frontier Model Lands — and Small Teams Can Run the Workflow

A frontier-grade model with open weights, a million-token context window and native multimodality. For small teams, it reframes what is possible without a per-seat cloud contract — if you can find the hardware.

5 Jun 2026/7 min read

Local Models · Multimodal

Gemma 4 Brings Vision and Tool Calling — Agents That See, on Your Own Box

Gemma 4 adds built-in tool calling and vision support, and Ollama now fully supports it. For a retail team, that means document, shelf and stock workflows that never send an image to the cloud.

4 Jun 2026/7 min read

Local Models · Benchmarks

Qwen 3.6 Might Be the New Local Default for a 24GB GPU

A 27B model that reportedly tops consumer-hardware leaderboards and fits in a single 24GB card at Q4. For a sole trader or a small professional-services team, that is the sweet spot worth understanding.

2 Jun 2026/7 min read

Tooling · Comparison

LM Studio vs Ollama in 2026: which local runtime for a small team?

Both run open models on your own hardware. The right pick has less to do with benchmarks than with who on your team will actually be using it.

1 Jun 2026/6 min read

Tooling · Hardware

Running local AI on AMD in 2026: ROCm finally earns a seat

AMD's software stack spent years as the awkward alternative to NVIDIA. In 2026 it is a credible cost play for a back-office team — provided you check a few things first.

30 May 2026/6 min read

Local Models · Long Context

Llama 4 Scout Puts a 10M-Token Context Window in the Open

Meta's Llama 4 Scout brings a ten-million-token context window into the open. For logistics and data-heavy teams, the real question is what a window that big is — and isn't — actually good for.

28 May 2026/7 min read

Tooling · Local Runtimes

Ollama v0.24 adds the Codex app and gets faster on Apple Silicon

May 2026's runtime updates look like housekeeping. For a solo operator running models on a MacBook, they quietly remove some of the friction that makes local AI feel like hard work.

25 May 2026/5 min read