
Glossary

The terms this site leans on, in plain English. Anywhere you see a dotted underline in an article, it's one of these — click it for the quick version without leaving the page.

Term | The quick version | Learn more
Agent | AI that does multi-step work — reading, deciding, using other software, acting — rather than just answering one question. Think “junior colleague with tools”, not “chatbot”. | What are AI agents?; How we built an agent-run news site
Agentic AI | The adjective for AI systems that act on their own toward a goal — planning steps, calling tools, checking results — with a human setting direction and limits. | The $19 agentic stack; Microsoft Agent Framework goes GA
API | A way for your software to talk to a model directly, paying per use, rather than through a chat window. How most AI features inside other products are actually built. |
Benchmark | A standard test used to compare models on a skill — coding, reasoning, agentic tasks. Useful for ranking, but a high score is not the same as being good at your job. | Nemotron 3 Ultra: America's best open model
Connector | A packaged, click-to-enable integration between an AI assistant and another tool (Slack, Google Drive, Microsoft 365). The friendly storefront version of MCP. | Buying AI for a five-person team
Context window | How much a model can hold in mind at once — the prompt, documents, and conversation so far, measured in tokens. A bigger window means a whole contract can be read in one pass. | Llama 4 Scout and long context
Diffusion model | A model that generates by starting from noise and refining — long the technique behind AI images, now used for text too, where it can be far faster than the usual word-by-word approach. | DiffusionGemma: 4x faster open text model
Distillation | Training a small, cheap model to copy a big, expensive one — keeping most of the quality at a fraction of the running cost. How a frontier model becomes something you can self-host. |
Export controls | Government rules restricting who can access a technology across borders, on national-security grounds. Increasingly applied to advanced AI models and the chips that run them. | An AI export ban that backfires on defenders
Fine-tuning | Training an existing model further on your own examples so it learns your tone, templates and domain terms permanently — different from prompting, which only instructs it for one conversation. | Train your own Gemma 4 with Unsloth
Foundation model | A large general model trained on broad data that others build on top of — the base layer (GPT, Claude, Gemini, Llama) beneath most AI products. |
Frontier model | The most capable models in the world at any given moment — the cutting edge from the best-resourced labs. Powerful, but the dearest to run and the most tightly controlled. |
GGUF | The de facto standard file format for local AI models — what you download to run a model in Ollama, LM Studio or llama.cpp. | LM Studio vs Ollama
Grounding | Tying an AI’s claims to source material it actually read — the working defence against hallucination. This site’s own pipeline rejects any quote that isn’t verbatim in a source. | How we built an agent-run news site
Guardrails | The safety layer around a model that blocks high-risk requests — often a separate classifier checking inputs and outputs. Too loose and it leaks; too tight and it refuses legitimate work. | An AI export ban that backfires on defenders
Hallucination | When an AI states something false with full confidence — invented figures, citations or product details. The single biggest reason AI output needs checking before it ships. | How we built an agent-run news site
Human in the loop | A workflow where AI does the work but a person approves it before it counts — the publish button stays human. How this site runs its own newsroom. | How we built an agent-run news site
Inference | The act of a trained model producing output — every chat reply is inference. Distinct from training, which is how the model learnt in the first place. | Gemma 4 on your own hardware
Jailbreak | A prompt that tricks a model past its safety rules into doing something it's meant to refuse. Labs test for them constantly; a serious one can force a lab to pull a model offline. | An AI export ban that backfires on defenders
LLM (Large Language Model) | The engine behind modern AI assistants: a model trained on huge amounts of text that predicts and generates language. ChatGPT, Claude and Gemini are products built on LLMs. | Gemma 4 on your own hardware
Local inference | Running an AI model on your own computer or server instead of a vendor’s cloud — nothing leaves the building, and there’s no per-use bill once the hardware is on the desk. | Gemma 4 on your own hardware; LM Studio vs Ollama
LoRA | A cheap fine-tuning technique: instead of retraining the whole model, it trains small add-on adapters (typically 1–2% of the weights, often less), so a real fine-tune fits on a single graphics card. | Train your own Gemma 4 with Unsloth
MCP (Model Context Protocol) | The open standard that lets an AI assistant read live data from — and act on — the tools you already use (drives, CRMs, spreadsheets). Originally from Anthropic; now supported across most major plans. | MCP hits 97M downloads and goes stateless; Buying AI for a five-person team
Mixture of Experts (MoE) | A model design that fires only a slice of its network for each token instead of the whole thing — so a huge model runs at roughly the speed and cost of a much smaller one, though it still needs enough memory to hold all of it. | Nemotron 3 Ultra: America's best open model
Model | The trained AI itself — the file (or cloud service) that turns input into output. Models come in families and sizes; bigger is usually abler but dearer to run. | Qwen 3.6: the new local default
Multimodal | A model that handles more than text — images, audio, sometimes video — in the same conversation. It decides whether "describe this screenshot" works at all. |
Ollama | One of the most popular free tools for running AI models on your own machine — one install, one command, and a local model is answering. | Ollama v0.24: Codex and Apple Silicon; LM Studio vs Ollama
Open source (AI) | Software anyone can inspect, run and modify under its licence. For models, the stricter bar is open weights plus the training data and recipe — full open source is rarer than the label suggests. | Google releases an open standard for AI knowledge
Open weights | A model you can download and run yourself, free of per-use fees. “Open weights” is not always full open source — the licence says what you may do with it. | DiffusionGemma: 4x faster open text model; MiniMax-M3: the open-weight frontier
Parameters | A model's adjustable values, learnt during training — the rough measure of its size (a 550B model has 550 billion). More usually means abler but heavier to run. |
Prompt | The instruction you give a model. Clear, specific prompts are most of the skill in getting good output — and the same prompt can behave differently across models. |
Quantisation | Compressing a model, typically by storing its weights at lower precision (say 4-bit instead of 16-bit), so it fits smaller, cheaper hardware at a small cost in quality — the difference between needing a server and needing a gaming PC. | Gemma 4 on your own hardware
RAG (Retrieval-Augmented Generation) | Giving a model the right documents to read at question time so its answer is grounded in your facts, not just its training. The standard way to make an assistant answer from your own files. |
Reasoning model | A model that works through a problem step by step before answering, trading speed for accuracy on hard tasks like maths, code and multi-step planning. It "thinks" in a scratchpad first, often hidden from the user. | Nemotron 3 Ultra: America's best open model
Sovereign AI | A country's push to own its AI stack — models, compute and data on home soil — rather than renting it from foreign firms. A growing theme in UK policy and procurement. |
System prompt | The hidden standing instruction that sets a model's role and rules for a whole conversation, before the user types anything. Where an assistant's persona and limits live. |
Throughput | How fast a model produces output, usually in tokens per second. Higher throughput means snappier replies and cheaper long jobs — and is where a lot of recent engineering effort has gone. |
Token | The unit AI reads and writes in — in English, roughly three-quarters of a word. Pricing, usage limits and speed (“tokens per second”) are all measured in tokens. | The $20 standard in AI pricing
Tool use (function calling) | When a model can call real software — search the web, run code, hit an API — instead of only writing text. The mechanism that turns a chatbot into an agent. | What are AI agents?
VRAM | The memory on a graphics card — the budget that decides which AI models your hardware can run. More VRAM, bigger models. | DiffusionGemma: 4x faster open text model
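The Parameters, Quantisation and VRAM entries connect through one piece of back-of-envelope arithmetic. A model's weights take up roughly (number of parameters × bits per weight ÷ 8) bytes. The sketch below is a rule of thumb, not a sizing tool. It counts the weights only and ignores the extra memory a running model needs for its context window and working space. The function name is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough size of a model's weights in gigabytes (10**9 bytes).

    Weights only: real memory use is higher once the context window
    and the runtime's own overhead are added on top.
    """
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 weights) * bytes each / 1e9 bytes per GB
    return params_billions * bytes_per_weight


if __name__ == "__main__":
    # Illustrative sizes: a small 8B model and the 550B example from the Parameters entry
    for params in (8, 550):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{params}B parameters at {bits}-bit: about {gb:.0f} GB of weights")
```

By this estimate, an 8B model shrinks from about 16 GB at 16-bit to about 4 GB at 4-bit. That is the gap between a server card and a gaming PC. Real quantised files come out somewhat larger, because formats such as GGUF store extra scaling data alongside the weights.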