Glossary
The terms this site leans on, in plain English. Anywhere you see a dotted underline in an article, it's one of these — click it for the quick version without leaving the page.
| Term | The quick version | Learn more |
|---|---|---|
| Agent | AI that does multi-step work (reading, deciding, using other software, acting) rather than just answering one question. Think “junior colleague with tools”, not “chatbot”. | What are AI agents? · How we built an agent-run news site |
| Agentic AI | The adjective for AI systems that act on their own toward a goal (planning steps, calling tools, checking results) with a human setting direction and limits. | The $19 agentic stack · Microsoft Agent Framework goes GA |
| API | A way for your software to talk to a model directly, paying per use, rather than through a chat window. How most AI features inside other products are actually built. | |
| Benchmark | A standard test used to compare models on a skill — coding, reasoning, agentic tasks. Useful for ranking, but a high score is not the same as being good at your job. | Nemotron 3 Ultra: America's best open model |
| Connector | A packaged, click-to-enable integration between an AI assistant and another tool (Slack, Google Drive, Microsoft 365). The friendly storefront version of MCP. | Buying AI for a five-person team |
| Context window | How much a model can hold in mind at once — the prompt, documents, and conversation so far, measured in tokens. A bigger window means whole contracts in one pass. | Llama 4 Scout and long context |
| Diffusion model | A model that generates by starting from noise and refining — long the technique behind AI images, now used for text too, where it can be far faster than the usual word-by-word approach. | DiffusionGemma: 4x faster open text model |
| Distillation | Training a small, cheap model to copy a big, expensive one — keeping most of the quality at a fraction of the running cost. How a frontier model becomes something you can self-host. | |
| Export controls | Government rules restricting who can access a technology across borders, on national-security grounds. Increasingly applied to advanced AI models and the chips that run them. | An AI export ban that backfires on defenders |
| Fine-tuning | Training an existing model further on your own examples so it learns your tone, templates and domain terms permanently — different from prompting, which only instructs it for one conversation. | Train your own Gemma 4 with Unsloth |
| Foundation model | A large general model trained on broad data that others build on top of — the base layer (GPT, Claude, Gemini, Llama) beneath most AI products. | |
| Frontier model | The most capable models in the world at any given moment — the cutting edge from the best-resourced labs. Powerful, but the dearest to run and the most tightly controlled. | |
| GGUF | The most widely used file format for local AI models, from the llama.cpp project: what you download to run a model in Ollama, LM Studio or llama.cpp. | LM Studio vs Ollama |
| Grounding | Tying an AI’s claims to source material it actually read — the working defence against hallucination. This site’s own pipeline rejects any quote that isn’t verbatim in a source. | How we built an agent-run news site |
| Guardrails | The safety layer around a model that blocks high-risk requests — often a separate classifier checking inputs and outputs. Too loose and it leaks; too tight and it refuses legitimate work. | An AI export ban that backfires on defenders |
| Hallucination | When an AI states something false with full confidence — invented figures, citations or product details. The single biggest reason AI output needs checking before it ships. | How we built an agent-run news site |
| Human in the loop | A workflow where AI does the work but a person approves it before it counts — the publish button stays human. How this site runs its own newsroom. | How we built an agent-run news site |
| Inference | The act of a trained model producing output — every chat reply is inference. Distinct from training, which is how the model learnt in the first place. | Gemma 4 on your own hardware |
| Jailbreak | A prompt that tricks a model past its safety rules into doing something it's meant to refuse. Labs test for them constantly; a serious one can force a lab to pull a model offline. | An AI export ban that backfires on defenders |
| LLM (Large Language Model) | The engine behind modern AI assistants: a model trained on huge amounts of text that predicts and generates language. ChatGPT, Claude and Gemini are products built on LLMs. | Gemma 4 on your own hardware |
| Local inference | Running an AI model on your own computer or server instead of a vendor’s cloud: nothing leaves the building, and there’s no per-use bill once the hardware is on the desk. | Gemma 4 on your own hardware · LM Studio vs Ollama |
| LoRA | The cheap fine-tuning technique: instead of retraining the whole model, it freezes the original weights and trains small add-on matrices (typically 1–2% of the model's size, often less), so a real fine-tune fits on a single graphics card. | Train your own Gemma 4 with Unsloth |
| MCP (Model Context Protocol) | The open standard that lets an AI assistant read live data from, and act on, the tools you already use (drives, CRMs, spreadsheets). Originally from Anthropic; now supported by most major AI assistants. | MCP hits 97M downloads and goes stateless · Buying AI for a five-person team |
| Mixture of Experts (MoE) | A model design that sends each token through only a few of its many sub-networks (“experts”) instead of the whole thing, so a huge model runs at roughly the speed and compute cost of a much smaller one. It still needs the memory to hold every expert. | Nemotron 3 Ultra: America's best open model |
| Model | The trained AI itself: the file (or cloud service) that turns input into output. Models come in families and sizes; bigger is usually more capable but dearer to run. | Qwen 3.6: the new local default |
| Multimodal | A model that handles more than text — images, audio, sometimes video — in the same conversation. The difference between "describe this screenshot" working or not. | |
| Ollama | The most popular free tool for running AI models on your own machine: one install, one command, and a local model is answering. | Ollama v0.24: Codex and Apple Silicon · LM Studio vs Ollama |
| Open source (AI) | Software anyone can inspect, run and modify under its licence. For models, the stricter bar is open weights plus the training data and recipe, so full open source is rarer than the label suggests. | Google releases an open standard for AI knowledge |
| Open weights | A model you can download and run yourself, free of per-use fees. “Open weights” is not the same as full open source, and the licence says what you may do with it. | DiffusionGemma: 4x faster open text model · MiniMax-M3: the open-weight frontier |
| Parameters | A model's adjustable values, learnt during training, and the rough measure of its size (a 550B model has 550 billion). More parameters usually means a more capable model, but a heavier one to run. | |
| Prompt | The instruction you give a model. Clear, specific prompts are most of the skill in getting good output — and the same prompt can behave differently across models. | |
| Quantisation | Storing a model's weights at lower precision (say 4-bit numbers instead of 16-bit) so it fits on smaller, cheaper hardware, at a small cost in quality: the difference between needing a server and needing a gaming PC. | Gemma 4 on your own hardware |
| RAG (Retrieval-Augmented Generation) | Giving a model the right documents to read at question time so its answer is grounded in your facts, not just its training. The standard way to make an assistant answer from your own files. | |
| Reasoning model | A model that works through a problem step by step before answering, trading speed for accuracy on hard tasks like maths, code and multi-step planning. It “thinks” in a scratchpad first, which some products hide and others show. | Nemotron 3 Ultra: America's best open model |
| Sovereign AI | A country's push to own its AI stack — models, compute and data on home soil — rather than renting it from foreign firms. A growing theme in UK policy and procurement. | |
| System prompt | The hidden standing instruction that sets a model's role and rules for a whole conversation, before the user types anything. Where an assistant's persona and limits live. | |
| Throughput | How fast a model produces output, usually in tokens per second. Higher throughput means snappier replies and cheaper long jobs — and is where a lot of recent engineering effort has gone. | |
| Token | The unit AI reads and writes in — roughly three-quarters of a word. Pricing, usage limits and speed (“tokens per second”) are all measured in tokens. | The $20 standard in AI pricing |
| Tool use (function calling) | When a model can call real software — search the web, run code, hit an API — instead of only writing text. The mechanism that turns a chatbot into an agent. | What are AI agents? |
| VRAM | The memory on a graphics card — the budget that decides which AI models your hardware can run. More VRAM, bigger models. | DiffusionGemma: 4x faster open text model |
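
The Token and Context window entries can be made concrete with a little arithmetic. This minimal Python sketch applies the three-quarters-of-a-word rule of thumb. The 30,000-word contract and the 32,000- and 128,000-token window sizes are illustrative assumptions, not figures for any particular model. A real tokeniser will give somewhat different counts.

```python
# Rough token arithmetic from the Token entry's rule of thumb:
# one token is about three-quarters of a word. Real tokenisers vary
# by model and language, so treat these as estimates only.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate how many tokens a text uses, from its word count."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_window: int, reply_budget: int = 0) -> bool:
    """Does the text, plus room reserved for the model's reply, fit the window?"""
    return estimate_tokens(word_count) + reply_budget <= context_window


if __name__ == "__main__":
    contract_words = 30_000  # hypothetical long contract
    print(estimate_tokens(contract_words))                               # 40000
    print(fits_in_context(contract_words, 32_000))                       # False
    print(fits_in_context(contract_words, 128_000, reply_budget=4_000))  # True
```

Reserving room for the reply matters because the model's answer counts against the same window as the document you give it.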
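
The Grounding entry describes rejecting any quote that isn't verbatim in a source. A check of that kind fits in a few lines. The helper names here are hypothetical, and this is not the site's actual pipeline code. It shows one plausible way to test a quote against source text after tidying whitespace and curly quotes, so a line break in the source doesn't cause a false rejection.

```python
import re


# Hypothetical helper, not this site's actual pipeline code.
def normalise(text: str) -> str:
    """Unify curly quotes and collapse whitespace so layout differences
    (line breaks, typographic apostrophes) don't cause false rejections."""
    for curly, straight in (("\u201c", '"'), ("\u201d", '"'), ("\u2018", "'"), ("\u2019", "'")):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip()


def quote_is_grounded(quote: str, sources: list[str]) -> bool:
    """True only if the quote appears word for word in at least one source."""
    q = normalise(quote)
    if not q:
        return False  # an empty quote proves nothing
    return any(q in normalise(source) for source in sources)


if __name__ == "__main__":
    # Hypothetical source text for illustration.
    sources = ["The company said revenue rose\n12% last year, ahead of forecasts."]
    print(quote_is_grounded("revenue rose 12% last year", sources))  # True
    print(quote_is_grounded("revenue rose 15% last year", sources))  # False
```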
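
The LoRA entry's small percentage comes from the shape of the adapter. For a single weight matrix, the number of trained values depends only on the matrix size and a setting called the rank. This sketch assumes an illustrative 4,096 × 4,096 matrix. The share grows with the rank you choose, and the figure for a whole model depends on which layers get adapters.

```python
def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Share of one weight matrix's size that a LoRA adapter trains.

    The original d_out x d_in matrix W stays frozen. LoRA learns two thin
    matrices, B (d_out x rank) and A (rank x d_in), and the model uses
    W + B @ A (times a scaling constant) in its place, so only
    rank * (d_out + d_in) values are trained.
    """
    full = d_out * d_in
    adapter = rank * (d_out + d_in)
    return adapter / full


if __name__ == "__main__":
    # Illustrative matrix size; real models have many matrices of various shapes.
    for rank in (8, 16, 64):
        share = lora_trainable_fraction(4096, 4096, rank)
        print(f"rank {rank}: {share:.2%} of the matrix")
```

At rank 16 the adapter is under 1% of the matrix it modifies, which is why the training job fits on far smaller hardware than a full retrain.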
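
The Mixture of Experts entry says only a slice of the network fires for each token. A toy version of the routing step shows the idea: a router scores every expert, only the top k actually run, and their outputs are blended. Everything here is a simplifying assumption. The inputs are single numbers, the scores are hand-written, and the softmax is taken over just the chosen experts, which is one of several routing variants in use. In a real model the experts are neural networks and the router is learned.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for numerical safety
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts for one token and weight them."""
    top = sorted(range(len(router_scores)), key=lambda i: router_scores[i], reverse=True)[:k]
    return dict(zip(top, softmax([router_scores[i] for i in top])))


def moe_layer(x: float, experts, router_scores: list[float], k: int = 2) -> tuple[float, list[int]]:
    """Run only the chosen experts and blend their outputs; also report which ran."""
    chosen = route(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in chosen.items())
    return output, sorted(chosen)


if __name__ == "__main__":
    # Toy "experts": in a real model each is a feed-forward network over vectors.
    experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 1, lambda x: 10 * x]
    scores = [0.1, 2.0, -1.0, 1.0]  # hand-written; a learned router produces these
    out, ran = moe_layer(1.0, experts, scores, k=2)
    print(ran)  # [1, 3]: only two of the four experts did any work
    print(out)
```

With hundreds of experts and k kept small, the work done per token stays close to that of a much smaller model, while every expert still has to sit in memory.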
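
The Parameters, Quantisation and VRAM entries meet in one back-of-envelope sum: memory for the weights is roughly the parameter count times the bytes per weight. The 8-billion-parameter model below is an illustrative assumption. The figure also ignores the extra memory a running model needs for its context, so treat it as a lower bound.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for a model's weights alone, in gigabytes.

    Each parameter takes bits_per_weight / 8 bytes, and a billion
    parameters at one byte each is one gigabyte (10**9 bytes). Running
    the model needs extra room on top (for the context window and
    working memory), so read this as a floor, not the full requirement.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Illustrative 8B model at common precisions.
    for bits in (16, 8, 4):
        print(f"8B model at {bits}-bit: about {weight_memory_gb(8, bits):.0f} GB")
```

At 16, 8 and 4 bits that works out to about 16, 8 and 4 GB, which is why quantisation so often decides whether a model fits on the graphics card you already own.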
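
To show what quantisation actually does to the numbers, here is a minimal symmetric scheme. Each weight becomes a small integer, and one shared scale factor turns it back into an approximate value. This is a textbook simplification, assumed for illustration. The formats used in real GGUF files quantise in small blocks, each with its own scale, and are more elaborate.

```python
def quantise(weights: list[float], bits: int = 4) -> tuple[list[int], float]:
    """Symmetric quantisation (textbook scheme, assumed for illustration):
    store each weight as a small signed integer plus one shared scale."""
    levels = 2 ** (bits - 1) - 1  # 7 for 4-bit, so integers -7..7
    largest = max(abs(w) for w in weights)
    scale = largest / levels if largest > 0 else 1.0  # avoid dividing by zero
    return [round(w / scale) for w in weights], scale


def dequantise(q: list[int], scale: float) -> list[float]:
    """Recover approximate weights at run time."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.07]  # made-up example weights
    q, scale = quantise(weights, bits=4)
    restored = dequantise(q, scale)
    print(q)  # [2, -7, 5, 1]
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"worst error {worst:.3f}, at most half a step ({scale / 2:.3f})")
```

Each weight moves by at most half a step. Spread across billions of weights, those small errors are the quality cost the Quantisation entry mentions.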
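
The RAG entry in miniature: find the most relevant document for a question, then hand it to the model along with the question. As a clearly labelled stand-in, this sketch ranks documents by shared words. Production RAG systems usually use embedding-based search instead. The documents and prompt wording are hypothetical.

```python
import re


def word_set(text: str) -> set[str]:
    """Lower-case words in a text, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(question: str, documents: list[str], top_n: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question.
    (A stand-in for the embedding search real RAG systems use.)"""
    q = word_set(question)
    return sorted(documents, key=lambda d: len(q & word_set(d)), reverse=True)[:top_n]


def build_prompt(question: str, documents: list[str]) -> str:
    """Put the retrieved text in front of the model alongside the question."""
    context = "\n\n".join(retrieve(question, documents))
    return (
        "Answer using only the documents below. "
        "If they don't contain the answer, say so.\n\n"
        f"Documents:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    docs = [  # hypothetical company documents
        "The office is closed on bank holidays.",
        "Refunds are issued within 14 days of a return.",
        "The team meets every Monday at 9am.",
    ]
    print(build_prompt("How long do refunds take?", docs))
```

The resulting prompt is what gets sent to the model, so its answer can rest on your document rather than on whatever it absorbed in training.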
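
Tool use, sketched from the application's side: the model replies with a structured request instead of prose, and your code runs the matching function. The JSON shape (`{"tool": ..., "arguments": ...}`) and the `get_weather` tool are hypothetical. Each vendor's API defines its own format for tool calls.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}  # names the model is allowed to call


def handle_model_output(output: str) -> str:
    """Run the requested tool if the model asked for one (hypothetical
    JSON shape); otherwise treat the output as a normal text reply."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return output  # plain text, not a tool request
    if not isinstance(call, dict) or "tool" not in call:
        return output
    tool = TOOLS.get(call["tool"])
    if tool is None:
        return f"Unknown tool: {call['tool']}"
    return tool(**call.get("arguments", {}))


if __name__ == "__main__":
    print(handle_model_output('{"tool": "get_weather", "arguments": {"city": "Leeds"}}'))
    print(handle_model_output("Happy to help with that."))
```

In an agent, the tool's result would be sent back to the model so it can decide the next step; repeating that loop is what turns a chatbot into something that uses other software.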