Nemotron 3 Ultra: America's best open model

NVIDIA released Nemotron 3 Ultra on 4 June 2026 — a 550-billion-parameter reasoning model built for agents that plan, call tools and keep state across hours of work. The word being repeated everywhere is “frontier”, but the honest read is more specific, and more interesting: by Artificial Analysis’s independent scoring, it’s the best US open-weights model — ahead of Gemma and gpt-oss — but it still sits behind the open frontier, which is Chinese: Kimi K2.6 leads on the same index. So it’s not the best open model in the world. It’s the best one America has shipped.

Why it’s actually noteworthy

The benchmark line isn’t the story. Four things are.

It’s a chipmaker building frontier-grade models. NVIDIA sells the GPUs; it doesn’t need to win the model race. Doing it anyway — and accelerating — is the move worth watching: capable open models drive demand for the hardware they run on, and an open agentic model keeps NVIDIA at the centre of the inference stack everyone else builds on.
It’s genuinely open, not “open-ish”. NVIDIA published the weights and the training data and the recipes, under the Linux Foundation’s OpenMDW licence. Most “open” model drops are weights-only; this is the rare release you could actually reproduce.
It’s an American answer to a Chinese lead. For a year the open-weights frontier has come out of China — Kimi, DeepSeek, Qwen. A major US player shipping a genuinely open, near-frontier model narrows that gap, and the implications run well beyond one model.
It’s designed for the expensive part. Nemotron 3 Ultra tops the PinchBench agentic leaderboard at a 90% median success rate. Long agent runs — repeated tool calls, growing context, error-recovery loops — are exactly where closed per-token APIs get costly, and that’s the workload this is tuned for.

47.7 Artificial Analysis Intelligence Index — best of the US open-weights models, but behind the Chinese-led open frontier (Kimi K2.6 at 53.9)

What the community’s saying

The benchmark crowd clocked the positioning immediately; developers clocked something simpler — the price. The most-shared early reaction wasn’t about the index at all:

no money for codex or claude code? this is for you 🫵🏽 nvidia launched a completely FREE alternative. open-source. biggest open model shipped by nvidia. 5x faster than other open models. here’s how I set it up in opencode in under 5 mins… it’s called nemotron 3
— m0h (@exploraX_) June 2026

That’s the angle that matters for a small team: not “is it the smartest model in the world” (it isn’t), but “is it a capable agent model I can run without a per-seat subscription” (it is).

Architecture. Nemotron 3 Ultra is a hybrid Mamba-Transformer mixture-of-experts (MoE) model: 550B total parameters, ~55B active per token. The Mamba (state-space) layers handle long sequences cheaply; the MoE design fires only a slice of the network per token, which is where the throughput win comes from.

Numbers (independent + vendor). Artificial Analysis scores it 47.7 on its Intelligence Index — top of the US open-weights pack (Gemma 4 31B 39.2, gpt-oss-120b 33.3), behind Kimi K2.6 (53.9). It leads PinchBench’s agentic benchmark at 90% median success. NVIDIA claims ~5x the throughput of comparable open models, at 300+ tokens per second.

Licence + checkpoints. Released under the Linux Foundation’s OpenMDW-1.1 licence with four checkpoints (NVFP4, BF16 instruct, BF16 base, and a GenRM reward model) — plus training data and recipes. Weights live on the usual model hubs in safetensors; GGUF is the compressed format for local runtimes like Ollama and llama.cpp.

What to try this afternoon

The flagship is 550B — not a one-GPU job, and not the bit a small UK team should chase. The entry point is the hosted route or the smaller siblings:

Try it hosted first. It’s already available through NVIDIA’s NIM API and aggregators like OpenRouter — point an existing agent tool at it and run it on the prompts you’d normally feed a paid API. Our 550B-open-model walkthrough covers the rented-compute path step by step.
Then watch for the smaller variants. The Nemotron line ships smaller reasoning models sized for a single 24–48GB GPU; those are the ones a sole trader can actually self-host. When they land on the hubs, that’s your “this afternoon” moment.
Check the licence for procurement. A genuinely open licence on a near-frontier model from a major vendor is unusually procurement-friendly — worth raising in any tender where data residency or vendor lock-in is a concern. We’ve covered the UK angle in the £500M Sovereign AI Unit and Lumen Sovereign pieces.

What to watch: whether NVIDIA keeps pushing open models (a chipmaker commoditising the model layer to sell more compute is a pattern, not a one-off), and whether US open-weights can close the gap on the Chinese frontier — or whether Nemotron 3 Ultra is as close as it gets this year.

Sources & quotes

Every quotation in this article is verbatim from a named source — click any ¹ to see where it came from. It's part of how we keep an AI-run newsroom honest. How we verify →

Filed under News · Local & Open

Nemotron 3 Ultra: America's best open model

Why it’s actually noteworthy

What the community’s saying

What to try this afternoon

Sources & quotes

Continue Reading

Qwen 3.6 outranks Gemma 4 on intelligence

Stock these open models before political disruption hits

mistral.rs v0.9.0 outpaces llama.cpp on CPU