NVIDIA releases physical AI agent skills

At CVPR 2026 in Denver on June 3, NVIDIA released a bundle of “physical AI agent skills”: pre-packaged workflows for AI systems that operate in the physical world, including robots, self-driving cars and visual-inspection systems. The release lands the same week as Cosmos 3, an open foundation model that handles similar work, and is aimed at researchers rather than end customers.

The announcement frames a long-standing problem. “The core challenge in physical AI research isn’t simply developing stronger models,” NVIDIA wrote in its launch post. “It’s building a full workflow around them — reconstructing real-world scenes, generating edge-case scenarios, training policies, evaluating behavior and rapidly iterating.” The new skills are the company’s attempt to bundle those steps under one roof.

Three lanes, one toolkit

The release covers three research areas. The split is a useful way to read what NVIDIA is selling — and what it isn’t.

Self-driving cars. The hard problem is the “long tail” — rare junctions, odd lighting, unusual road geometry that’s hard to capture in real fleet data. The new skills let an AI agent rebuild 3D scenes from dashcam-style video, then run controlled simulations against them. A new open driving model, Alpamayo 2 Super, sits underneath as the decision-making core.
Vision AI. For factories and warehouses, the hard problem is generating enough controlled examples of rare defects, lighting shifts or object-state changes. The new skills automate the production of synthetic defect images and the analysis of large video archives.
Robotics. For humanoids and industrial arms, the bottleneck is iterating through enough simulated environments and policy rollouts to teach a skill reliably. The new skills wrap simulation, training and evaluation under agent-callable interfaces. A separate healthcare-focused release generates realistic surgical-robotics data for policy training.

All three lanes share the same backbone: NVIDIA’s simulation engines, its open foundation model and an orchestration layer that lets an AI agent drive the workflow. The full skills library is open source on GitHub.
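To make the “agent-callable” idea concrete, here is a minimal sketch of how an orchestration layer might chain skills, assuming each skill is a plain function that reads and extends a shared state dictionary. Every name in it (`SkillRegistry`, `reconstruct_scene`, `generate_edge_cases`, `evaluate_policy`) is hypothetical and chosen for illustration; none is taken from NVIDIA’s skills library, and the skill bodies are placeholders rather than real reconstruction or evaluation code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical illustration only: these names are NOT from NVIDIA's library.
# A "skill" here is any function that takes the workflow state and returns
# the new keys it wants to add to that state.
Skill = Callable[[dict], dict]


@dataclass
class SkillRegistry:
    """Maps skill names to callables so an agent can invoke them by name."""

    skills: Dict[str, Skill] = field(default_factory=dict)

    def register(self, name: str) -> Callable[[Skill], Skill]:
        def decorator(fn: Skill) -> Skill:
            self.skills[name] = fn
            return fn

        return decorator

    def run(self, plan: List[str], state: dict) -> dict:
        """Execute the named skills in order, merging each result into state."""
        for name in plan:
            state = {**state, **self.skills[name](state)}
        return state


registry = SkillRegistry()


@registry.register("reconstruct_scene")
def reconstruct_scene(state: dict) -> dict:
    # Placeholder for rebuilding a 3D scene from dashcam-style video.
    return {"scene": f"3d-scene-from-{state['video']}"}


@registry.register("generate_edge_cases")
def generate_edge_cases(state: dict) -> dict:
    # Placeholder for producing controlled variations of the scene.
    variants = ("night", "rain", "glare")
    return {"scenarios": [f"{state['scene']}/{v}" for v in variants]}


@registry.register("evaluate_policy")
def evaluate_policy(state: dict) -> dict:
    # Placeholder for scoring a policy on each scenario.
    return {"report": {s: "pending" for s in state["scenarios"]}}


plan = ["reconstruct_scene", "generate_edge_cases", "evaluate_policy"]
result = registry.run(plan, {"video": "dashcam.mp4"})
```

The design point is that the agent only needs the skill names and the shared state, so steps can be reordered or swapped without touching the others.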

What runs where

The synthetic data tools (scene reconstruction from video, video augmentation and synthetic defect generation) are the most accessible entry point. They run as “Physical AI Launchables” on NVIDIA Brev, a hosted environment that ships with free trial credits, so researchers can test the workflows without buying hardware.

Datasets ship with the release. The headline figure is more than 15 million downloads of NVIDIA’s Physical AI Dataset on Hugging Face, a measure of how widely the company’s earlier robotics and driving models are already used in the research community. A new humanoid-interaction dataset, GRAIL, adds roughly 50 hours of motion-capture data; six synthetic video datasets feed the foundation model.

The model and the workflow stack. Cosmos 3 is an open “omnimodel”: a single model that handles vision reasoning, world generation and action generation. It uses a mixture-of-transformers architecture in which a reasoning transformer analyses an observation and feeds instructions to a generation tower, which scales physically grounded virtual worlds. Alpamayo 2 Super is a 32-billion-parameter vision-language-action (VLA) model built for level-4 driving. AlpaGym is an open-source, closed-loop reinforcement-learning framework designed to scale across thousands of GPUs. OmniDreams is an action-conditioned generative world model that renders camera frames in real time in response to policy actions. Hardware targets: H100 Tensor Core GPUs for hosted trials; H100s and beyond for the GitHub release.
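“Closed-loop” and “action-conditioned” are easier to see in a toy example. The sketch below is not AlpaGym or OmniDreams code. It assumes a one-dimensional stand-in in which the “camera frame” is just the car’s lateral offset from the lane centre, the hypothetical `toy_world_model` produces the next observation from the current one plus the policy’s action, and the hypothetical `toy_policy` is a simple proportional controller.

```python
from typing import Callable, List

Observation = float  # toy stand-in for a camera frame: lateral offset in metres
Action = float       # toy steering correction applied in one step, in metres


def toy_world_model(obs: Observation, action: Action) -> Observation:
    """Action-conditioned step: the next observation depends on the action."""
    return obs + action


def toy_policy(obs: Observation) -> Action:
    """Proportional controller that steers halfway back to the lane centre."""
    return -0.5 * obs


def closed_loop_rollout(
    policy: Callable[[Observation], Action],
    world_model: Callable[[Observation, Action], Observation],
    start: Observation,
    steps: int,
) -> List[Observation]:
    """Alternate policy and world model so each action shapes the next input."""
    trajectory = [start]
    obs = start
    for _ in range(steps):
        action = policy(obs)            # the policy reacts to what it sees
        obs = world_model(obs, action)  # the world responds to that action
        trajectory.append(obs)
    return trajectory


trajectory = closed_loop_rollout(toy_policy, toy_world_model, start=2.0, steps=4)
print(trajectory)  # [2.0, 1.0, 0.5, 0.25, 0.125]
```

In an open-loop test the policy would be scored against pre-recorded frames that never change in response to its choices. Here, each steering decision alters what the policy sees next, which is why mistakes can compound or, as in this toy case, be corrected.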

Tools and datasets released.
AV skills: Neural Reconstruction, InstantNuRec, Omniverse NuRec, Harmoniser, HiGS accelerated renderer.
Vision AI skills: NVIDIA Metropolis Defect Image Generation, Video Augmentation, VSS Blueprint, NVIDIA TAO.
Robotics skills: Isaac Sim 6.0, Isaac Lab, mobility skills, Cosmos-H-Surgical-Simulator.
Datasets: GRAIL (~50 hours of humanoid-object interaction); six synthetic video datasets (robotics, physics, digital humans, autonomous driving, warehouse safety, spatial reasoning); the larger Physical AI Dataset (>15M Hugging Face downloads); Isaac GR00T X Embodiment Sim (one of Hugging Face’s most-downloaded robotics datasets).

Research footprint. NVIDIA says its tools were referenced in the majority of accepted CVPR 2026 papers, with named adopters including Carnegie Mellon, Stanford, UC Berkeley, Tsinghua and Peking University. Three open challenges launched alongside: the AI City Challenge (now in its tenth year), the PAI-AV Reasoning Challenge (chain-of-causation labels for VLA driving models), and AlpaSim (closed-loop end-to-end driving in reconstructed scenarios).

15M+ downloads of the NVIDIA Physical AI Dataset, the open collection that trains Cosmos 3, with a companion robotics dataset close behind.

What to watch

This is a release for research labs and AV/robotics teams, not a product a small UK firm will buy or run; the NVIDIA robotics page is upfront that the audience is developers building autonomous machines, not end users. But three downstream signals are worth a UK reader’s attention over the next year.

First, the free trial credits and open-source skills lower the cost of physical AI experimentation. A UK academic team or AI consultancy can poke at the synthetic data tools on hosted GPUs without buying hardware. For anyone tracking how the UK’s sovereign AI agenda lines up with vendor ecosystems, the open-weights posture of the new model matters more than the closed-platform story.

Second, the agent-skills framing is the real shift. Until now, “physical AI” has been sold as a model story — better world models, better action heads. NVIDIA’s pitch here is that the model is the easy bit and the workflow is the hard one, and that wrapping the workflow in agent-callable skills is how you get from research paper to working robot. If competitors don’t match that framing, expect fragmentation to persist.

Third, watch the open challenges. The PAI-AV Reasoning Challenge — testing whether driving models can explain their decisions — is the sort of benchmark that will decide which labs’ systems make it into regulated markets. The UK is unlikely to build a frontier physical AI model of its own, but it may end up setting the evaluation bar for one.

The CVPR presence runs through June 7. The free trial credits and GitHub repos are open now.


