At CVPR 2026 in Denver on June 3, NVIDIA released a bundle of “physical AI agent skills” — pre-packaged workflows for AI systems that operate in the physical world, including robots, self-driving cars and visual inspection. The release lands the same week as Cosmos 3, an open foundation model that handles similar work, and is aimed at researchers rather than end customers.
The announcement frames a long-standing problem. “The core challenge in physical AI research isn’t simply developing stronger models,” NVIDIA wrote in its launch post. “It’s building a full workflow around them — reconstructing real-world scenes, generating edge-case scenarios, training policies, evaluating behavior and rapidly iterating.” The new skills are the company’s attempt to bundle those steps under one roof.
Three lanes, one toolkit
The release covers three research areas. The split is a useful way to read what NVIDIA is selling — and what it isn’t.
- Self-driving cars. The hard problem is the “long tail” — rare junctions, odd lighting, unusual road geometry that’s hard to capture in real fleet data. The new skills let an AI agent rebuild 3D scenes from dashcam-style video, then run controlled simulations against them. A new open driving model, Alpamayo 2 Super, sits underneath as the decision-making core.
- Vision AI. For factories and warehouses, the hard problem is generating enough controlled examples of rare defects, lighting shifts or object-state changes. The new skills automate the production of synthetic defect images and the analysis of large video archives.
- Robotics. For humanoids and industrial arms, the bottleneck is iterating through enough simulated environments and policy rollouts to teach a skill reliably. The new skills wrap simulation, training and evaluation under agent-callable interfaces. A separate healthcare-focused release generates realistic surgical-robotics data for policy training.
All three lanes share the same backbone: NVIDIA’s simulation engines, its open foundation model and an orchestration layer that lets an AI agent drive the workflow. The full skills library is open source on GitHub.
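For readers who want a concrete picture of "agent-callable skills", here is a minimal sketch of the general pattern: each workflow step is registered under a name and a description, and an orchestrator runs the steps in order while passing shared state between them. It does not use NVIDIA's code or API. Every name in it (`Skill`, `SkillRegistry`, `reconstruct_scene` and the rest) is hypothetical, and the step bodies are placeholders that return labelled strings rather than real scenes or policies.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Skill:
    """A named workflow step an agent can call with a shared state dict."""
    name: str
    description: str
    run: Callable[[State], State]


class SkillRegistry:
    """Hypothetical orchestration layer: stores skills and runs them in order."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        if skill.name in self._skills:
            raise ValueError(f"duplicate skill: {skill.name}")
        self._skills[skill.name] = skill

    def describe(self) -> List[str]:
        # The text an agent would read when deciding which skill to call.
        return [f"{s.name}: {s.description}" for s in self._skills.values()]

    def run_pipeline(self, names: List[str], state: State) -> State:
        # Each skill reads the current state and returns new keys to merge in.
        for name in names:
            state = {**state, **self._skills[name].run(state)}
        return state


# Placeholder steps (assumptions for illustration, not real implementations).
def reconstruct_scene(state: State) -> State:
    return {"scene": f"scene_from_{state['video']}"}


def generate_edge_cases(state: State) -> State:
    return {"scenarios": [f"{state['scene']}_variant_{i}" for i in range(3)]}


def train_policy(state: State) -> State:
    return {"policy": f"policy_trained_on_{len(state['scenarios'])}_scenarios"}


def evaluate_policy(state: State) -> State:
    return {"report": {"policy": state["policy"],
                       "scenarios_evaluated": len(state["scenarios"])}}


registry = SkillRegistry()
registry.register(Skill("reconstruct_scene", "Rebuild a 3D scene from video", reconstruct_scene))
registry.register(Skill("generate_edge_cases", "Create rare-scenario variants", generate_edge_cases))
registry.register(Skill("train_policy", "Train a policy on the scenarios", train_policy))
registry.register(Skill("evaluate_policy", "Score the trained policy", evaluate_policy))

PIPELINE = ["reconstruct_scene", "generate_edge_cases", "train_policy", "evaluate_policy"]
result = registry.run_pipeline(PIPELINE, {"video": "dashcam_clip"})
print(result["report"])
```

The point of the pattern is that the agent chooses and sequences steps by name, so a single step (say, a different scene-reconstruction method) can be swapped without touching the rest of the loop.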
What runs where
The synthetic data tools (scene reconstruction from video, video augmentation and synthetic defect generation) are the most accessible entry point. They run as “Physical AI Launchables” on NVIDIA Brev, a hosted environment that ships with free trial credits, so researchers can test the workflows without buying hardware.
Datasets ship with the release. The headline figure is 15 million downloads of NVIDIA’s physical AI dataset, hosted on a popular open model repository, a measure of how widely the company’s earlier robotics and driving models are already used in the research community. A new humanoid-interaction dataset adds roughly 50 hours of motion-capture data; six synthetic video datasets feed the foundation model.
15M+ downloads of the NVIDIA Physical AI Dataset, the open collection that trains Cosmos 3, with a companion robotics dataset close behind
What to watch
This is a release for research labs and AV/robotics teams, not a product a UK small firm will buy or run — the NVIDIA robotics page is upfront that the audience is developers building autonomous machines, not end users. But three downstream signals are worth a UK reader’s attention over the next year.
First, the free trial credits and open-source skills lower the cost of physical AI experimentation. A UK academic team or AI consultancy can poke at the synthetic data tools on hosted GPUs without buying hardware. For anyone tracking how the UK’s sovereign AI agenda lines up with vendor ecosystems, the open-weights posture of the new model matters more than the closed-platform story.
Second, the agent-skills framing is the real shift. Until now, “physical AI” has been sold as a model story: better world models, better action heads. NVIDIA’s pitch here is that the model is the easy bit and the workflow is the hard one, and that wrapping the workflow in agent-callable skills is how you get from research paper to working robot. If competitors don’t match that framing, expect physical AI workflows to stay fragmented across vendors.
Third, watch the open challenges. The PAI-AV Reasoning Challenge — testing whether driving models can explain their decisions — is the sort of benchmark that will decide which labs’ systems make it into regulated markets. The UK is unlikely to build a frontier physical AI model of its own, but it may end up setting the evaluation bar for one.
The CVPR presence runs through June 7. The free trial credits and GitHub repos are open now.


