Inside Nvidia’s Vera Rubin, Vera CPU, and the new age of AI agents

15 Jun 2026 05:07

Nvidia used GTC Taipei 2026 to declare that “useful AI” and agentic AI have arrived, and to unveil a sweeping new stack to power them. From the Vera Rubin AI factory platform and Vera CPU to Nemotron 3 Ultra, Cosmos 3, and RTX Spark PCs, here’s what it all means and why it matters.

Nvidia’s latest GTC in Taipei wasn’t just another chip reveal. It was a full blueprint for the next decade of AI: agents that can actually do work, AI factories that mint profitable tokens, CPUs designed for machines instead of humans, and a complete rethink of the personal computer.

From generative AI to agentic AI

For years, AI has mostly meant generating text, images, or code on demand. At GTC, Nvidia argued that we’ve now crossed into a new phase: agentic AI. Instead of just answering prompts, AI systems can observe, reason, plan, use tools, and act—like a digital worker running inside software.

Jensen Huang illustrated this shift with GitHub data. The number of code commits nearly tripled in early 2026 compared with previous years, even though the number of developers did not. His point: AI coding assistants are massively boosting productivity. By his estimate, thirty million developers, paid around $3 trillion in salaries, are now producing output closer to $9 trillion in value. Rather than replacing programmers, AI is making each one far more valuable and driving demand for more of them.
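To make the scale of that claim concrete, the quoted totals can be reduced to per-developer figures. This is only a back-of-the-envelope sketch using the three numbers cited above; the variable names are ours.

```python
# Figures quoted from the keynote; everything else is derived from them.
developers = 30_000_000
total_salaries_usd = 3e12   # about $3 trillion paid in salaries
total_output_usd = 9e12     # about $9 trillion of output claimed

salary_per_dev = total_salaries_usd / developers
output_per_dev = total_output_usd / developers
multiplier = total_output_usd / total_salaries_usd

print(f"Average salary per developer: ${salary_per_dev:,.0f}")
print(f"Implied output per developer: ${output_per_dev:,.0f}")
print(f"Output-to-salary multiplier:  {multiplier:.1f}x")
```

In other words, the claim amounts to an average developer on roughly $100,000 producing about $300,000 of value.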

This surge in productivity has a direct consequence: tokens (the fundamental units of AI computation) have become profitable units of revenue. If each token can generate value, then everyone wants more tokens. That’s why compute demand—and especially demand for Nvidia hardware—has exploded.

What is an AI agent, really?

Nvidia defines an AI agent as a new kind of application. Instead of a traditional app made of fixed code running on an operating system, an agent is a loop built around a large language model plus a “harness” that orchestrates everything it does.

At a high level, an agent:

• Takes input (text, voice, images, data, sensor feeds).
• Observes and understands the context.
• Reasons about what to do next.
• Plans a sequence of steps.
• Uses tools—like spreadsheets, browsers, compilers, databases, or CUDA-accelerated libraries.
• Writes to and reads from short-term working memory and long-term memory.

The harness is the glue: it routes information, calls tools, manages memory, enforces security, and coordinates sub-agents. The large language model is the “brain,” but the harness, tools, and runtime together form the “body” and “workshop” where real tasks get done.

This pattern—model + harness + tools + memory + runtime—is what Nvidia believes will underpin almost every future AI system, from enterprise assistants to robots and self-driving cars.
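The model + harness + tools + memory pattern described above can be sketched in a few lines. This is a minimal illustration, not Nvidia's implementation: the `Harness` class, the `scripted_model` stand-in for an LLM, and the decision format are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    """Routes model decisions to tools and records results in memory."""
    model: Callable[[str, list], dict]          # the "brain" (an LLM in practice)
    tools: dict                                  # tool name -> callable
    memory: list = field(default_factory=list)   # short-term working memory
    max_steps: int = 5                           # step budget so the loop always ends

    def run(self, task: str) -> object:
        for _ in range(self.max_steps):
            decision = self.model(task, self.memory)      # reason and plan
            if decision["action"] == "finish":
                return decision["answer"]
            tool = self.tools.get(decision["action"])
            if tool is None:                              # refuse unknown tools
                self.memory.append(f"error: no tool named {decision['action']}")
                continue
            result = tool(*decision["args"])              # act
            # Observe: store the outcome so the next model call can see it.
            self.memory.append(
                f"{decision['action']}{tuple(decision['args'])} -> {result}"
            )
        raise RuntimeError("step budget exhausted")


def scripted_model(task: str, memory: list) -> dict:
    """Stand-in for an LLM: calls the 'add' tool once, then finishes."""
    if not memory:
        return {"action": "add", "args": [2, 3]}
    return {"action": "finish", "answer": memory[-1].split("-> ")[1]}


agent = Harness(model=scripted_model, tools={"add": lambda a, b: a + b})
print(agent.run("What is 2 + 3?"))
```

The point of the design is that the model only proposes actions, while the harness decides whether and how they run. Security checks, sub-agents, and long-term memory would all be added at the harness level.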

CUDA-X libraries as tools for agents

One concern in the agent era is whether traditional software will be displaced. Nvidia’s answer is the opposite: agents will use more software than ever, but they need that software to be exposed as tools they can understand.

Nvidia’s “treasure” here is its CUDA-X libraries: highly optimized libraries for everything from computational lithography (cuLitho) and decision optimization (cuOpt) to genomics (Parabricks), AI-RAN (Aerial), and differentiable physics (Warp). Historically, these were tools for human developers. Now, Nvidia is packaging them with “skills,” machine-readable manuals that agents can learn from and call automatically.

In practice, this means an AI agent could, for example, use cuOpt to solve a complex logistics problem or Warp to simulate physics, without a human writing all the glue code. The agent reads the skill description, understands how to call the library, and uses it as part of a larger plan.
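As a rough illustration of what a skill could look like, the sketch below pairs a machine-readable manifest with a dispatcher that validates a model-issued request before calling the library function. The manifest fields, `assign_routes`, and `call_skill` are hypothetical, and the brute-force solver merely stands in for an optimized library; none of this reflects Nvidia's actual skill format or the cuOpt API.

```python
import json
from itertools import permutations

# Hypothetical skill manifest: field names are illustrative only.
SKILL_MANIFEST = json.loads("""
{
  "name": "assign_routes",
  "description": "Assign each driver one route so that total cost is minimal.",
  "parameters": {"cost_matrix": "list of rows, cost_matrix[driver][route]"}
}
""")


def assign_routes(cost_matrix):
    """Brute-force assignment solver standing in for an optimized library."""
    n = len(cost_matrix)

    def total(perm):
        return sum(cost_matrix[driver][perm[driver]] for driver in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)


REGISTRY = {SKILL_MANIFEST["name"]: assign_routes}


def call_skill(request: dict):
    """Validate a model-issued request against the manifest, then dispatch."""
    if request["skill"] not in REGISTRY:
        raise KeyError(f"unknown skill: {request['skill']}")
    missing = set(SKILL_MANIFEST["parameters"]) - set(request["arguments"])
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return REGISTRY[request["skill"]](**request["arguments"])


# A request in the shape a model might emit after reading the manifest.
request = {
    "skill": "assign_routes",
    "arguments": {"cost_matrix": [[4, 1, 3], [2, 0, 5], [3, 2, 2]]},
}
print(call_skill(request))  # ([1, 0, 2], 5)
```

The manifest is what the model reads; the registry and validation step are what keep a malformed or unexpected call from reaching the library.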

Vera Rubin: an AI factory for agents

To run these agents at scale, Nvidia unveiled Vera Rubin: not a single chip, but a full multi-rack, pod-scale AI system designed specifically for agentic workloads.

Vera Rubin is built around several tightly integrated components:

• Vera Rubin NVL72 GPU racks for the heavy thinking—prompt processing, context handling, reasoning, and planning with large language models.
• Vera CPU racks to orchestrate agents, manage memory, launch tools, and coordinate workloads.
• Vera BlueField storage and security racks that accelerate storage, manage context memory, and provide in-silicon security with DPUs.
• Groq LPX low-latency racks for ultra-fast inference when latency is critical.
• Spectrum-X Ethernet with co-packaged optics to connect everything at massive bandwidth.

All of this is built using extreme co-design: chips, boards, racks, power, cooling, and software are designed together as one system. The NVL72 rack, for example, uses a PCB midplane instead of cables, dramatically improving reliability and cutting assembly time from two hours to five minutes.

Nvidia emphasized that Vera Rubin is in full production, with a supply chain in Taiwan that’s twice as large as the one built for the previous Grace Blackwell generation. Major partners like Microsoft, Dell, and CoreWeave already have engineering racks up and running.

DSX: blueprinting the AI factory

Because AI factories are now multi-billion-dollar infrastructure projects, Nvidia is moving beyond systems into full-stack infrastructure design. Each 1-gigawatt AI factory can cost $50–$100 billion, so it has to work perfectly from day one.

Nvidia’s DSX platform is the blueprint and operating system for these AI factories:

• DSX SIM uses Omniverse to create a digital twin of the entire factory—racks, power, cooling, network, and grid integration—so operators can simulate and validate everything before building.
• DSX OSS provisions, operates, monitors, and remediates the infrastructure, turning hardware into reliable AI capacity.
• DSX Max-Q optimizes power and cooling, allowing more GPUs to run within the same power budget, smoothing peaks, and dynamically steering power to where work is happening.
• DSX Flex lets AI factories cooperate with the power grid, adjusting consumption based on real-time grid conditions.

The goal is simple: maximize tokens per watt and minimize time-to-first-token. In Nvidia’s framing, compute is now revenue, and performance per watt directly translates into profitability.
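The link between efficiency and revenue follows from the units: a watt is a joule per second, so tokens per second per watt is tokens per joule, and a fixed power budget caps how many joules a factory can spend each hour. The sketch below works through that arithmetic. Every number in it (throughput, rack power, token price, the 1 MW budget) is invented for illustration and does not come from Nvidia.

```python
def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Efficiency: tokens generated per joule of energy consumed."""
    return tokens_per_second / power_watts


def hourly_revenue(power_budget_watts: float,
                   efficiency_tokens_per_joule: float,
                   usd_per_million_tokens: float) -> float:
    """Revenue per hour when the whole power budget is spent generating tokens."""
    tokens_per_hour = power_budget_watts * efficiency_tokens_per_joule * 3600
    return tokens_per_hour / 1e6 * usd_per_million_tokens


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
POWER_BUDGET_W = 1_000_000      # a 1 MW slice of an AI factory
PRICE_USD_PER_M_TOKENS = 2.0

baseline = tokens_per_joule(tokens_per_second=50_000, power_watts=100_000)
improved = tokens_per_joule(tokens_per_second=100_000, power_watts=100_000)

for label, eff in [("baseline", baseline), ("improved", improved)]:
    revenue = hourly_revenue(POWER_BUDGET_W, eff, PRICE_USD_PER_M_TOKENS)
    print(f"{label}: {eff:.1f} tokens/J -> ${revenue:,.0f} per hour")
```

With power fixed, doubling tokens per joule doubles hourly revenue, which is why efficiency rather than chip price dominates this framing.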

Vera CPU: a processor built for agents, not humans

Traditional CPUs were designed for human-centric workloads and cloud rental economics—lots of cores, sliced up and rented by the hour. Agentic AI flips that model. Agents are impatient, operate at nanosecond time scales, and sit next to extremely expensive GPUs. Any CPU bottleneck directly hurts token throughput and user experience.

Enter Nvidia Vera, a new data center CPU architecture built specifically for agentic workloads. It’s used in three main roles inside the AI factory:

• As the control plane inside Vera Rubin GPU racks, orchestrating GPUs and managing KV cache and rack-level software.
• As standalone Vera CPU racks, running harnesses, tools, sandboxes, and data pipelines for agents.
• As the brains of Vera BlueField storage systems, feeding GPUs with context and long-term memory at high speed.

Vera is built around four key design goals:

• High single-thread performance: The custom Olympus core delivers very high instructions per clock (IPC) with a 10-wide decode engine, advanced branch prediction, a large out-of-order engine, and smart prefetchers tuned for branch-heavy runtimes like Python and tool calls.
• Massive bandwidth per core: Nvidia describes Vera as the first CPU to pair LPDDR5X memory with ECC that corrects multiple errors without sacrificing bandwidth. It delivers around 1.2 TB/s of memory bandwidth, 2–3x more than typical high-end CPUs.
• Huge on-chip and off-chip bandwidth: 88 cores are connected via a monolithic mesh fabric (no chiplet boundaries) with up to 3.6 TB/s of cross-sectional bandwidth. PCIe Gen 6 and coherent NVLink connect Vera to GPUs and other CPUs.
• Energy efficiency: Since AI factories are power-constrained, Vera is designed to pack a lot of CPU capability into limited power budgets, leaving as much power as possible for token-generating GPUs.
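The headline bandwidth figures are easier to compare once divided across the 88 cores. The short calculation below uses only the numbers quoted in the list above and assumes decimal units (1 TB = 1000 GB). The "typical high-end CPU" range is simply what the 2–3x claim implies, not a measured value.

```python
CORES = 88
MEMORY_BW_GB_S = 1.2 * 1000   # about 1.2 TB/s of memory bandwidth
MESH_BW_GB_S = 3.6 * 1000     # up to 3.6 TB/s across the mesh fabric

memory_bw_per_core = MEMORY_BW_GB_S / CORES
mesh_bw_per_core = MESH_BW_GB_S / CORES

# If 1.2 TB/s is 2-3x a typical high-end CPU, that CPU sits in this range.
implied_typical_low = MEMORY_BW_GB_S / 3
implied_typical_high = MEMORY_BW_GB_S / 2

print(f"Memory bandwidth per core: {memory_bw_per_core:.1f} GB/s")
print(f"Mesh bandwidth per core:   {mesh_bw_per_core:.1f} GB/s")
print(f"Implied typical CPU:       {implied_typical_low:.0f}-{implied_typical_high:.0f} GB/s")
```

That works out to roughly 13.6 GB/s of memory bandwidth for every core.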

On real workloads, Nvidia claims dramatic gains: up to 3x faster SQL processing and around 6x faster real-time stream processing for use cases like stock exchange telemetry. For agent sandboxes and tool orchestration, Vera is positioned as a major new growth driver alongside Nvidia’s GPUs.

Nemotron 3 Ultra: open models for building your own agents

Hardware alone doesn’t make agents useful. Enterprises also need strong base models they can adapt to their own domains. Nvidia’s answer is the Nemotron family of open models.

At GTC, Nvidia announced Nemotron 3 Ultra, its latest open large language model designed for reasoning and tool use. Key points:

• It uses a hybrid architecture combining state-space models (SSMs) with mixture-of-experts (MoE), tuned for high speed and low cost.
• Nvidia says it is about 5x faster and around 30% cheaper to run, measured in FLOPs and inference time, than the leading open models it was compared against.
• Nvidia releases not just the model weights, but also the training data and training scripts, enabling organizations to extend and specialize the model for proprietary use.

Nemotron models are central to Nvidia’s agent toolkit. Partners like Cadence, CrowdStrike, Dassault Systèmes, Palantir, SAP, and ServiceNow can start from Nemotron, add their own data and workflows, and ship domain-specific “super agents.”

If you’re following the broader AI model race, this sits alongside other major model families like OpenAI’s GPT, Google’s Gemini, Anthropic’s Claude, and Elon Musk’s Grok. For a deeper look at that competitive landscape, see our breakdown of Grok 5, Cursor, and the new phase of the AI coding race.

Nvidia’s agent toolkit and OpenShell runtime

To make agentic AI practical for enterprises, Nvidia is packaging a full toolkit:

• Models: Open Nemotron models and support for third-party models like Claude and Codeium/Cloud Code.
• Harnesses: Frameworks like Hermes and others that orchestrate multi-step tasks, sub-agents, and tool calls.
• Tools and skills: CUDA-X libraries and partner tools (e.g., Cadence simulators) exposed with machine-readable skills so agents can call them safely and effectively.
• Runtime: Nvidia OpenShell, an open-source runtime that sandboxes agents, enforces security and privacy policies, manages identities and permissions, and runs across clouds, on-premises, and even on devices.
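To show what sandboxing and permission enforcement can mean at the level of a single tool call, here is a minimal policy check. It is a sketch under our own assumptions and does not use or mirror the OpenShell API: `Policy`, `authorize`, and `PolicyViolation` are hypothetical names, and the rule set (a tool allowlist plus one writable directory) is deliberately simplistic.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


class PolicyViolation(Exception):
    """Raised when an agent requests something its policy does not allow."""


@dataclass(frozen=True)
class Policy:
    agent_id: str
    allowed_tools: frozenset
    writable_root: str  # the only directory tree the agent may touch


def authorize(policy: Policy, tool: str, path: Optional[str] = None) -> None:
    """Raise PolicyViolation unless the tool call stays inside the policy."""
    if tool not in policy.allowed_tools:
        raise PolicyViolation(f"{policy.agent_id} may not call {tool}")
    if path is not None:
        root = posixpath.normpath(policy.writable_root)
        # normpath collapses "..", so traversal attempts resolve outside the root.
        resolved = posixpath.normpath(posixpath.join(root, path))
        if resolved != root and not resolved.startswith(root + "/"):
            raise PolicyViolation(f"{policy.agent_id} may not access {resolved}")


policy = Policy(
    agent_id="agent-1",
    allowed_tools=frozenset({"write_file", "run_tests"}),
    writable_root="/sandbox/agent-1",
)

authorize(policy, "write_file", "out/report.txt")  # allowed: returns None

for tool, path in [("delete_db", None), ("write_file", "../../etc/passwd")]:
    try:
        authorize(policy, tool, path)
    except PolicyViolation as err:
        print("blocked:", err)
```

A real runtime would enforce this below the agent, at the process or container boundary, so that a confused or compromised model cannot simply skip the check.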

One flagship example is Nvidia’s partnership with Cadence to build chip-design agents. These agents can read architectural specs and RTL, generate tests, run Cadence simulators, perform formal verification, and debug issues—cutting verification cycles from weeks to hours. It’s a glimpse of how agentic AI can reshape high-end engineering workflows.

Cosmos 3: a world model for physical AI

Language models are trained on text written by humans for humans. But physical AI—robots, autonomous vehicles, industrial systems—needs a different kind of understanding: how the world looks, moves, and behaves from a robot’s point of view.

Real-world data for this is hard to collect and scale. Nvidia’s answer is Cosmos 3, an open frontier “omni-model” for physical AI. Cosmos is designed to be:

• A vision-language model (VLM) that can watch the physical world, describe scenes, and flag important events.
• A world model that can generate physics-accurate synthetic video from text, images, or video, effectively simulating what happens next.
• A simulator that closes the loop for training and evaluating control policies in virtual environments.
• A world action model that can perceive, reason, plan, and generate actions for robots.

Cosmos underpins Nvidia’s Omnidreams and the broader Isaac robotics stack. Like Nemotron, Cosmos is released with model weights, data, and training recipes so developers can adapt it to their own robots, factories, or environments.

Alpamayo and Hyperion: reasoning for self-driving cars

On the automotive side, Nvidia introduced Alpamayo 2, an open model for autonomous driving. Unlike traditional perception-planning stacks that are mostly silent, Alpamayo is a “reasoning car” that can narrate its decisions: when it nudges left to avoid a parked van, yields to pedestrians, or keeps distance from a merging truck, it can explain why.

Alpamayo runs on Nvidia’s Hyperion platform, a full-stack AV system that includes sensors, compute, and software. Nvidia says automakers representing roughly 80% of global car production are building on Hyperion, and about 97% of global mobility services are integrated with its ecosystem.

This means Alpamayo and similar models can be deployed widely as part of a standardized runtime, while still allowing carmakers to customize behavior and branding.

Isaac GR00T and reference humanoid robots

Humanoid robots are another frontier for agentic AI. Building one from scratch is extremely hard: you need hardware, simulators, teleoperation tools, data pipelines, training infrastructure, and a runtime that can safely control dozens of joints and sensors.

Nvidia’s Isaac GR00T platform aims to be the reference stack for humanoids:

• Models: Foundation models like Cosmos and task-specific policies.
• Simulation: Isaac Lab and Omniverse for physics-accurate virtual environments.
• Data generation: Teleoperation tools (Isaac Teleop) and synthetic data generation to scale from a few human demos to thousands of training examples.
• Training and evaluation: Isaac Lab Arena for training and testing policies in simulation.
• Runtime: Isaac ROS and the Thor robot computer for deployment on real robots.

To accelerate research, Nvidia is also releasing an Isaac GR00T reference humanoid: a 6-foot, ~150-pound robot with 31 degrees of freedom (and 25-DoF hands from Sharpa), powered by Thor and the full Isaac software stack. It’s aimed at universities and research labs that want to focus on algorithms and behaviors instead of spending years on hardware integration.

RTX Spark and the reinvention of the PC

One of the boldest parts of the keynote was Nvidia’s claim that the PC is being reinvented for the first time in 40 years. Just as Windows, BIOS, chipsets, and DirectX defined the original PC era, Nvidia and Microsoft are now defining a new architecture for the age of agents.

At the center is RTX Spark, a new class of Windows PCs designed to run agents locally:

• A Blackwell-based RTX GPU with 6,144 CUDA cores and up to 1 petaflop of AI performance.
• A custom 20-core Grace CPU co-designed with MediaTek.
• NVLink fusion between CPU and GPU.
• 128 GB of unified memory, in a chip built on TSMC’s 3 nm process with around 70 billion transistors.

Because RTX Spark runs the full Nvidia software stack, it can handle everything from gaming and creative apps to digital biology, simulation, and on-device agents. Microsoft and Nvidia have spent years optimizing Windows, drivers, and APIs so that existing apps “just work” while new agentic workloads can run natively.

In one demo, an architect used a local agent on an RTX Spark laptop to design a house. The agent read a prompt and mood board, opened Rhino to model the site, generated building forms, laid out rooms and structure, exported to Blender, and then used a generative model (FLUX.2) to create photorealistic renders. Throughout, the human designer stayed in the loop to adjust and approve.

Adobe is also re-architecting Photoshop and Premiere for RTX Spark, promising roughly 2x speedups and agent-friendly integration via MCP servers that let agents manipulate creative projects safely.

New Windows desktops, laptops, and workstations

RTX Spark is just the start. Nvidia and Microsoft are rolling out a full line of reimagined Windows machines:

• Laptops based on RTX Spark for mobile creation, gaming, and local agents.
• Desktops that can run personal agents 24/7 without “meter anxiety,” acting as home AI hubs connected to cameras, appliances, and other devices.
• Workstations like a Windows-compatible DGX Station with 20 petaflops of compute, 8 TB/s of memory bandwidth, and 768 GB of memory—enough to train and experiment with trillion-parameter models at your desk.
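For a sense of how a trillion-parameter model relates to 768 GB, the estimate below computes the memory needed for the weights alone at a few common precisions. It is a rough sketch using standard bytes-per-parameter values; it ignores activations, KV cache, and optimizer state, all of which add to the total.

```python
PARAMS = 1_000_000_000_000   # one trillion parameters
STATION_MEMORY_GB = 768      # memory figure quoted for the DGX Station above

# Storage cost per parameter at common numeric precisions.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weights_gb(params: int, bytes_per_param: float) -> float:
    """Memory for model weights only, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for fmt, nbytes in BYTES_PER_PARAM.items():
    size = weights_gb(PARAMS, nbytes)
    verdict = "fits" if size <= STATION_MEMORY_GB else "does not fit"
    print(f"{fmt}: {size:,.0f} GB of weights, {verdict} in {STATION_MEMORY_GB} GB")
```

On these assumptions, a trillion-parameter model fits in a single machine of this size at 4-bit precision, with about 268 GB left over for everything else.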

Nvidia’s vision is that, just as every home now has a TV or home theater, many homes and offices will eventually have an AI supercomputer running personal agents. Over time, these machines may feel less like traditional PCs and more like R2-D2 or C-3PO: ever-present assistants that understand your preferences, history, and environment.

If you’re curious how this fits into the broader 2026 AI hardware and platform wave, it’s worth comparing Nvidia’s PC vision with what Google is doing around Gemini and Android XR, as covered in our preview of Gemini 4, Veo 4, and Android XR at Google I/O 2026.

The big picture: compute is revenue, agents are the new apps

Across all these announcements, a few themes tie everything together:

• Useful AI is here: Agentic AI is already boosting software development, design, and engineering productivity, and it’s starting to reshape industries from chip design to finance and manufacturing.

• Agents are the new application pattern: Model + harness + tools + memory + runtime is the template that will repeat across clouds, enterprises, PCs, cars, robots, and edge devices.

• Compute is now directly tied to revenue: Tokens per watt and time-to-first-token matter more than raw chip price. Architectures that deliver more useful tokens per joule will win.

• CPUs are being rethought: Vera shows that CPUs built for agents—high single-thread performance, huge bandwidth, tight GPU coupling—are different from CPUs built for human-centric workloads.

• Open models plus full-stack integration: With Nemotron and Cosmos, Nvidia is betting on open, reproducible models that plug into a deeply integrated hardware and software stack, from data centers to laptops to robots.

For developers, enterprises, and infrastructure providers, the message is clear: the next wave of AI won’t just be about bigger models. It will be about agents that can safely and profitably run across an entire ecosystem of hardware—from trillion-parameter training clusters to the PC on your desk and the robot on your factory floor.