How Nvidia Uses AI to Design Chips
Chipmaker's models design circuits, verify designs, and test new layouts

Infographic showing Nvidia's chip design flow, highlighting placer, router, and optimization stages.

Nvidia’s chief scientist dreams of telling an AI model to design a new GPU, then going skiing for a couple of days while the system does the job. He outlined Nvidia’s progress toward that goal and how far it has to go.

What’s new: Bill Dally, who leads roughly 300 researchers at Nvidia, described AI’s growing role in designing the company’s chips in a conversation with his Google counterpart, Jeff Dean, onstage at Nvidia’s GTC conference in mid-March. His examples (starting in the video at around 24 minutes) ranged from a reinforcement learning system that lays out a chip’s building blocks to large language models trained on decades of proprietary documents.

How it works: Nvidia applies AI at five stages of chip design: laying out components, designing arithmetic circuits (components that perform math on binary numbers, like adders and counters), providing general engineering assistance, verifying finished designs, and exploring novel layouts.

  • Each time Nvidia adopts a new semiconductor manufacturing process — generally to shrink component sizes, which makes it possible to fit more of them per area of silicon — it must redesign around 2,500 to 3,000 small, reusable layout blocks, or cells, such as logic gates and memory latches. An AI system called NVCell does this work. NVCell pairs a genetic algorithm that proposes candidate layouts with a reinforcement learning agent that incrementally corrects violations of design rules (for instance, wires placed too close together). A rule checker flags the violations; the agent receives a reward each time it clears one and a small penalty for each step it takes, an incentive to find the shortest path to a clean design (a toy sketch of this reward scheme appears after this list). NVCell cuts work that previously occupied eight engineers for roughly 10 months to an overnight run on a single GPU. The results match or exceed human designs in the area each cell occupies, the power it consumes, and how quickly signals propagate through it.
  • Another reinforcement learning system, PrefixRL, designs the microscopic circuits at the heart of GPU arithmetic units. The agent receives a reward when a circuit meets timing constraints while minimizing the chip area it occupies and the power it draws (for a baseline picture of the circuits involved, see the adder sketch after this list). The resulting components are “bizarre” configurations that are 20 percent to 30 percent better than human designs, Dally said. For instance, a 64-bit adder (a circuit that sums two binary numbers) designed by PrefixRL occupies 25 percent less chip area than an equivalent design produced by industry-standard chip-design tools.
  • Nvidia built two large language models, ChipNeMo and BugNeMo, for internal use. The team fine-tuned open-weights LLaMA 2 base models (7 billion and 13 billion parameters) on Nvidia’s internal documentation, including the low-level design code for every GPU the company has produced along with the accompanying hardware specifications (a minimal sketch of this sort of domain-adaptive training follows the list). A 2023 paper describes three uses: (i) answering engineers’ questions about Nvidia hardware, (ii) generating code snippets in specialized chip-design languages, and (iii) summarizing bug reports. In that work, the domain-adapted models matched or outperformed a general-purpose base model five times their size on a range of chip-design tasks.
  • Verification, which confirms that a finished design behaves as intended, is the longest stage. Dally’s team is working to compress it using AI.
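Here is the toy sketch of the NVCell-style reward scheme mentioned above. Everything in it is invented for illustration except the incentive structure Dally describes: a reward for each cleared design-rule violation and a small penalty per step, so the shortest path to a clean layout scores highest.

```python
import random

# Hypothetical stand-in for NVCell's reward shaping. The penalty value
# and the coin-flip "fix" dynamics are assumptions; only the structure
# (+1 per cleared violation, small cost per step) follows the article.
STEP_PENALTY = 0.01

def step_reward(violations_before: int, violations_after: int) -> float:
    cleared = max(violations_before - violations_after, 0)
    return float(cleared) - STEP_PENALTY

def run_episode(initial_violations: int = 5, max_steps: int = 200) -> float:
    """Edit the layout until the rule checker reports zero violations."""
    violations, total = initial_violations, 0.0
    for _ in range(max_steps):
        if violations == 0:            # clean design: episode over
            break
        before = violations
        if random.random() < 0.5:      # a fix attempt that may fail
            violations -= 1
        total += step_reward(before, violations)
    return total

print(run_episode())  # fewer wasted steps yields a higher return
```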
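For a concrete picture of the circuits PrefixRL redesigns: a fast adder computes its carries with a parallel-prefix network, and the agent searches for better-shaped networks than the textbook ones. The sketch below implements one classic baseline, the Kogge-Stone prefix adder, in Python bit arithmetic; it shows the design space's starting point, not an RL-discovered layout.

```python
def kogge_stone_add(a: int, b: int, width: int = 64) -> int:
    """Add two unsigned integers via a Kogge-Stone parallel-prefix carry network."""
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    g = a & b            # generate: this bit creates a carry on its own
    p = a ^ b            # propagate: this bit passes an incoming carry along
    d = 1
    while d < width:
        # Prefix combine with the group d bits below:
        # G = G_hi | (P_hi & G_lo), P = P_hi & P_lo
        g |= p & (g << d)
        p &= p << d
        d <<= 1
    carries = (g << 1) & mask          # carry into bit i comes from bits below i
    return ((a ^ b) ^ carries) & mask  # sum = propagate XOR incoming carry

assert kogge_stone_add(12345, 67890) == 12345 + 67890
```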
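The ChipNeMo recipe, continued training of an open-weights base model on in-house text, maps onto standard tooling. Below is a minimal sketch using Hugging Face's transformers and datasets libraries; the file name, sequence length, and hyperparameters are placeholders, not details from Nvidia's paper.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # gated model; requires access approval
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# "internal_docs.txt" is a hypothetical dump of design documents and code.
docs = load_dataset("text", data_files={"train": "internal_docs.txt"})
tokenized = docs["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="chip-lm",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    # Causal-LM collator: labels are the inputs shifted by one token.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```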

Yes, but: Designing a GPU end-to-end from a prompt remains a distant goal, Dally said.

Behind the news: AI is not yet designing chips from scratch, but it is making steady progress toward that goal.

  • A paper published in April by Verkoran, an AI chip-design startup, described an agentic AI system that, given a 219-word specification, autonomously designed a 1.48 gigahertz RISC-V CPU, roughly equivalent to a 2011-vintage Intel Celeron SU2300. The authors validated the design in simulation but did not fabricate it.
  • Last year, researchers at Princeton and the Indian Institute of Technology Madras used deep learning and an evolutionary algorithm to generate wireless communications circuits, producing high-performing designs that defied conventional rules of thumb.
  • In 2023, Google described its use of reinforcement learning to arrange components on the surface of its Tensor Processing Units. 
  • Nvidia first highlighted NVCell in 2021; the PrefixRL adder followed in 2022, and ChipNeMo in 2023.

Why it matters: In chip design, the search space is enormous and only thinly covered by human intuition. Nvidia’s report that its reinforcement learning agents produce unusual but measurably superior circuits echoes a broader pattern in which AI solves problems by finding solutions that human engineers would not consider. And the company uses GPUs to train the AI systems that design its next generation of GPUs, so each generation both accelerates the design of the next and yields chips better suited to running the tools that helped design them.

We’re thinking: There’s a considerable distance between “AI helps a junior engineer understand the company’s technology” and “AI designs the next GPU.” Dally’s willingness to temper expectations is refreshing.
