OpenAI and Broadcom Reveal Jalapeno: OpenAI's First Custom AI Inference Chip

Started by Patrick94, Yesterday at 01:23 PM

Patrick94

On June 24th, OpenAI and Broadcom unveiled the design of Jalapeño, OpenAI's first custom AI chip, developed in partnership with the semiconductor giant. The chip is optimised specifically for inference workloads, meaning running trained models to generate responses rather than training new models from scratch. OpenAI has received initial samples and is testing them, with deployment planned by the end of the year. The design targets high power efficiency and performance for large language model inference, and OpenAI engineers used Broadcom's AI tools during the chip design process itself.

This move is significant because it reduces OpenAI's dependence on Nvidia for inference compute. Training still relies on Nvidia's H100 and B100 GPUs because they are simply the best available hardware for that workload, but inference is a different problem, where custom silicon tuned for specific model architectures can offer substantial efficiency advantages. Every inference request costs money and energy, and at ChatGPT's scale of over a billion monthly users, even small efficiency gains translate to enormous savings.

The parallel with what Google has done with TPUs is clear. Google spent years and billions building custom AI silicon, and it became one of its key competitive advantages in AI infrastructure. Microsoft has been building the Maia chip. Amazon has Trainium and Inferentia. OpenAI is among the last of the major AI players to move toward custom silicon, and Jalapeño puts it on the same roadmap as its infrastructure competitors.