Broadcom and OpenAI unveil custom-built Jalapeño inference processor — OpenAI’s first chip is a massive reticle-sized ASIC built in an ultra-fast nine-month dev

The die size of Jalapeño's compute chiplet implies that it packs quite a lot of compute oomph, though, of course, die area alone is not enough to estimate performance. Still, it is safe to say that Jalapeño's compute die is considerably bigger than the compute dies of other inference accelerators on the market and more closely resembles processors for AI training. Speaking of training processors, multi-chiplet designs are increasingly common for those workloads as companies like AMD and Nvidia try to pack in as much performance as possible. Meanwhile, the fact that OpenAI and Broadcom chose a large compute chiplet possibly indicates that they wanted to reduce latency as much as possible, presumably by keeping more communication on-die rather than crossing chiplet boundaries.

The companies say the chip reached tape-out in just nine months and is slated for deployment beginning in late 2026, which is an extremely fast turnaround for ASIC design. It is unclear how extensively Broadcom and OpenAI used artificial intelligence to define and develop Jalapeño, though the companies said that they used OpenAI's models to speed up parts of the chip's design and optimization work. Designing an ASIC from scratch typically takes 1.5 to 2 years, so nine months is half that time or less, and AI assistance may account for part of the reduction. Another way to accelerate the design cycle is Broadcom's extensive reuse of its logic across different custom designs, which lets it deliver new chips faster than other companies.

Notably, according to the announcement, Jalapeño is designed to support not only OpenAI's own workloads but also present and future LLMs across the industry. That could let OpenAI sell its hardware to third parties, assuming it can get enough supply from Broadcom and TSMC. Meanwhile, Broadcom's chief executive indicates that Jalapeño will be deployed in gigawatt-scale data centers with Microsoft and other partners starting in 2026, though it is unclear whether the processor will be used exclusively for OpenAI workloads or will be available to other tenants as well.

"Our collaboration with OpenAI represents a fundamental commitment to scaling the physical infrastructure required for the next decade of AI," said Hock Tan, President and CEO, Broadcom. "This is just the beginning of a multi-generation roadmap. By co-developing our industry-leading silicon directly with OpenAI, we are enabling the deployment of gigawatt-scale data centers with Microsoft and other partners beginning in 2026."

Anton Shilov is a contributing writer at Tom's Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

usertests: "Based on early testing, Jalapeño will efficiently execute our most important workloads close to the hardware's theoretical limits." It sounds like it can be configured to run multiple models and they don't need a new ASIC for every model. For example, Taalas made an ASIC with a similar reticle-limit size that could only run Llama 3.1 8B, but with insanely high performance and efficiency. Seeing the outcry from users when their beloved models are killed off, maybe it's fine to have the same model available for a few years, using models baked into ASICs for lower operating costs.

Lieutenant Barclay: usertests said: "Seeing the outcry from users when their beloved models are killed off, maybe it's fine to have the same model available for a few years" Maybe we can start a support group for everyone feeling a loss over their "beloved model". You know, some grief counseling. Problem is "AI" pumpers are stuck in the first 3 stages: denial, anger, and bargaining :LOL:

usertests: Lieutenant Barclay said: "Maybe we can start a support group for everyone feeling a loss over their "beloved model". You know, some grief counseling. Problem is "AI" pumpers are stuck in the first 3 stages: denial, anger, and bargaining :LOL:"
https://www.cosmopolitan.com/relationships/a66022416/ai-boyfriend-reddit/
https://www.bbc.com/news/articles/crl43dxwwy9o
https://www.techradar.com/ai-platforms-assistants/chatgpt/im-grieving-openai-has-switched-off-chatgpt-4o-and-angry-users-are-backing-a-keep4o-campaign-to-restore-it

alan.campbell99: Assuming these things get deployed, can they be used for anything else when the bubble pops? Also given how it differs from the 'usual' GPU compute modules will this add even more cost to data center projects?

usertests: alan.campbell99 said: "Assuming these things get deployed, can they be used for anything else when the bubble pops?" I think it's extremely less likely for ASICs to be useful, but not impossible since it doesn't look like it's tied to a single model. If SHTF, tech scavengers could at least harvest useless accelerators for the HBM. But more general purpose parts can be used for local AI. alan.campbell99 said: "Also given how it differs from the 'usual' GPU compute modules will this add even more cost to data center projects?" Are they not using an industry standard form factor?

DougMcC: alan.campbell99 said: "Assuming these things get deployed, can they be used for anything else when the bubble pops? Also given how it differs from the 'usual' GPU compute modules will this add even more cost to data center projects?" The AI bubble might pop. But the inference demand is not going away. I have uses for at least another quadrillion tokens in mind. So do lots of other people. If this hardware is sold off at a big loss it will still be getting used for inference.

Tiz.io: Lieutenant Barclay said: "Maybe we can start a support group for everyone feeling a loss over their "beloved model". You know, some grief counseling. Problem is "AI" pumpers are stuck in the first 3 stages: denial, anger, and bargaining :LOL:" I know people have had personal reasons to get attached to particular models, but as a professional engineer who integrates llms into products, I can tell you it's a massive pain having to reevaluate every integrated prompt when the underlying model changes. Sonnet 4 was perfectly fine for the sort of classification and categorization we use, but when they discontinued it, we spent many hours retooling and testing just to use the more expensive model. The next forced upgrade will see us moving to self-hosted models for greater stability.
