
The four new chips are MTIA 300, 400, 450, and 500. MTIA 300 is already in production for ranking and recommendations training, while MTIA 400 is in lab testing ahead of data center deployment. MTIA 450 and 500 are targeted at AI inference and are scheduled for mass deployment in early 2027 and later that year, respectively. According to Meta's technical blog, HBM bandwidth increases 4.5 times and compute FLOPs increase 25 times from MTIA 300 through MTIA 500.

Meta says MTIA 450 doubles the HBM bandwidth of MTIA 400, describing it as “much higher than that of existing leading commercial products,” or, in other words, Nvidia’s H100 and H200. MTIA 500 then adds another 50% HBM bandwidth on top of the 450, along with up to 80% more HBM capacity. The emphasis on bandwidth is deliberate: it’s HBM bandwidth, not raw FLOPs, that is the main bottleneck during the decode phase of transformer inference, while mainstream GPUs are architected to maximize FLOPs for large-scale pre-training. That architecture carries a cost and power overhead that Meta says is unnecessary for inference workloads.
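A quick roofline-style calculation illustrates why decode is bandwidth-bound. Everything below is an assumption for illustration (a hypothetical 70B-parameter model served at BF16, H100-class compute and bandwidth figures), not a number from Meta's blog:

```python
# Back-of-the-envelope roofline check for why decode is bandwidth-bound.
# All figures are illustrative assumptions, not official Meta or Nvidia specs.

PARAMS = 70e9          # hypothetical 70B-parameter model
BYTES_PER_PARAM = 2    # BF16 weights
BATCH = 1              # single-stream decode

# Each decoded token streams every weight once and does ~2 FLOPs per weight
# (one multiply, one add) per sequence in the batch. KV-cache traffic is
# ignored here; including it would only lower the intensity further.
flops_per_token = 2 * PARAMS * BATCH
bytes_per_token = PARAMS * BYTES_PER_PARAM

intensity = flops_per_token / bytes_per_token   # FLOPs per byte moved
print(f"decode arithmetic intensity: {intensity:.1f} FLOPs/byte")  # 1.0

# Compare against a training-oriented GPU's balance point
# (illustrative H100-class figures: ~1e15 BF16 FLOP/s, ~3.35e12 B/s HBM).
gpu_flops, gpu_bw = 1e15, 3.35e12
print(f"GPU balance point: {gpu_flops / gpu_bw:.0f} FLOPs/byte")    # ~300
```

At batch size 1, decode lands around one FLOP per byte, two orders of magnitude below a training GPU's balance point, and that mismatch is exactly what an inference-first design can exploit by spending silicon on bandwidth instead of stranded FLOPs.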
Meta's approach also includes hardware acceleration for FlashAttention and mixture-of-experts feed-forward network computation, plus custom low-precision data types co-designed for inference. MTIA 450 supports MX4, with MX4 throughput six times its FP16/BF16 FLOPs, and mixed low-precision computation that avoids the software overhead of data type conversion.

As for eventual deployment, MTIA 400, 450, and 500 all use the same chassis, rack, and network infrastructure, so each new chip generation drops into the existing physical footprint for easy interchange. It’s this modularity, Meta says, that enables MTIA’s roughly six-month chip cadence, much faster than the industry’s typical one-to-two-year cycle. The software stack runs natively on PyTorch, vLLM, and Triton, with support for torch.compile and torch.export so that production models can be deployed simultaneously on both GPUs and MTIA without MTIA-specific rewrites.

Meta said it has already deployed hundreds of thousands of MTIA chips across its apps for inference on organic content and ads. All of this comes just two weeks after Meta disclosed a long-term, $100 billion AI infrastructure agreement with AMD, suggesting a broader effort to reduce dependence on Nvidia across different parts of Meta’s AI stack while keeping MTIA at the core of inference workloads.
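For a sense of what backend-portable deployment looks like in practice, here is a minimal sketch of the torch.compile and torch.export flow on a toy model. TinyRanker is a hypothetical stand-in for a production model, and the actual MTIA lowering lives in Meta's internal backend, which isn't shown; only stock PyTorch 2.x APIs appear here:

```python
# Minimal sketch of the portability idea: one PyTorch model definition,
# compiled/exported once, then lowered to whichever backend serves it.
# Generic PyTorch 2.x example, not Meta's internal MTIA toolchain.
import torch
import torch.nn as nn

class TinyRanker(nn.Module):  # hypothetical stand-in for a production model
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyRanker().eval()
example = (torch.randn(32, 128),)

# torch.compile JIT-specializes the same module for the local device;
# on an MTIA host the backend would differ, the model code would not.
compiled = torch.compile(model)
print(compiled(*example).shape)  # torch.Size([32, 1])

# torch.export captures a device-agnostic graph that a GPU or accelerator
# runtime can consume, which is what lets one artifact target both stacks.
exported = torch.export.export(model, example)
print(type(exported).__name__)  # ExportedProgram
```

The design point is that both entry points consume an unmodified nn.Module, so the same model source can serve on GPUs and MTIA depending on what sits underneath torch.compile or what consumes the exported graph.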