
Atlas 900 and Ascend supernodes highlight a scaling-first approach as Huawei trades per-chip efficiency for system-level throughput.
Huawei used its New Year message to highlight progress across its Ascend AI and Kunpeng CPU ecosystems, pointing to the rollout of Atlas 900 supernodes and rapid growth in domestic developer adoption as "a solid foundation for computing." The message arrives as China continues to accelerate efforts to replace Western hardware in critical AI workloads, and as Huawei positions itself as the closest thing the country has to a vertically integrated AI compute vendor.
Huawei’s message offers a snapshot of a strategy that has been unfolding for several years, shaped by U.S. export controls, constrained access to leading-edge manufacturing, and a domestic market increasingly mandated to adopt local silicon . Under those conditions, Huawei’s Ascend and Kunpeng platforms have evolved into something distinct from their Western counterparts: less focused on single-chip supremacy and more on building large, tightly coupled systems that compensate for weaker nodes with scale, networking, and software control.
At the center of Huawei’s AI effort is Ascend, built around its proprietary Da Vinci architecture. The original Ascend 910, introduced in 2019, was manufactured on TSMC’s 7nm process and delivered roughly 256 TFLOPS of FP16 performance at a quoted 350W. That put it in the same broad class as Nvidia’s Volta-era accelerators, though without the same software ecosystem or interconnect maturity.
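Taken at face value, the figures quoted above imply a rough efficiency number for the original 910. This is a back-of-envelope sketch using only the article's quoted peak FP16 throughput and board power, not a measured result:

```python
# Rough perf-per-watt for the original Ascend 910, from the figures
# quoted above (peak FP16 at the quoted 350W board power).
PEAK_FP16_TFLOPS = 256
QUOTED_POWER_W = 350

tflops_per_watt = PEAK_FP16_TFLOPS / QUOTED_POWER_W
print(f"~{tflops_per_watt:.2f} FP16 TFLOPS per watt")  # ~0.73
```

Peak-divided-by-TDP is a crude metric (real workloads rarely hit peak), but it is the baseline against which the later-generation efficiency trade-offs are usually discussed.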
Sanctions that came in the years following Ascend’s launch significantly changed the playing field, forcing subsequent Ascend generations onto SMIC’s N+1 and N+2 processes, which are roughly comparable to older 7nm-class nodes without EUV. The Ascend 910C , now the backbone of Huawei’s latest clusters, is a dual-die package with two large chiplets combined into a single accelerator card. On paper, Huawei claims up to 780 TFLOPS of BF16 compute, but die area and power efficiency tell a more complicated story.
Huawei suggests the 910C’s combined silicon footprint is around 60% larger than Nvidia’s H100, with lower performance per square millimeter and per watt. In isolation, that would be a losing proposition, but Huawei has leaned hard on interconnects and clustering. The company uses a proprietary high-speed fabric alongside standard PCIe and RoCE networking to bind hundreds or thousands of Ascend accelerators into a single logical training or inference system.
This approach is evident in Huawei’s claims around Atlas 900 and CloudMatrix systems . Rather than competing card-for-card with Nvidia’s H100 or AMD’s MI300X, Huawei emphasizes aggregate throughput. A CloudMatrix 384 system, linking 384 Ascend 910C accelerators, has been positioned as competitive with Nvidia’s large NVLink-based pods on selected workloads, particularly inference. But there’s a trade-off here in terms of physical scale: where Nvidia can deliver multi-exaflop-class FP4 performance in a handful of racks, Huawei requires an order of magnitude more floor space, power delivery, and cooling.
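The aggregate-throughput pitch is easy to sanity-check with the article's own per-card figure. This sketch assumes perfect linear scaling across the fabric, which real interconnects never achieve, so it is an upper bound on paper, not a benchmark:

```python
# Back-of-envelope peak aggregate throughput for a CloudMatrix 384
# system, using Huawei's claimed 780 BF16 TFLOPS per 910C card and
# assuming ideal (lossless) scaling across all 384 accelerators.
ACCELERATORS = 384
BF16_TFLOPS_PER_910C = 780  # Huawei's claimed peak, per the article

aggregate_tflops = ACCELERATORS * BF16_TFLOPS_PER_910C
aggregate_pflops = aggregate_tflops / 1_000

print(f"Peak aggregate BF16: {aggregate_tflops:,} TFLOPS "
      f"(~{aggregate_pflops:.0f} PFLOPS)")
```

Roughly 300 PFLOPS of peak BF16 on paper illustrates why Huawei frames the comparison at the system level: the headline number is competitive even though each card, each watt, and each square millimeter is not.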
Inference is where Ascend looks strongest: reports out of China indicate that the 910C delivers roughly 60% of H100-class performance on inference tasks, but training remains more challenging .
As for the Atlas 900 supernode, highlighted in Huawei’s New Year message, it is probably best viewed as a piece of architectural showmanship rather than a product that’s likely to come to the Chinese market any time soon. It reflects Huawei’s belief that AI compute can be industrialized through standardized clusters built from domestically controlled components, even if each component lags the global leading-edge.
This is where Huawei’s background in telecom networking comes into play. The company has decades of experience building carrier-grade systems that prioritize reliability, deterministic performance, and large-scale orchestration. Ascend clusters apply that mindset to AI, with the emphasis on predictable scaling behavior and integration with Huawei’s own AI frameworks rather than leading benchmarks .
That also explains why Huawei describes supernodes as a "more readily accessible" technology for forming a "solid AI computing backbone." Huawei is not pitching Ascend as a drop-in replacement for CUDA, but as an alternative stack, from silicon to interconnect to compiler, that customers adopt wholesale. That could be attractive to Chinese cloud providers facing harsh procurement and compliance realities amid export restrictions and geopolitical uncertainty.