
(Image credit: SK Hynix)

Although the performance of high-bandwidth memory (HBM) has increased by an order of magnitude since its inception around a decade ago, many elements have remained fundamentally unchanged between HBM1 and HBM3E. But as the demands of bandwidth-hungry applications evolve, the technology must also change to accommodate them.
In new information revealed at TSMC's European OIP forum in late November, HBM4 and HBM4E will bring four major changes. HBM4 will receive a 2,048-bit interface and base dies produced using advanced logic technologies, while HBM4E will be able to use customizable base dies controlled through custom interfaces. These are dramatic shifts that will have a big impact sooner than you might think.
HBM4, HBM4E, and C-HBM4E are on track to hit the market in 2026 and 2027, boasting the aforementioned 2,048-bit standard interface with data transfer rates of up to 12.8 GT/s. Additionally, the customizable base dies will be able to use advanced logic technologies down to the 3nm class. This offers higher area efficiency, which TSMC claims translates into up to a 2.5X increase in performance.
HBM4 — whose specification was officially published earlier this year — is the standard that sets the stage for a number of upcoming innovations in the AI and HPC memory market.
Each HBM4 memory stack features a 2,048-bit interface that officially supports data transfer rates of up to 8 GT/s, though controllers from specialists like Rambus and HBM4 stacks from leading DRAM vendors already support speeds of 10 GT/s or higher, as implementers want headroom for additional peace of mind.
A stack with a 2,048-bit interface operating at 8 GT/s can deliver bandwidth of 2 TB/s, so an AI accelerator with eight HBM4 stacks will have access to potential bandwidth of 16 TB/s. And 8 GT/s could be just the beginning: Cadence is already offering an HBM4E physical interface (PHY) with 12.8 GT/s support.
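The arithmetic behind these figures is straightforward: peak bandwidth is bus width times per-pin data rate, divided by eight bits per byte. A quick sketch (using decimal TB, as the article's figures do):

```python
def stack_bandwidth_tbs(bus_width_bits: int, rate_gts: float) -> float:
    """Peak bandwidth of one HBM stack in TB/s: width (bits) x rate (GT/s) / 8 / 1000."""
    return bus_width_bits * rate_gts / 8 / 1000

per_stack = stack_bandwidth_tbs(2048, 8.0)     # 2.048 TB/s at the official HBM4 rate
accelerator = 8 * per_stack                    # ~16.4 TB/s across eight stacks
hbm4e_stack = stack_bandwidth_tbs(2048, 12.0)  # ~3.07 TB/s at HBM4E's 12 GT/s
```

The same formula reproduces the ~3 TB/s per-stack figure cited later for HBM4E.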
Internally, HBM4 doubles concurrency to 32 independent channels per stack (each split into two pseudo-channels), which reduces bank conflicts and raises effective throughput under highly parallel access patterns.
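The channel math falls directly out of the bus width. Assuming the 2,048-bit bus is divided evenly across the 32 channels (consistent with HBM4's 64-bit channel definition), a minimal sketch:

```python
BUS_BITS = 2048
CHANNELS = 32
PSEUDO_PER_CHANNEL = 2

bits_per_channel = BUS_BITS // CHANNELS                   # 64-bit channels
bits_per_pseudo = bits_per_channel // PSEUDO_PER_CHANNEL  # 32-bit pseudo-channels
independent_streams = CHANNELS * PSEUDO_PER_CHANNEL       # 64 concurrent request streams per stack
```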
HBM4 stacks also support 24 Gb and 32 Gb DRAM devices and offer configurations for 4-Hi, 8-Hi, 12-Hi, and 16-Hi stacks, enabling capacities of up to 64 GB, which allows vendors to build accelerators for next-generation AI models with trillions of parameters. Micron expects 64 GB stacks to become common with HBM4E sometime after late 2027, which aligns with Nvidia's plans to equip its Rubin Ultra GPU with 1 TB of HBM4E memory.
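Stack capacity is simply per-die density times stack height, converted from gigabits to gigabytes. A sketch of the configurations mentioned above:

```python
def stack_capacity_gb(die_gbit: int, stack_height: int) -> int:
    """HBM stack capacity in GB: per-die density (Gbit) x dies in the stack / 8 bits-per-byte."""
    return die_gbit * stack_height // 8

top_hbm4 = stack_capacity_gb(32, 16)  # 64 GB: 32 Gb dies in a 16-Hi stack
mid_hbm4 = stack_capacity_gb(24, 12)  # 36 GB: 24 Gb dies in a 12-Hi stack
# Rubin Ultra's 1 TB of HBM4E would imply sixteen 64 GB stacks (1,024 GB / 64 GB),
# an inference from the arithmetic rather than a confirmed configuration.
```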
The electrical specification of HBM4 broadens the operating-voltage range with vendor-specific VDDQ options between 0.679 V and 0.963 V and VDDC of 0.97 V or 1.07 V, which enables DRAM makers to bin their offerings for efficiency or frequency while maintaining compatibility with the specification. On the security side, HBM4 supports directed refresh management (DRFM) to mitigate row-hammer attacks.
Because HBM4 expands its interface to 2,048 bits, it has double the I/O contacts of previous-generation HBM stacks. Since it was close to impossible to produce a base die with proper routing using DRAM process technologies, memory makers like Micron, Samsung, and SK hynix collaborated with TSMC early on to ensure compatibility with CoWoS packaging technologies and to produce HBM4 base dies using 12FFC or N5 fabrication technologies.
Back then, it was thought that 12FFC would be used for 'regular' HBM4 base dies, which would be integrated with their host processors using advanced 2.5D packaging technologies, whereas N5 base dies would be used for HBM4 memory integrated directly onto logic chips using direct bonding.
At the European OIP 2025 forum, neither TSMC nor its partners mentioned N5-based HBM4 base dies for integration using hybrid bonding or similar technologies, which likely means that the project is not exactly a priority for now.
Potentially, integrating HBM4 memory stacks on top of a high-performance processor creates significant thermal density, which would make the assembly difficult to cool. It is also possible that heat from the compute chips could damage the DRAM devices, and vice versa, but this is merely speculation.
There may also be hybrid-bonded SoIC-X 3D integrations with HBM4 stacked on top of compute chiplets in development, but their developers do not want to share results just yet.
In any case, HBM4 base dies made by TSMC on its low-power 12FFC or N5 process technologies, as well as custom C-HBM4E base dies produced on TSMC's N3P node, use lower voltages (0.75 V – 0.8 V vs 1.1 V for HBM3E) and are up to two times more power efficient than HBM3E base dies manufactured using DRAM process technologies, according to TSMC.
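As a back-of-envelope plausibility check (my own, not TSMC's reasoning): CMOS dynamic power scales roughly with the square of supply voltage, so the quoted voltage drop alone accounts for nearly all of the claimed 2X efficiency gain.

```python
# Hypothetical check: dynamic power ~ V^2, so compare old vs new base-die voltages.
v_hbm3e = 1.1
v_hbm4 = (0.75 + 0.8) / 2          # midpoint of the quoted 0.75-0.8 V range
scaling = (v_hbm3e / v_hbm4) ** 2  # ~2.0, close to TSMC's "up to 2x" figure
```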
On the other hand, since HBM4 requires a more sophisticated controller and a larger, more complex PHY than HBM3E (15 mm² vs 11 mm², according to GUC), HBM4 memory subsystems will also be more power hungry than HBM3E subsystems. However, thanks to the considerably higher bandwidth that HBM4 enables, they will be considerably more power- and area-efficient than their predecessors.
As for IP readiness, GUC taped out its HBM4 PHY IP on N3P in March 2025 and will validate it with HBM4 memory samples in Q1 2026, at which point the company can formally claim a silicon-proven, validated HBM4 memory solution. The IP is compatible with all types of CoWoS packaging (-S, -R, -L) and can address a variety of applications. HBM4 memory controllers are available from a range of companies, including Rambus and EDA developers like Cadence, Siemens EDA, and Synopsys.
With the introduction of HBM4's 2,048-bit memory interface, JEDEC members had to cut the maximum data transfer rate to 8 GT/s from the roughly 9.4 GT/s supported by HBM3E, which still enables a dramatic bandwidth increase. However, HBM4E is set to push electrical and signaling limits higher, supporting per-pin data rates of up to 12 GT/s (by refining the PHY for better signal margin and jitter control at higher frequencies) and extending total stack bandwidth to around 3 TB/s while keeping the 2,048-bit interface and 32-channel architecture. As a result, HBM4E stacks will offer 2.5X the bandwidth of HBM3E, and even when PHY area and power are taken into account, HBM4E will be 1.7X more power efficient and 1.8X more area efficient, according to GUC.
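GUC's 2.5X bandwidth claim checks out against the raw numbers: an HBM3E stack pairs a 1,024-bit interface with ~9.4 GT/s per pin. A sketch:

```python
# Per-stack peak bandwidth in TB/s: width (bits) x rate (GT/s) / 8 / 1000
hbm3e = 1024 * 9.4 / 8 / 1000   # ~1.20 TB/s
hbm4e = 2048 * 12.0 / 8 / 1000  # ~3.07 TB/s
ratio = hbm4e / hbm3e           # ~2.55, in line with the quoted 2.5X
```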
(Table: HBM3E vs HBM4E comparison by GUC)