Intel’s Heracles chip computes fully-encrypted data without decrypting it — chip is 1,074 to 5,547 times faster than a 24-core Intel Xeon in FHE math operations

To protect data from risks such as side-channel attacks, DMA attacks, or hypervisor snooping, Intel has developed a processor that operates on encrypted data without first decrypting it. The company has now demoed the chip, reports IEEE Spectrum, and Intel claims impressive gains in the math operations used to process fully encrypted data.

Intel introduced and demonstrated its Heracles accelerator for fully homomorphic encryption (FHE) last month at the International Solid-State Circuits Conference (ISSCC). With FHE, the chip ingests encrypted data, processes it, and outputs the result in encrypted form. The chip is by no means an x86 CPU: it cannot execute normal software or run an operating system, as it is designed exclusively to accelerate FHE math.

Because it is purpose-built for FHE math, the new chip, operating at 1.20 GHz, is roughly 1,074 to 5,547 times faster than a 24-core Intel Xeon W7-3455 'Sapphire Rapids' running at 2.50 GHz to 4.80 GHz across seven operations used in this type of workload, according to Intel.

From a technical standpoint, Heracles is a sharp departure from conventional CPUs and GPUs, both of which struggle with the mathematical demands of encrypted workloads. FHE math depends on extremely large integers, intensive polynomial calculations, and complex data transformations that quickly overwhelm general-purpose processors. Intel's Heracles relies on a purpose-designed architecture built around an 8,192-way SIMD compute engine composed of 64 tile-pairs arranged in an 8×8 mesh, with each tile-pair containing 128 parallel arithmetic lanes (64 × 128 = 8,192). Each tile integrates arithmetic units optimized for modular addition, subtraction, multiplication, and specialized butterfly operations that support number-theoretic transforms (NTTs) and inverse NTTs.
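To make the butterfly operation concrete, here is a minimal Python sketch of a radix-2 NTT and its inverse. It is only an illustration of the general technique, not Intel's implementation: the tiny modulus `Q = 17`, the length `N = 8`, the root `OMEGA = 9`, and all function names are assumptions chosen so the arithmetic is easy to follow. Real FHE parameters are far larger.

```python
# Toy parameters (assumptions for illustration only).
Q = 17      # small prime modulus; N must divide Q - 1
N = 8       # transform length (power of two)
OMEGA = 9   # primitive N-th root of unity mod Q: 9**4 % 17 == 16, 9**8 % 17 == 1


def butterfly(a, b, w, q=Q):
    """Cooley-Tukey butterfly: (a + w*b, a - w*b), all modulo q."""
    t = (w * b) % q
    return (a + t) % q, (a - t) % q


def bit_reverse_copy(values):
    """Return a copy of values reordered by bit-reversed index."""
    n = len(values)
    bits = n.bit_length() - 1
    out = [0] * n
    for i, v in enumerate(values):
        j = int(format(i, f"0{bits}b")[::-1], 2)
        out[j] = v
    return out


def ntt(values, omega=OMEGA, q=Q):
    """Iterative radix-2 NTT built from modular butterflies."""
    a = bit_reverse_copy(values)
    n = len(a)
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)  # root of unity for this stage
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                i, j = start + k, start + k + half
                a[i], a[j] = butterfly(a[i], a[j], w, q)
                w = (w * w_len) % q
        length *= 2
    return a


def intt(values, omega=OMEGA, q=Q):
    """Inverse NTT: run the transform with omega^-1, then scale by n^-1."""
    n = len(values)
    inv_omega = pow(omega, -1, q)
    inv_n = pow(n, -1, q)
    return [(x * inv_n) % q for x in ntt(values, inv_omega, q)]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(ntt(data))
    print(intt(ntt(data)) == data)
```

Each stage applies many independent butterflies, which is the kind of work that maps naturally onto wide SIMD lanes; the strided index pattern (`i` and `i + half`) also hints at why the transform is heavy on data movement.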

These NTTs and inverse NTTs are key to encrypted computation but require heavy data movement and tightly coordinated permutations. In addition, the accelerator supports automorphisms and bootstrapping operations to remove accumulated cryptographic noise and enable longer computational chains.
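As a rough illustration of why automorphisms amount to coordinated permutations, the Python sketch below applies the map X → X^k to a polynomial in the ring Z_q[X]/(X^n + 1). That ring is an assumption here: it is the one commonly used by lattice-based FHE schemes, but the article does not say which scheme or parameters Heracles targets. The function names and the toy modulus 97 are likewise hypothetical.

```python
def automorphism(coeffs, k, q):
    """Apply X -> X^k (k odd) to a polynomial in Z_q[X]/(X^n + 1).

    Coefficient i moves to position (i*k mod 2n); because X^n = -1 in this
    ring, positions that land in [n, 2n) wrap around with a sign flip.
    """
    n = len(coeffs)
    out = [0] * n
    for i, c in enumerate(coeffs):
        e = (i * k) % (2 * n)
        if e < n:
            out[e] = (out[e] + c) % q
        else:
            out[e - n] = (out[e - n] - c) % q
    return out


def negacyclic_mul(a, b, q):
    """Schoolbook product in Z_q[X]/(X^n + 1), used only to check the map."""
    n = len(a)
    out = [0] * n
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            e = i + j
            if e < n:
                out[e] = (out[e] + x * y) % q
            else:
                out[e - n] = (out[e - n] - x * y) % q
    return out


if __name__ == "__main__":
    q = 97  # toy modulus (assumption)
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    k = 3   # any odd k gives a valid automorphism of this ring
    left = automorphism(negacyclic_mul(a, b, q), k, q)
    right = negacyclic_mul(automorphism(a, k, q), automorphism(b, k, q), q)
    print(left == right)  # the map preserves multiplication
```

The operation does no real arithmetic beyond negation; it is almost entirely a shuffle of coefficients, which is why hardware support for permutations matters for this kind of workload.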

The system-on-chip operates on 32-bit arithmetic slices (each lane inside a tile-pair processes one 32-bit slice) to preserve precision and ensure high parallelism, which greatly improves the efficiency of processing encrypted math at scale. However, efficient, explicitly parallel execution also requires high memory bandwidth. To that end, the chip is equipped with 48 GB of HBM3 memory in two stacks, as well as custom data paths that push internal bandwidth into the terabytes-per-second range. The chip further includes 64 MB of internal scratchpad memory, large register files, and dedicated buffers that stage data close to the compute engines.
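One common way to fit very large integers into narrow lanes is a residue number system (RNS), where a big value is stored as its remainders modulo several small, pairwise-coprime moduli and each remainder is processed independently. The article does not state that Heracles uses RNS, so treat the Python sketch below as an assumption-laden illustration of how 32-bit slices could carry big-integer math; the moduli and function names are hypothetical.

```python
from math import gcd, prod

# Three pairwise-coprime moduli just under 2**32 (chosen for illustration).
MODULI = [2**32 - 5, 2**32 - 17, 2**32 - 65]

# Sanity check: the Chinese remainder theorem needs pairwise-coprime moduli.
assert all(
    gcd(m1, m2) == 1
    for i, m1 in enumerate(MODULI)
    for m2 in MODULI[i + 1:]
)


def to_residues(x, moduli=MODULI):
    """Split a large integer into one sub-32-bit residue per modulus."""
    return [x % m for m in moduli]


def mul_residues(a, b, moduli=MODULI):
    """Multiply slice by slice; no slice depends on any other."""
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]


def from_residues(residues, moduli=MODULI):
    """Recombine slices with the Chinese remainder theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)
    return total % big_m


if __name__ == "__main__":
    x, y = 123456789012345678, 987654321
    product = from_residues(mul_residues(to_residues(x), to_residues(y)))
    print(product == x * y)  # valid while x * y stays below prod(MODULI)
```

Because each slice is independent, every residue can be handed to its own arithmetic lane, which is one plausible reading of how narrow slices and wide parallelism fit together.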
