China announces CPU-only exascale supercomputer with 47,000 homemade processors, record 2 Exaflops of performance without GPUs — Lingshen super said to use Huaw

Phyzzi blitzkrieg316 said: I bet… TH is very pro china
Not in the actual article here they aren't. They are pretty explicit in the end that the claim is both unlikely as it sits and not that great given the project timeline. That said, there's nothing worse about Chinese ambition than there is about, say, USA's ambition. China does have both material and technological resources and even if I agree with the author's doubts on the likelihood of the project coming to fruition exactly as specified, it's not like the SLS was reflective of the final rocket on the first pass either, and there are ample reasons for China to want super compute access that isn't being directed or watched by outsiders so this project is likely to find continuing support as it rolls out. Reading a little more deeply, there's something to be learned about the goals for this project if it's intending to use CPU cores instead of GPU cores. China is clearly interested in calculations with more digits of accuracy but without waiting for results, which probably means either space program calculations or possibly some other detailed simulation (protein folding or other nanoscale simulation that has to account for quantum effects), and they want this to be information that isn't shared, so probably at least somewhat military technology. Since the big push is compute without GPU acceleration and on internally designed and manufactured components, that means they aren't looking at AI applications and are almost certainly wary of backdoors and hardware espionage loopholes. However, since they also are suggesting using x86 architecture (which they certainly know involves some pain in switching fabs producing chips for mobile architecture), it seems likely that there is existing software that they want to run on this machine that won't do well even on 64-bit mobile processors.
Finally the fact that China is announcing their *intent* to make this clearly military-use system rather than waiting for it to be complete means they are responding to something specific: my bet is the smooth completion of the Artemis II mission. This would fit with some other recent posturing China has done regarding their space program. So yeah, the title is overly "gentle" on the Chinese propaganda machine, but the article has some useful information in it and there is even more if you stop to think about it. Reply

bit_user I would just point out that Fujitsu made the Fugaku supercomputer, which was the fastest supercomputer in the world from 2020 to 2022, without using GPUs. They also used their own self-designed ARM cores, implementing SVE at 512-bits, for a total of 0.44 Exaflops (i.e. using the official Top 500 fp64 RMax). It was also quite efficient. So, it can definitely be done. Reply

Phyzzi bit_user said: I would just point out that Fujitsu made the Fugaku supercomputer, which was the fastest supercomputer in the world from 2020 to 2022, without using GPUs. They also used their own self-designed ARM cores, implementing SVE at 512-bits, for a total of 0.44 Exaflops (i.e. using the official Top 500 fp64 RMax). It was also quite efficient. So, it can definitely be done.
I don't think there's so much a technical question as to whether this is, broadly speaking, possible. For me the questions are 1) why announce now before seemingly securing the project (they don't appear to have processors currently in the pipeline or necessarily even designed) and 2) why specifically plan to use x86 (which has some known issues and inefficiency compared to ARM for legacy compatibility reasons). Maybe those issues will be sidestepped somehow, or maybe there are already more things in place than seem to be, but with China especially I find state announcements of new and planned technology to be worth looking at through the lens of "Why this and why now?". While I imagine that a real project will move forward, I also have no doubts that the announcement is propaganda and would certainly hesitate to assume it will achieve all the stated goals or specs. Reply

qxp So if they actually did that, this is pretty smart. The idea of using a CPU/GPU combo comes with a handicap: you need to shuffle data between CPU and GPU for adaptive algorithms. What you want instead is a capable CPU with a vector unit, similar to Xeon Phi and modern AMD/Intel CPUs but with a lot more memory bandwidth. Strix Halo is a small step in the right direction. Compared to a conventional CPU/GPU combo, a high-bandwidth CPU with a vector unit opens up the possibility of large algorithmic improvements. Reply

bit_user Phyzzi said: I don't think there's so much a technical question as to whether this is, broadly speaking, possible.
The article seems to cast doubt on it. That's the main reason I cited Fugaku.
Phyzzi said: For me the questions are 1) why announce now before seemingly securing the project
If you look at every US-based supercomputer, there were announcements made long before even the chips for it were being manufactured. Such an announcement serves a very practical purpose of informing researchers who might want to run jobs on it. They need some guideposts, so they can start tuning, porting, and optimizing their code for it. There are other reasons you might want to announce it, but I won't venture into such a realm of speculation.
Phyzzi said: 2) why specifically plan to use x86 (which has some known issues and inefficiency compared to ARM for legacy compatibility reasons).
Maybe hedging their bets? Or, maybe just throwing some funding towards Zhaoxin, in order to support development of more competitive x86 cores. Reply

bit_user qxp said: The idea of using a CPU/GPU combo comes with a handicap: you need to shuffle data between CPU and GPU for adaptive algorithms.
The data needs to get shuffled between nodes, anyhow. Or else, why even have a supercomputer? As far as data movement goes, a switched fabric like NVLink scales far better than using CPU-centric PCIe.
qxp said: Compared to conventional CPU/GPU combo the high-bandwidth CPU and vector unit can offer a possibility of large algorithmic improvements.
It would have to be something well beyond the realm of standard Linpack, because GPUs have dominated that space for the past 1.5 decades. Intel tried to pitch Xeon Phi in a similar way to how you're saying. It didn't work out very well, however. Reply

qxp bit_user said: The data needs to get shuffled between nodes, anyhow. Or else, why even have a supercomputer? As far as data movement goes, a switched fabric like NVLink scales far better than using CPU-centric PCIe.
I meant on a much smaller scale, such as within tight loops.
bit_user said: It would have to be something well beyond the realm of standard Linpack, because GPUs have dominated that space for the past 1.5 decades. Intel tried to pitch Xeon Phi in a similar way to how you're saying. It didn't work out very well, however.
Actually Xeon Phi was great as far as hardware was concerned. Liked it a lot. At the time of release a Xeon Phi gave 1 TFlop of compute, while Nvidia's GeForce gave 2 TFlops. But because Xeon Phi was really just a 200+ thread Pentium with a vector unit, one could use a smarter algorithm that gave a speedup of roughly 10x. So even though I did not have 2 TFlops it was a win. The reason Xeon Phi flopped, I suspect, is that Intel grossly overpriced it, essentially killing it. For example the 8GB Xeon Phi was around $2000, while the 16GB version was $5000. Surely that extra 8GB of memory did not cost $2000+? And the 4GB Xeon Phi was mostly useless because for interesting algorithms you need some memory per thread. And then there is the consideration that at the time the cloud infrastructure companies were selling CPUs per virtual instance, and I bet Intel was afraid that someone would get a decent Xeon Phi and launch 200+ VMs on it, and then why buy Xeon server CPUs? The right way to develop Xeon Phi would have been to stick an Ethernet controller on it so you can plug it directly into the switch and sell a 16GB (or even 32GB) version for some reasonable amount of money, say $1600 (GeForce was like $300-500 at the time). Then you essentially get a 1 TFlop computer for $1600 and this would have outcompeted all the GPUs.
And if you look now, we have a Strix Halo product with decent memory bandwidth and decent RAM size, but can you easily find a box with even a 40Gbit interface? No. And no notebooks with a 15-16" screen either. It's like "we released it, it's out, but we really don't want you to have it". If you do Strix Halo right it would outcompete H200 and all the MIxx stuff. Reply

SpicyLlama Phyzzi said: Not in the actual article here they aren't. They are pretty explicit in the end that the claim is both unlikely as it sits and not that great given the project timeline. That said, there's nothing worse about Chinese ambition than there is about, say, USA's ambition. China does have both material and technological resources and even if I agree with the author's doubts on the likelihood of the project coming to fruition exactly as specified, it's not like the SLS was reflective of the final rocket on the first pass either, and there are ample reasons for China to want super compute access that isn't being directed or watched by outsiders so this project is likely to find continuing support as it rolls out. Reading a little more deeply, there's something to be learned about the goals for this project if it's intending to use CPU cores instead of GPU cores. China is clearly interested in calculations with more digits of accuracy but without waiting for results, which probably means either space program calculations or possibly some other detailed simulation (protein folding or other nanoscale simulation that has to account for quantum effects), and they want this to be information that isn't shared, so probably at least somewhat military technology. Since the big push is compute without GPU acceleration and on internally designed and manufactured components, that means they aren't looking at AI applications and are almost certainly wary of backdoors and hardware espionage loopholes. However, since they also are suggesting using x86 architecture (which they certainly know involves some pain in switching fabs producing chips for mobile architecture), it seems likely that there is existing software that they want to run on this machine that won't do well even on 64 bit mobile processors. 
Finally the fact that China is announcing their *intent* to make this clearly military-use system rather than waiting for it to be complete means they are responding to something specific: my bet is the smooth completion of the Artemis II mission. This would fit with some other recent posturing China has done regarding their space program. So yeah, the title is overly "gentle" on the Chinese propaganda machine, but the article has some useful information in it and there is even more if you stop to think about it.
IMO, the "why" is because global media has been running that story about their supercomputing center being hacked, and massive amounts of data exfiltrated (probably by Tailored Access Operations). This announcement fits many of the same keywords you'd see in Google, and now their botnets and propaganda accounts will start spamming all over western social media. Saving face is the primary concern in Chinese Communist Party culture; such an embarrassment and failure cannot be permitted to continue in public discourse. This also fits with similar US computer network operations that happened prior to Russia's invasion of Ukraine, where Tailored Access Operations was again key in obtaining secret information. Exposing and embarrassing adversary governments has become a new tactic utilized by the US Intelligence Community. China just happens to be a particularly juicy target due to the cultural norms. And to pre-answer, yes, I believe the data is real. Reply

zsydeepsky bit_user said: I would just point out that Fujitsu made the Fugaku supercomputer, which was the fastest supercomputer in the world from 2020 to 2022, without using GPUs. They also used their own self-designed ARM cores, implementing SVE at 512-bits, for a total of 0.44 Exaflops (i.e. using the official Top 500 fp64 RMax). It was also quite efficient. So, it can definitely be done.
Furthermore, the fastest supercomputer during 2016 & 2017 was China's Sunway TaihuLight, which was also built solely with CPUs: 40,960 of China's in-house processors. Personally, the most surprising aspect of this news is that China ANNOUNCES new supercomputers. Since the trade war, China has stopped exposing its supercomputer plans, staying off the Top500 rankings, for fear that US sanctions would interrupt those plans. So the Chinese gov must be very confident that the system can be completely immune to any sanctions or embargoes. Reply
