
Anton Shilov is a contributing writer at Tom's Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and the latest fab tools to high-tech industry trends.
nichrome I do hope the phison does not dramatically increase time to first token. Reply
DougMcC nichrome said: I do hope the phison does not dramatically increase time to first token. Came to point out the same error but you beat me to it. Reply
gc9 Earlier article on the previous target niche (specializing models on smaller, slower, but private on-premises hardware, rather than specializing in the cloud): https://www.tomshardware.com/pc-components/cpus/phisons-new-software-uses-ssds-and-dram-to-boost-effective-memory-for-ai-training-demos-a-single-workstation-running-a-massive-70-billion-parameter-model-at-gtc-2024 Reply
bit_user The article said: Given that Phison's aiDAPTIV+ stack involves an AI-aware SSD (or SSDs) based on an advanced controller from Phison, special firmware, and software Well, embedding it in the SSD controller's firmware is one way to make sure they stay in the value chain. However, I do genuinely wonder how much benefit it gets from being there. SSDs don't have particularly fast CPU cores, and so I'd expect that the caching layer could be implemented purely in software on the host CPU, without much degradation of performance. I guess the main benefit of integrating into the SSD is that it could occur as part of the normal FTL (Flash Translation Layer), rather than needing to introduce an entirely new layer for it. However, there are even some datacenter SSDs that essentially allow the host processor to do all the FTL stuff. Reply
JarredWaltonGPU bit_user said: Well, embedding it in the SSD controller's firmware is one way to make sure they stay in the value chain. However, I do genuinely wonder how much benefit it gets from being there. SSDs don't have particularly fast CPU cores, and so I'd expect that the caching layer could be implemented purely in software on the host CPU, without much degradation of performance. I guess the main benefit of integrating into the SSD is that it could occur as part of the normal FTL (Flash Translation Layer), rather than needing to introduce an entirely new layer for it. However, there are even some datacenter SSDs that essentially allow the host processor to do all the FTL stuff. So there are two use cases for aiDAPTIV. Inference uses the SSD as a cache for computed KV cache values, and in some scenarios can provide massive benefits. Basically, if you are doing inference and run out of VRAM, older KV calculations get dumped and lost. If you have a longer context length, the KV pairs may need to be recomputed, and it's actually much faster to just retrieve them from a fast SSD (at 7~10 GB/s) than to recompute — even on stuff like NVIDIA H200. (I don't think any testing has been done with B200 / B300 yet.) This mostly results in much faster TTFT (time to first token), at times showing over a 10X benefit. DGX Spark for example was shown taking ~40 seconds for TTFT versus ~9 seconds with aiDAPTIV; Strix Halo showed ~6 seconds with aiDAPTIV and ~36 seconds without. The other use case is fine-tune training of models. The standard NVIDIA approach is to have everything loaded into GPU memory, so if you want to fine-tune a 70B parameter LLM, as an example, you need ~20X that much memory. 1.4 TB would thus require something like eighteen H100, ten H200, eight B200, or five B300 GPUs just to satisfy the VRAM requirements.
With aiDAPTIV, the process gets broken up into chunks that can fit within the available GPU memory (though generally I think it still needs at least a 16GB GPU for models that have 8B or more parameters). There's only about a 10 percent performance loss by going this route, with the benefit being the potential to fine-tune much larger models on relatively modest hardware. At SC25, as a more extreme example, Phison demonstrated fine-tuning of Llama 3.1 405B using just two RTX Pro 6000 GPUs. So that's 192GB of total VRAM, for a task that would normally need around 8 TB of memory. It took about 25 hours per epoch, doing full FP16 precision training, using an 8TB aiDAPTIV SSD for caching. The cost of the server/workstation in this case was around $50,000. Using servers with 8x NVIDIA AI GPUs each would cost closer to $4 million (give or take, as NVIDIA isn't exactly forthcoming on pricing). And you could do the training on premises, rather than going the typical cloud route, which a lot of companies are interested in doing. For fine-tuning, aiDAPTIV SSDs are configured as SLC NAND. (It's a firmware thing, where TLC NAND stays in pure pSLC mode.) Instead of ~5,000 program/erase cycles in TLC mode, the NAND gets about 60,000 PE cycles. Combined with overprovisioning, the aiDAPTIV SSDs are rated for 100 DWPD (drive writes per day). Now, if you're wondering if that's even necessary, I wondered the same thing. I did some testing, and using an AI100E 320GB drive and training the Llama 3.2 8B LLM, I saw up to 11 TB per hour of writes to the SSD. Companies and individuals generally won't be doing 24/7 fine-tune training, but in theory you could write roughly 260 TB per day if you were to do that. A typical 2TB consumer SSD will offer around 0.3 DWPD, or 1200 TBW total over five years. In a worst-case scenario, aiDAPTIV could burn out that sort of SSD in a week! LOL.
Do note that this means the drives are much more expensive per TB of capacity, as the 320GB drive uses 2TB of raw TLC NAND, the 1TB drive uses 4TB of NAND, and the 2TB drive uses 8TB of TLC NAND. (And the 8TB models like the AI200E are the equivalent of a 32TB SSD!) Given NAND shortages, prices are all a bit up in the air. Reply
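The back-of-the-envelope numbers in Jarred's reply above can be sanity-checked with a short script. To be clear, the ~20 bytes-per-parameter rule of thumb, the per-GPU VRAM sizes, and the 11 TB/hour write rate are taken from the thread; the `gpus_needed` helper is purely illustrative, not part of any Phison tooling:

```python
import math

# Figures from the thread: full FP16 fine-tuning needs roughly 20 bytes of
# memory per parameter (so ~20 GB per billion parameters), and per-GPU VRAM
# is 80 GB (H100), 141 GB (H200), 192 GB (B200), 288 GB (B300).

def gpus_needed(params_billion: float, vram_gb: float, bytes_per_param: float = 20) -> int:
    """GPUs required just to hold the full fine-tuning state in VRAM."""
    total_gb = params_billion * bytes_per_param  # 1e9 params * 20 B = 20 GB per billion
    return math.ceil(total_gb / vram_gb)

for name, vram in [("H100", 80), ("H200", 141), ("B200", 192), ("B300", 288)]:
    print(f"70B on {name}: {gpus_needed(70, vram)} GPUs")
# -> 18, 10, 8, and 5 GPUs, matching the counts quoted above

# Endurance side: ~11 TB/hour of cache writes observed while fine-tuning an 8B model
daily_writes_tb = 11 * 24      # ~264 TB/day if run around the clock
consumer_tbw = 1200            # typical 2TB consumer drive, 5-year total-bytes-written rating
print(f"Days to exhaust a 1200-TBW drive: {consumer_tbw / daily_writes_tb:.1f}")
# -> about 4.5 days of continuous training
```

The same arithmetic explains why the 100 DWPD rating (and the pSLC firmware mode behind it) matters: a 0.3 DWPD consumer drive simply has no headroom for this workload.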
bit_user Hi Jarred, always great to hear from you! Thanks for taking the time to write such a detailed reply! JarredWaltonGPU said: This mostly results in much faster TTFT (time to first token), at times showing over a 10X benefit. DGX Spark for example was shown taking ~40 second for TTFT versus ~9 seconds with aiDAPTIV; Strix Halo showed ~6 seconds with aiDAPTIV and ~36 seconds without. Awesome stats! JarredWaltonGPU said: The other use case is for fine-tune training of models. Very cool! Too bad the article made only a passing mention of training. : ( Did you try giving Anton some nice graphs to include in the article? JarredWaltonGPU said: For fine-tuning, aiDAPTIV SSDs are configured as SLC NAND. (It's a firmware thing, where TLC NAND stays in pure pSLC mode.) Instead of ~5,000 program/erase cycles in TLC mode, the NAND gets about 60,000 PE cycles. Combined with overprovisioning, the aiDAPTIV SSDs are rated for 100 DWPD (drive writes per day). Ah yes, good point. It's rare to see such endurance, other than defunct Optane drives (which topped out at PCIe 4.0) and I think the only others are using Kioxia's XL-Flash. https://www.tomshardware.com/pc-components/ssds/custom-pcie-5-0-ssd-with-3d-xl-flash-debuts-special-optane-like-flash-memory-delivers-up-to-3-5-million-random-iops JarredWaltonGPU said: Now, if you're wondering if that's even necessary, I wondered the same thing. Great use case for write-intensive drives! Lastly, thank you for doing your part to help mitigate The Great DRAM Shortage! …although, I guess it does sort of shift DDR5 demand over to NAND. Reply
JarredWaltonGPU bit_user said: Lastly, thank you for doing your part to help mitigate The Great DRAM Shortage! …although, I guess it does sort of shift DDR5 demand over to NAND. So, I'm not doing much other than the marketing side of aiDAPTIV… but I will say, the NAND shortages and storage shortages in general this year are going to be rough. Like, it's no longer a discussion of what price companies may charge for SSDs, but rather allocation — are there even enough SSDs to satisfy the desires of big companies like Dell, HP, Lenovo, etc.? And the answer is that no, there are not enough SSDs this year, based on projections. I believe Phison's CEO (or US president, one of the two) has said the expectation is that demand for all storage this year will be about 15~20 percent higher than manufacturing capacity. There's something like 2 zettabytes of demand and only 1.7 zettabytes of supply, including HDDs. Or maybe it's a different number, but basically the industry will fall about 15% short this year of meeting demand. The problem is that companies can't just spin up more manufacturing. It takes about two years for a NAND production fab to be built, which means if they started now, it would be done in 2028. Several companies didn't invest in more manufacturing capacity a few years back, due to a reduction in demand and lower prices, and that plays into it as well. And HDD companies haven't really put a lot into capex for a while, despite the WD / SanDisk split, which means there's a shortage for enterprise HDD storage as well. So… buckle up, because both NAND and DRAM are going to get even more expensive in the coming months. Also: aiDAPTIV fine-tune training 405B on two GPUs (It says Chris Ramseyer, who is my boss and a former Tom's alumnus, but I'm the writer of that one… and the goofball raising his fist in the air.) Reply
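For what it's worth, the two supply figures in the post above do line up with the quoted shortfall, keeping in mind that Jarred flags both numbers as approximate:

```python
# Thread's rough figures: ~2 ZB of storage demand vs ~1.7 ZB of supply (HDDs included).
demand_zb, supply_zb = 2.0, 1.7
shortfall = (demand_zb - supply_zb) / demand_zb  # shortfall as a fraction of demand
print(f"Industry falls {shortfall:.0%} short of demand")  # -> 15%
```

Note this is 15% measured against demand; measured against supply, the gap would be closer to 18%, which is probably why the quoted range is "15~20 percent."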
thestryker JarredWaltonGPU said: For fine-tuning, aiDAPTIV SSDs are configured as SLC NAND. (It's a firmware thing, where TLC NAND stays in pure pSLC mode.) Instead of ~5,000 program/erase cycles in TLC mode, the NAND gets about 60,000 PE cycles. Combined with overprovisioning, the aiDAPTIV SSDs are rated for 100 DWPD (drive writes per day). Now, if you're wondering if that's even necessary, I wondered the same thing. All I can think of is how much raw NAND those drives consume, between the overprovisioning and running in SLC mode. Of course, with >200TB drives coming, nothing is going to save the NAND market. Greatly appreciate your insight and the additional details. Pour another one out for 3D XPoint. Reply
SuperPauly JarredWaltonGPU said: So, I'm not doing much other than the marketing side of aiDAPTIV… but I will say, the NAND shortages and storage shortages in general this year are going to be rough. Like, it's no longer a discussion of what price companies may charge for SSDs, but rather allocation — are there even enough SSDs to satisfy the desires of big companies like Dell, HP, Lenovo, etc.? And the answer is that no, there are not enough SSDs this year, based on projections. I believe Phison's CEO (or US president, one of the two) has said the expectation is that demand for all storage this year will be about 15~20 percent higher than manufacturing capacity. There's something like 2 zettabytes of demand and only 1.7 zettabytes of supply, including HDDs. Or maybe it's a different number, but basically the industry will fall about 15% short this year of meeting demand. The problem is that companies can't just spin up more manufacturing. It takes about two years for a NAND production fab to be built, which means if they started now, it would be done in 2028. Several companies didn't invest in more manufacturing capacity a few years back, due to a reduction in demand and lower prices, and that plays into it as well. And HDD companies haven't really put a lot into capex for a while, despite the WD / SanDisk split, which means there's a shortage for enterprise HDD storage as well. So… buckle up, because both NAND and DRAM are going to get even more expensive in the coming months. Also: aiDAPTIV fine-tune training 405B on two GPUs (It says Chris Ramseyer, who is my boss and a former Tom's alumnus, but I'm the writer of that one… and the goofball raising his fist in the air.) So when do you reckon us consumer peasants can get our hands on this tech? I've been waiting for GPU prices to drop for, hmmm, about 6 years now, haven't had a GPU in that time, and would love to locally train, experiment, and build with local models.
I hate giving money to tech giants for training stuff; something about the ability to do it locally eases my mind, and my bank balance. What tech will it be available in, do you know? Laptops, mini PCs, desktops, or will it come as a separate module, like a PCIe card or M.2 drive? Reply
bit_user SuperPauly said: So when you reckon us consumer peasants can get our hands on this tech? I've been waiting for GPU prices to drop for hmmm, about 6 years now and haven't had a GPU in that time and would love to locally train, experiment and build with local models. Sad to say, if current/recent GPU prices are too rich for your blood, then I'm going to hazard a guess that these SSDs will be, as well. Take a current PCIe 5.0 drive of like 4 TB or 8 TB capacity and add at least a 50% markup. That's if this stuff comes to client M.2 drives. Otherwise, you're looking at a U.2 (or other server form factor) and an even bigger markup. Then, you'll also need a dGPU of like RTX 5080 or RTX 5090 caliber. Training is far more compute-intensive than inferencing. Of course, that's just speculation, on my part. Reply