AMD’s upcoming RDNA 5 GPUs might improve dual-issue execution & use shader units more efficiently — LLVM patch adds new FMA instruction to ease compiling

bit_user I'd just point out a couple of things: Needing to handle VOPD3 adds more complexity to the decoders, which might not have been possible in RDNA 3 & 4 without a penalty of some sort (pipeline stages, clock speed, etc.). They had to beef up the X pipeline to handle more instruction types, which costs die area. Running more instructions in parallel increases energy usage, so in RDNA 3 & 4 it would have come at the cost of higher power and/or lower clock speeds. The point is that, while this may seem like "free performance", it's actually not. It's the sort of thing that makes sense as a way to improve IPC while moving to a smaller process node. It also strikes me as interesting that VOPD basically hints at a return to VLIW (although I wouldn't consider dual-issue to be "Very Long"). You're having the compiler explicitly schedule multiple execution resources, which is a first-order characteristic of VLIW. BTW, if anyone is curious, you can see the RDNA 4 ISA documented here: https://docs.amd.com/v/u/en-US/rdna4-instruction-set-architecture Neither Nvidia nor Intel documents the ISA of their shader cores like that. Nvidia does document PTX, but that's a pseudo-assembly and not the actual hardware ISA.
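To make the "compiler explicitly schedules the execution resources" point concrete, here is a minimal toy sketch of a pairing pass. It is not LLVM's actual VOPD formation logic: the `Op` type, the `independent` check, and the greedy adjacent-only pairing are all assumptions made up for illustration, and the independence test is deliberately conservative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical vector op: one destination register, several sources."""
    name: str
    dst: str
    srcs: tuple


def independent(a, b):
    """Conservative check that neither op touches the other's result."""
    return b.dst != a.dst and a.dst not in b.srcs and b.dst not in a.srcs


def pair_ops(ops):
    """Greedily fuse adjacent independent ops into two-wide bundles."""
    bundles = []
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and independent(ops[i], ops[i + 1]):
            bundles.append((ops[i], ops[i + 1]))  # issued together (X, Y)
            i += 2
        else:
            bundles.append((ops[i],))  # issued alone
            i += 1
    return bundles


program = [
    Op("fma", "v0", ("v1", "v2", "v3")),
    Op("mul", "v4", ("v5", "v6")),
    Op("add", "v7", ("v4", "v0")),
    Op("sub", "v8", ("v7", "v1")),  # reads v7, so it cannot pair with add
]

for bundle in pair_ops(program):
    print(" || ".join(op.name for op in bundle))
```

In this toy model the hardware never decides what runs in parallel; the bundle boundaries are fixed at compile time, which is the VLIW-like property described above.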

bit_user Faiakes said: "This sounds like driver issue." No, this is lifting some hardware limitations. Faiakes said: "Couldn't this apply to rdna 4?" No. The fact that they're reporting on LLVM (i.e. the open source compiler infrastructure for RDNA GPUs) is simply giving us a peek at how the ISA of RDNA 5 will differ from RDNA 4. The change they noticed is basically teaching the compiler about the upcoming hardware. Whether or not you do that doesn't change the hardware or its actual capabilities.

Alpha_Lyrae In RT workloads, dual-issue came under vector register pressure, as RT tends to eat up VGPRs and RDNA3/4 had to secure two separate VGPRs for the dual-issue instruction, or else the hardware would simply refuse to launch it. So, if the uArch is low on available VGPRs, dual-issue becomes a very hard ask. It looks like VOPD3 can map to the same input VGPRs for X/Y, conserving a resource that is increasingly coming under heavy pressure in wavefront queues. Ports could be double banked, or there could be specialized tags to keep things organized. The other thing I see here is the addition of signed and unsigned integer ops, which also allows for dual-issue I32/U32 for the first time. This is most likely the reason the CU is twice as wide as before, going from 64 SPs to 128 SPs. INT32 could have been added to the extra FP32-only ALU, but I think there were resource limitations (along with the mentioned execution limitations). Combining the WGP into a local CU means all of the caches and registers are now global to the CU instead of just the LDS and GDS. 2x integer ops will be important for neural rendering operations. If there's still a WGP, then this is also 2x wider at 256 SPs or 8x SIMD32s. Or, there's a chance AMD also moved to SIMD64 with pseudo-SIMD32 support. Gfx1250 looks to be CDNA5, as it has support for dual-issue 64-bit instructions.
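The register argument above can be sketched with a toy rule check. This models only the comment's description (separate input VGPRs required before, shared inputs allowed with VOPD3); the actual restrictions in AMD's ISA documents are more detailed, and `Op`, `can_dual_issue`, and the `allow_shared_sources` flag are hypothetical names invented for this sketch.

```python
from typing import NamedTuple, Tuple


class Op(NamedTuple):
    """Hypothetical vector op: one destination register, several sources."""
    name: str
    dst: str
    srcs: Tuple[str, ...]


def can_dual_issue(x, y, allow_shared_sources):
    """Toy pairing rule for an X/Y pair, per the description above."""
    if x.dst == y.dst:
        return False  # both halves cannot write the same register
    shared = set(x.srcs) & set(y.srcs)
    # Strict rule (assumed older behaviour): no source register in common.
    # Relaxed rule (assumed VOPD3-style behaviour): sharing is fine.
    return allow_shared_sources or not shared


x = Op("fma", "v0", ("v1", "v2", "v3"))
y = Op("mul", "v4", ("v1", "v5"))  # also reads v1

print(can_dual_issue(x, y, allow_shared_sources=False))
print(can_dual_issue(x, y, allow_shared_sources=True))
```

Under the strict rule the compiler would have to find another register for one of the inputs or give up on the pair, which is where the extra VGPR pressure in this model comes from.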

bit_user Alpha_Lyrae said: "In RT workloads, dual-issue came under vector register pressure," Are you talking specifically about RDNA 3? Chips & Cheese found that RDNA 4's dynamic register allocation was able to achieve full occupancy in RT workloads. Source: https://chipsandcheese.com/p/dynamic-register-allocation-on-amds Alpha_Lyrae said: "as RT tends to eat up VGPRs and RDNA3/4 had to secure two separate VGPRs for the dual-issue instruction, else hardware will simply refuse to launch it." I think it's worse than that. From the RDNA 4 manual's description of VOPD: "The two operations must be independent of each other. This instruction has certain restrictions that must be met – hardware does not function correctly if they are not." Alpha_Lyrae said: "there's a chance AMD also moved to SIMD64 with pseudo-SIMD32 support." Why would they go back to Wave64?

usertests -Fran- said: "Anything about fixing the chiplet design with RDNA5? Regards." RDNA5 could have single GCDs of different sizes shared between desktop cards, some of the laptop APUs for the first time, and Xbox Helix, with other things like memory controllers on a different chiplet.
