
bit_user
I'd just point out a couple of things:
- Needing to handle VOPD3 adds more complexity to the decoders, which might not have been possible in RDNA 3 & 4 without a penalty of some sort (pipeline stages, clock speed, etc.).
- They had to beef up the X pipeline to handle more instruction types, which costs die area.
- Running more instructions in parallel increases energy usage, so in RDNA 3 & 4 it would've come at the cost of higher power and/or lower clock speeds.

The point is that, while this may seem like "free performance", it's actually not. It's the sort of thing that makes sense as a way to improve IPC while moving to a smaller process node.

It also strikes me as interesting that VOPD is basically a hint of a return to VLIW (although I wouldn't consider dual-issue to be "Very Long"). You're having the compiler explicitly schedule multiple execution resources. That's a first-order characteristic of VLIW.

BTW, if anyone is curious, you can see the RDNA 4 ISA, documented here: https://docs.amd.com/v/u/en-US/rdna4-instruction-set-architecture

Neither Nvidia nor Intel documents the ISA of their shader cores like that. Nvidia does document PTX, but that's a pseudo-assembly and not the actual hardware ISA.
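To make the VLIW point concrete, here is a minimal Python sketch of what "the compiler explicitly schedules multiple execution resources" can look like. It is only an illustration, not how LLVM's AMDGPU backend actually forms VOPD pairs: the `Op` type, the register names, and the greedy adjacent-pair strategy are all hypothetical, and the independence test is deliberately conservative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical vector op: a mnemonic, one destination, some sources."""
    name: str
    dst: str
    srcs: tuple


def independent(a: Op, b: Op) -> bool:
    """Conservative check that a and b can share one dual-issue slot."""
    return (
        a.dst not in b.srcs      # b must not read a's result
        and b.dst not in a.srcs  # b must not overwrite an input of a
        and a.dst != b.dst       # both must not write the same register
    )


def pack_dual_issue(ops):
    """Greedily pair each op with its successor when they are independent.

    Returns a list of bundles; each bundle is a tuple of one or two ops.
    """
    bundles = []
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and independent(ops[i], ops[i + 1]):
            bundles.append((ops[i], ops[i + 1]))  # dual-issue pair
            i += 2
        else:
            bundles.append((ops[i],))             # falls back to single issue
            i += 1
    return bundles


program = [
    Op("mul", "v0", ("v1", "v2")),
    Op("add", "v3", ("v4", "v5")),
    Op("fma", "v6", ("v0", "v3", "v1")),
    Op("add", "v7", ("v6", "v1")),
]
bundles = pack_dual_issue(program)
```

In this toy program the first two ops pair up, while the last two stay single-issue because the second one reads the first one's result. The hardware never discovers that on its own; the pairing decision is baked into the instruction stream by the compiler, which is the VLIW-like part.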
bit_user
Faiakes said: "This sounds like driver issue."
No, this is lifting some hardware limitations.

Faiakes said: "Couldn't this apply to rdna 4?"
No, the fact that they're reporting on LLVM (i.e. the open source compiler infrastructure for RDNA GPUs) is simply giving us a peek at how the ISA of RDNA 5 will differ from RDNA 4. The change they noticed is basically teaching the compiler about the upcoming hardware. Whether or not you do that doesn't change the hardware or its actual capabilities.
Alpha_Lyrae
In RT workloads, dual-issue came under vector register pressure, as RT tends to eat up VGPRs and RDNA3/4 had to secure two separate VGPRs for the dual-issue instruction, else the hardware will simply refuse to launch it. So, if the uArch is low on available VGPRs, dual-issue becomes a very hard ask. It looks like VOPD3 can map to the same input VGPRs for X/Y, conserving a resource that is increasingly coming under heavy pressure in wavefront queues. Ports could be double banked or there could be specialized tags to keep things organized.

The other thing I see here is the addition of signed and unsigned integer ops, which also allows for dual-issue I32/U32 for the first time. This is most likely the reason the CU is twice as wide as before, going from 64SPs to 128SPs. INT32 could have been added to the extra FP32-only ALU, but I think there were resource limitations (along with the mentioned execution limitations). Combining the WGP into a local CU means all of the caches and registers are now global to the CU instead of just the LDS and GDS. 2x integer ops will be important for neural rendering operations.

If there's still a WGP, then this is also 2x wider at 256SPs or 8x SIMD32s. Or, there's a chance AMD also moved to SIMD64 with pseudo-SIMD32 support.

Gfx1250 looks to be CDNA5, as it has support for dual-issue 64-bit instructions.
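The operand restriction described above can be sketched as a toy check in Python. Everything here is an assumption for illustration: the rules (paired X/Y source slots must come from different VGPR banks, one destination must be even and the other odd) and the four-bank layout with bank = register index mod 4 are recalled from AMD's public ISA guides and should be verified against the manual. The function name is hypothetical, and whether VOPD3 actually relaxes any of this is speculation in this thread, not something the sketch confirms.

```python
def vopd_operands_ok(x_srcs, y_srcs, x_dst, y_dst, num_banks=4):
    """Toy model of VOPD operand constraints (assumed, not authoritative).

    x_srcs / y_srcs: VGPR indices for the X and Y ops, slot by slot.
    x_dst / y_dst:   destination VGPR indices.
    num_banks:       assumed number of VGPR banks (bank = index % num_banks).
    """
    def bank(reg):
        return reg % num_banks

    # Assumed rule: matching source slots must not hit the same bank,
    # since both ops read their operands in the same cycle.
    for xs, ys in zip(x_srcs, y_srcs):
        if bank(xs) == bank(ys):
            return False

    # Assumed rule: one destination even, the other odd.
    return (x_dst % 2) != (y_dst % 2)


ok_pair = vopd_operands_ok([0, 1], [1, 2], x_dst=4, y_dst=5)
shared_input = vopd_operands_ok([0, 1], [0, 2], x_dst=4, y_dst=5)
same_bank = vopd_operands_ok([0, 1], [4, 2], x_dst=4, y_dst=5)
```

Under this model, feeding the same VGPR to both X and Y (`shared_input`) is rejected, which is why the register allocator has to find operands in distinct banks and why that gets harder when VGPRs are scarce.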
bit_user
Alpha_Lyrae said: "In RT workloads, dual-issue came under vector register pressure,"
Are you talking specifically about RDNA 3? Chips & Cheese found that RDNA 4's dynamic register allocation was able to achieve full occupancy in RT workloads.
Source: https://chipsandcheese.com/p/dynamic-register-allocation-on-amds

Alpha_Lyrae said: "as RT tends to eat up VGPRs and RDNA3/4 had to secure two separate VGPRs for the dual-issue instruction, else hardware will simply refuse to launch it."
I think it's worse than that. From the RDNA 4 manual's description of VOPD:
"The two operations must be independent of each other. This instruction has certain restrictions that must be met – hardware does not function correctly if they are not."

Alpha_Lyrae said: "there's a chance AMD also moved to SIMD64 with pseudo-SIMD32 support."
Why would they go back to Wave64?
usertests
-Fran- said: "Anything about fixing the chiplet design with RDNA5? Regards."
RDNA5 could have single GCDs of different sizes shared between desktop cards, some of the laptop APUs for the first time, and Xbox Helix, with other things like memory controllers on a different chiplet.