
(Image credit: Tom's Hardware) On the RTX 5060 at 1080p, the performance cost of Inference on Sample is between 0.60 and 0.70 ms, depending on the scenario. At a resolution appropriate for this GPU, we are again within 1 ms.
The 5060 does struggle at higher resolutions, though. At 1440p, the cost is over 1 ms, and at 4K, the cost approaches 2 ms, although this is to be expected for this level of GPU.
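To put these millisecond costs in context, a fixed per-frame overhead translates into a frame-rate drop that depends on the baseline frame time. A minimal sketch (the function name and the 60 fps baseline are illustrative, not from the benchmark):

```python
def fps_after_overhead(base_fps: float, overhead_ms: float) -> float:
    """Frame rate after adding a fixed per-frame cost in milliseconds."""
    base_frame_ms = 1000.0 / base_fps
    return 1000.0 / (base_frame_ms + overhead_ms)

# Illustrative: a 2 ms cost (the 5060's 4K figure) at a 60 fps baseline
print(round(fps_after_overhead(60.0, 2.0), 1))  # ~53.6 fps
```

The same 2 ms costs proportionally more at higher baseline frame rates, which is why a sub-1 ms figure at the GPU's native resolution tier matters.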
Now let’s take a look at a lower-end system: a laptop with an RTX 4060 Mobile GPU.
(Image credit: Tom's Hardware) The performance cost of the Inference on Sample mode at 1080p on the 4060 Laptop GPU is roughly 0.70-0.85 ms, depending on the scenario.
The cost for the 4060 is getting close to 1 ms. There could still be scenarios where the 4060 with its 8 GB frame buffer could benefit from Inference on Sample. If VRAM is the main limitation, then it may be worth using this mode. As Alexey Panteleev mentions below, if a game forces you to lower the texture quality setting because it would otherwise not fit into VRAM, but the game runs more than fast enough when you do that, then Inference on Sample could be a net benefit.
When I uploaded a couple of NTC videos on the Compusemble YouTube channel in October 2025, Alexey Panteleev – Distinguished DevTech Engineer at Nvidia and NTC developer – generously joined the comments section. He shared additional insight and answered viewer questions.
Alexey Panteleev: Inference on Sample is only viable on the fastest GPUs, and that's why we also provide the Inference On Load mode that transcodes to BCn and only provides disk size or download size reduction, not VRAM benefits.
Whether a GPU will be fast enough for inference on sample depends mostly on the specific implementation in a game. Like, whether they use material textures in any pass besides the G-buffer, how complex their material model is and how large the shaders are, etc. And we're working on improving the inference efficiency.
Alexey Panteleev: Our thinking is that games could ship with NTC textures and offer a mode selection, On Load/Feedback vs. On Sample, and users could choose which one to use based on the game performance on their machine. I think the rule of thumb should be – if you see a game that forces you to lower the texture quality setting because otherwise it wouldn't fit into VRAM, but when you do that, it runs more than fast enough, then it should be a good candidate for NTC On Sample.
Another important thing – games don't have to use NTC on all of their textures, it can be a per-texture decision. For example, if something gets an unacceptable quality loss, you could keep it as a non-NTC texture. Or if a texture is used separately from other textures in a material, such as a displacement map, it should probably be kept as a standalone non-NTC texture.
Alexey Panteleev: On Sample mode is noticeably slower than On Load, which has zero cost at render time. However, note that a real game would have many more render passes than just the basic forward pass and TAA/DLSS that we have here, and most of them wouldn't be affected, making the overall frame time difference not that high. The relative performance difference between On Load and On Sample within the same GPU family should be similar. If a GPU runs out of VRAM, On Load wouldn't help at all, because it doesn't reduce the working set size, and uploads over PCIe only happen when new textures or tiles are streamed in.
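Alexey's point that only the texture-sampling passes pay the On Sample cost can be sketched with hypothetical numbers (the pass breakdown below is invented for illustration; only the 0.7 ms figure comes from the measurements above):

```python
def overall_frame_delta(pass_times_ms: dict, on_sample_cost_ms: float) -> float:
    """Relative frame-time increase when the NTC On Sample cost lands only on
    the passes that sample material textures; all other passes are unchanged."""
    base_frame_ms = sum(pass_times_ms.values())
    return on_sample_cost_ms / base_frame_ms

# Hypothetical frame where the G-buffer is the only affected pass.
frame = {"gbuffer": 3.0, "shadows": 2.5, "lighting": 4.0, "post": 2.0, "taa": 1.0}
print(f"{overall_frame_delta(frame, 0.7):.1%}")  # ~5.6% of a 12.5 ms frame
```

A 0.7 ms cost that looks large next to a minimal forward pass becomes a single-digit percentage of a fuller frame, which is the argument being made.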
When the NTC sample was first released last year, some people noticed that the image contained a lot of noise when anti-aliasing was turned off. DLSS cleaned up this noise entirely, while TAA cleaned it up mostly, but not completely. This is due to the use of STF. With STF disabled, we no longer noticed any noise in the image with AA off. However, STF is required for Inference on Sample.
Alexey Panteleev: Also note that STF (Stochastic Texture Filtering) plays a major role in how things with detailed specular reflections look, like the curtains. Here, you can toggle STF on or off in the Reference and On Load modes, but not On Sample – that one requires STF and it's always on. STF is on by default in all modes, to make the comparison more direct.
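The reason STF produces noise that temporal accumulation cleans up is that it replaces an exact filter with an unbiased single-texel estimate: one texel is fetched, chosen with probability equal to its filter weight. A minimal sketch of the idea on a 2x2 bilinear footprint (this is a conceptual illustration in Python, not the actual shader implementation):

```python
import random

def bilinear(texels, fx, fy):
    """Exact 2x2 bilinear filter; texels = [[t00, t10], [t01, t11]]."""
    (t00, t10), (t01, t11) = texels
    top = t00 * (1 - fx) + t10 * fx
    bot = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bot * fy

def stochastic_bilinear(texels, fx, fy, rng=random):
    """STF-style sample: fetch ONE texel, picked with probability equal to
    its bilinear weight. Noisy per sample, but unbiased in expectation."""
    (t00, t10), (t01, t11) = texels
    weighted = [(t00, (1 - fx) * (1 - fy)), (t10, fx * (1 - fy)),
                (t01, (1 - fx) * fy),       (t11, fx * fy)]
    r, acc = rng.random(), 0.0
    for value, weight in weighted:
        acc += weight
        if r < acc:
            return value
    return weighted[-1][0]

# Averaging many stochastic samples converges to the exact filter result,
# which is what TAA/DLSS accumulation approximates over frames.
texels = [[0.0, 1.0], [0.5, 0.25]]
exact = bilinear(texels, 0.3, 0.6)
avg = sum(stochastic_bilinear(texels, 0.3, 0.6) for _ in range(200_000)) / 200_000
```

A single `stochastic_bilinear` sample is one of the four raw texel values, hence the visible noise with AA off; `avg` converges to `exact`, hence the clean image under DLSS.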
The sample tested here offers a fascinating glimpse into the future of graphics rendering. Neural Texture Compression (NTC) can offer extremely large compression ratios without sacrificing image quality – and in fact, appears to offer better image quality than block-compressed formats in some scenarios.
It is very impressive that the Inference on Sample mode produced slightly better image quality than the BCn transcoded textures in the Intel Sponza base scene, while at the same time reducing texture memory by 85%. The Inference on Sample mode was almost a perfect match for the reference (uncompressed) materials.
That said, some caveats remain. Stochastic Texture Filtering (STF) introduces visible noise when anti-aliasing is completely disabled, and some residual noise can still appear even when using Temporal Anti-Aliasing (TAA). NTC currently requires DLSS to look its best when using STF, which is mandatory for Inference on Sample.
The compatibility of this technology across a wide range of GPUs also stood out. Developers can compress textures using NTC while also offering an Inference on Load mode, which transcodes the NTC textures to BCn during game or map load. While this will not shrink VRAM usage, it has zero cost at render time and will greatly lower the footprint of games on disk. The technology is also supported on AMD and Intel GPUs.
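The disk-versus-VRAM tradeoff between the two modes can be summarized with illustrative numbers (the 4 GB texture set and the helper function are hypothetical; the ~15% ratio follows from the ~85% reduction cited above):

```python
def footprints(bcn_size_gb: float, ntc_ratio: float = 0.15) -> dict:
    """Illustrative disk/VRAM footprints per NTC mode. ntc_ratio assumes
    the ~85% size reduction reported for the Intel Sponza scene."""
    ntc_size = bcn_size_gb * ntc_ratio
    return {
        # NTC textures stay compressed in VRAM and are decoded per sample.
        "on_sample": {"disk_gb": ntc_size, "vram_gb": ntc_size},
        # NTC on disk, transcoded back to full-size BCn at load time.
        "on_load":   {"disk_gb": ntc_size, "vram_gb": bcn_size_gb},
    }

print(footprints(4.0))
```

Both modes ship the same small files; only On Sample carries the savings through to VRAM, which matches Alexey's note that On Load cannot help a VRAM-limited GPU.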
Neural Texture Compression is poised to play a crucial role in the future of real-time graphics, and it will be exciting to see how it evolves and matures over time.
Dan Mateescu is a PC enthusiast with many years of experience benchmarking PC hardware. In 2021, he started his own YouTube channel called 'Compusemble' where he benchmarks hardware in video games and the latest tech demos.
Pierce2623 I’m guessing right now that since there is a performance penalty it won’t be nearly as useful on the 8GB cards where it’s really needed, just like frame gen.
bit_user Thanks for looking into this! I'd also be curious to know what the impact on power is like, since it seems to me that one of the main tradeoffs of NTC is that it trades more computation for less memory utilization. If I'm right, that could also have an impact on frame rates, like if the GPU becomes more prone to power or thermal throttling.
PEnns "Alexey Panteleev: Inference on Sample is only viable on the fastest GPUs," Funny! Because "the fastest GPUs" already have enough VRAM and don't really need this technology. This NTC technology might help those with less VRAM, aka, not "the fastest GPUs", but they will not see much advantage using it!! So, what now??
bit_user PEnns said: "Alexey Panteleev: Inference on Sample is only viable on the fastest GPUs," That still leaves Inference on Feedback as potentially viable. PEnns said: Funny! Because "the fastest GPUs" already have enough VRAM and don't really need this technology. Well, like when ray tracing or DLSS were first introduced, the first generation of hardware to support them couldn't realize their full potential. If DRAM continues being so expensive, we could see further generations of GPUs that are fairly miserly in the amounts they provide. Or, if iGPU gaming becomes more common, like with the N1X, then such techniques might be needed to reduce memory bandwidth requirements, rather than memory capacity limitations.
TerryLaze PEnns said: So, what now?? If it turns out to be popular enough they will make it hardware accelerated by adding special hardware to upcoming cards. Or you know, release top end cards with less ram for cheaper.
bit_user TerryLaze said: If it turns out to be popular enough they will make it hardware accelerated by adding special hardware to upcoming cards. I doubt it. That would duplicate too much silicon vs. their existing tensor cores. They could probably do things to increase tensor core throughput or simply include more of them.
razor512 I would have liked to have seen benchmarks comparing it to not using any neural texture compression. Everything a video card does has a performance cost, but proper comparisons are needed to determine the true impact, since often you are doing a series of tradeoffs, e.g., DLSS has a performance cost but provides an overall performance boost because it reduces compute time in other areas of the render pipeline. What is needed are tests that compare it on and off on high end, mid range, and low end cards that are starved for VRAM throughput, like the RTX 4060. Does a smaller VRAM footprint lead to any time savings anywhere in the render pipeline that can offset some of the compute overhead from the neural compression? Do the benefits and tradeoffs change on cards that have slow VRAM (the RTX 4060, with its slower than usual VRAM, tends to benefit more from a VRAM overclock compared to other cards with a less crippled memory bus)? It would be interesting to look into whether a reduced memory footprint will make those deficiencies less harmful to performance.
usertests PEnns said: Funny! Because "the fastest GPUs" already have enough VRAM and don't really need this technology. This NTC technology might help those with less VRAM, aka, not "the fastest GPUs", but they will not see much advantage using it!! So, what now?? Next-gen consoles will be using NTC, and the consoles will likely have 24-36 GB of memory, up from the current 10-16 GB. A 24 GB PS6 handheld would have over double the memory of the Xbox Series S. They could use NTC to stuff the equivalent of over 100 GB of BCn textures into the memory buffer, delivering higher quality. Or they could use a more typical amount, compress it down, and use the leftover memory for LLMs or other models, for things like NPC interaction. You get smaller install sizes as a side benefit. If current GPUs are supporting NTC on Sample with somewhat competent performance, then GPUs released 5 years from now would be fine. That's probably how long it would take before games start dropping support for PS5 and Xbox Series X/S.
thestryker The most important thing I learned from this article is that NTC doesn't need to be applied to the whole scene. To me that suggests enterprising developers could potentially use this to reduce memory footprint without as much of a performance penalty.
JTWrenn Compression seems to be a great use for AI. Don't change things, just do it in a more efficient way. That said, it will be interesting to see how this plays out over the next few gens of GPUs, and how it affects optimization and game making going forward. It could open up higher tiers of quality or just make the next gen of mid grade GPUs much more usable. Side note…gonna suck if they don't get some open source standards on this, which Nvidia is generally horrible at. I hope AMD has a more open solution or that Nvidia opens up a bit on this. For consoles though, this is going to be a huge boon for late game work in console gens and I think will really be used well if it finds its way into systems. Seems like it would be especially good for something like the Switch 2 for moving from a bigger console to a portable without running into a massive memory bottleneck.