Benchmarking Nvidia’s RTX Neural Texture Compression tech that can reduce VRAM usage by over 80%


(Image credit: Tom's Hardware) On the RTX 5060 at 1080p, the performance cost of Inference on Sample is between 0.60 and 0.70 ms, depending on the scenario. At a resolution appropriate for this GPU, we are again within 1 ms.

The 5060 does struggle at higher resolutions, though. At 1440p, the cost is over 1 ms, and at 4K, the cost approaches 2 ms, although this is to be expected for this level of GPU.

Now let’s take a look at a lower-end system: a laptop with an RTX 4060 Mobile GPU.

(Image credit: Tom's Hardware) The performance cost of the Inference on Sample mode at 1080p on the 4060 Laptop GPU is roughly 0.70-0.85 ms, depending on the scenario.

The cost for the 4060 is getting close to 1 ms. There could still be scenarios where the 4060 with its 8 GB frame buffer could benefit from Inference on Sample. If VRAM is the main limitation, then it may be worth using this mode. As Alexey Panteleev mentions below, if a game forces you to lower the texture quality setting because it would otherwise not fit into VRAM, but the game runs more than fast enough when you do that, then Inference on Sample could be a net benefit.
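That rule of thumb can be sketched as a simple decision function. This is a hypothetical illustration, not Nvidia's API; the function name, inputs, and the idea of comparing against a frame-time budget are all assumptions layered on top of the article's measurements.

```python
# Hypothetical sketch of the per-machine NTC mode choice described above.
# Names and thresholds are illustrative assumptions, not Nvidia's API.

def choose_ntc_mode(vram_limited: bool, frame_time_ms: float,
                    target_frame_time_ms: float,
                    on_sample_cost_ms: float) -> str:
    """Pick an NTC decompression mode for this machine.

    on_sample_cost_ms: measured cost of Inference on Sample
    (e.g. roughly 0.6-0.7 ms at 1080p on an RTX 5060, per the article).
    """
    if not vram_limited:
        # Plenty of VRAM: On Load gives disk savings at zero render cost.
        return "inference_on_load"
    if frame_time_ms + on_sample_cost_ms <= target_frame_time_ms:
        # VRAM-bound but with frame-time headroom: On Sample shrinks
        # the resident texture set at an acceptable cost.
        return "inference_on_sample"
    # VRAM-bound and already at the frame-time budget: fall back to
    # On Load with lower-resolution textures.
    return "inference_on_load"

print(choose_ntc_mode(True, 12.0, 16.7, 0.7))
```

The interesting case is the middle branch: a game that only runs fast "more than fast enough" after lowering texture quality for VRAM reasons has exactly that headroom.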

When I uploaded a couple of NTC videos on the Compusemble YouTube channel in October 2025, Alexey Panteleev – Distinguished DevTech Engineer at Nvidia and NTC developer – generously joined the comments section. He shared additional insight and answered viewer questions.

Alexey Panteleev: Inference on Sample is only viable on the fastest GPUs, and that's why we also provide the Inference On Load mode that transcodes to BCn and only provides disk size or download size reduction, not VRAM benefits.

Whether a GPU will be fast enough for inference on sample depends mostly on the specific implementation in a game. Like, whether they use material textures in any pass besides the G-buffer, how complex their material model is and how large the shaders are, etc. And we're working on improving the inference efficiency.

Alexey Panteleev: Our thinking is that games could ship with NTC textures and offer a mode selection, On Load/Feedback vs. On Sample, and users could choose which one to use based on the game performance on their machine. I think the rule of thumb should be – if you see a game that forces you to lower the texture quality setting because otherwise it wouldn't fit into VRAM, but when you do that, it runs more than fast enough, then it should be a good candidate for NTC On Sample.

Another important thing – games don't have to use NTC on all of their textures, it can be a per-texture decision. For example, if something gets an unacceptable quality loss, you could keep it as a non-NTC texture. Or if a texture is used separately from other textures in a material, such as a displacement map, it should probably be kept as a standalone non-NTC texture.

Alexey Panteleev: On Sample mode is noticeably slower than On Load, which has zero cost at render time. However, note that a real game would have many more render passes than just the basic forward pass and TAA/DLSS that we have here, and most of them wouldn't be affected, making the overall frame time difference not that high. The relative performance difference between On Load and On Sample within the same GPU family should be similar. If a GPU runs out of VRAM, On Load wouldn't help at all, because it doesn't reduce the working set size, and uploads over PCIe only happen when new textures or tiles are streamed in.
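Alexey's point about overall frame time can be put in rough numbers. The frame times below are illustrative assumptions (the stripped-down sample versus a hypothetical fuller game frame); only the ~0.7 ms overhead figure comes from the measurements above.

```python
# Rough arithmetic (illustrative numbers) showing why a fixed ~0.7 ms
# On Sample cost matters less in a full game frame than in this sample.

def fps_with_overhead(base_frame_ms: float, overhead_ms: float) -> float:
    # Convert a frame time plus a fixed overhead into frames per second.
    return 1000.0 / (base_frame_ms + overhead_ms)

sample_frame = 3.0    # ms, assumed: forward pass + TAA/DLSS only
game_frame = 12.0     # ms, assumed: many more passes, mostly unaffected
cost = 0.7            # ms: measured On Sample overhead at 1080p

print(f"sample: {fps_with_overhead(sample_frame, 0):.0f} -> "
      f"{fps_with_overhead(sample_frame, cost):.0f} fps")
print(f"game:   {fps_with_overhead(game_frame, 0):.0f} -> "
      f"{fps_with_overhead(game_frame, cost):.0f} fps")
```

The same absolute cost is a much smaller relative hit in the longer frame, which is why a minimal sample exaggerates the overhead compared with a shipping game.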

When the NTC sample was first released last year, some people noticed that the image contained a lot of noise when anti-aliasing was turned off. DLSS cleaned up this noise entirely, while TAA cleaned up most of it, but not all. The noise is due to the use of STF: with STF disabled, we no longer noticed any noise in the image even with AA off. However, STF is required for Inference on Sample.

Alexey Panteleev: Also note that STF (Stochastic Texture Filtering) plays a major role in how things with detailed specular reflections look, like the curtains. Here, you can toggle STF on or off in the Reference and On Load modes, but not On Sample – that one requires STF and it's always on. STF is on by default in all modes, to make the comparison more direct.
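The core idea behind STF, and why it produces noise that TAA/DLSS can resolve, can be shown in a few lines. This is a minimal sketch of stochastic bilinear filtering in general, not Nvidia's implementation: instead of blending four texels, each sample picks one texel with probability equal to its bilinear weight, so the expected value matches the filtered result.

```python
import random

# Minimal sketch of the idea behind Stochastic Texture Filtering (STF):
# pick ONE texel per sample with probability equal to its bilinear
# weight. Each pixel is noisy, but the average over samples (what
# TAA/DLSS accumulates over frames) converges to the filtered value.
# Illustrative only, not Nvidia's code.

def bilinear_weights(fx: float, fy: float):
    # Standard 2x2 bilinear weights for fractional coordinates (fx, fy).
    return [(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy]

def bilinear(texels, fx, fy):
    # Reference result: the full weighted blend of four texels.
    return sum(w * t for w, t in zip(bilinear_weights(fx, fy), texels))

def stochastic_sample(texels, fx, fy, rng):
    # One texture fetch instead of four: this matters when each "fetch"
    # is an expensive neural-network inference, as in NTC On Sample.
    return rng.choices(texels, weights=bilinear_weights(fx, fy))[0]

rng = random.Random(0)
texels = [0.1, 0.4, 0.6, 0.9]        # the 2x2 texel footprint
fx, fy = 0.25, 0.5
exact = bilinear(texels, fx, fy)
avg = sum(stochastic_sample(texels, fx, fy, rng)
          for _ in range(100_000)) / 100_000
print(f"exact={exact:.3f}  stochastic avg={avg:.3f}")
```

A single stochastic sample is visibly noisy, which is exactly what the sample shows with AA off; temporal accumulation plays the role of the averaging loop here.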

The sample tested here offers a fascinating glimpse into the future of graphics rendering. Neural Texture Compression (NTC) can offer extremely large compression ratios without sacrificing image quality – and in fact, appears to offer better image quality than block-compressed formats in some scenarios.

It is very impressive that the Inference on Sample mode produced slightly better image quality than the BCn transcoded textures in the Intel Sponza base scene, while at the same time reducing texture memory by 85%. The Inference on Sample mode was almost a perfect match for the reference (uncompressed) materials.
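To put that 85% figure in concrete terms, here is the arithmetic on a hypothetical texture budget; the 4 GB working-set size is an assumption for illustration, not a measurement from the sample.

```python
# The 85% reduction in concrete terms, on an assumed 4 GB texture set.

def ntc_savings(texture_mb: float, reduction: float = 0.85):
    # Returns (resident size after compression, VRAM freed), in MB.
    compressed = texture_mb * (1 - reduction)
    return compressed, texture_mb - compressed

compressed, freed = ntc_savings(4000)
print(f"{compressed:.0f} MB resident, {freed:.0f} MB freed")
```

On an 8 GB card, freeing several gigabytes of texture memory is the difference between fitting the working set and spilling over PCIe.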

That said, some caveats remain. Stochastic Texture Filtering (STF) introduces visible noise when anti-aliasing is completely disabled, and some residual noise can still appear even when using Temporal Anti-Aliasing (TAA). NTC currently requires DLSS to look its best when using STF, which is mandatory for Inference on Sample.

The compatibility of this technology across a wide range of GPUs also stood out. Developers can compress textures using NTC, but also offer an Inference on Load mode, which transcodes the NTC textures to BCn during game or map load. While this will not shrink VRAM usage, it has zero cost to performance and will greatly lower the footprint of games on disk. The technology is also supported on AMD and Intel GPUs.

Neural Texture Compression is poised to play a crucial role in the future of real-time graphics, and it will be exciting to see how it evolves and matures over time.

Dan Mateescu is a PC enthusiast with many years of experience benchmarking PC hardware. In 2021, he started his own YouTube channel called 'Compusemble' where he benchmarks hardware in video games and the latest tech demos.

Pierce2623: I’m guessing right now that since there is a performance penalty it won’t be nearly as useful on the 8GB cards where it’s really needed, just like frame gen.

bit_user: Thanks for looking into this! I'd also be curious to know what the impact on power is like, since it seems to me that one of the main tradeoffs of NTC is that it trades more computation for less memory utilization. If I'm right, that could also have an impact on frame rates, like if the GPU becomes more prone to power or thermal throttling.

PEnns: "Alexey Panteleev: Inference on Sample is only viable on the fastest GPUs," Funny! Because "the fastest GPUs" already have enough VRAM and don't really need this technology. This NTC technology might help those with less VRAM, aka, not "the fastest GPUs", but they will not see much advantage using it!! So, what now??
