Magnification limit?

Is there a limit on the resolution of texture magnification?

I have a texture that encodes distances over a fairly large region, 5 km × 5 km. The value read from the texture is then used as input to other lookups.
Now, I noticed that the output of texture() becomes constant over cells about 9.5 cm across, i.e. 1/16384 (1/2^14) of the texture's extent (it's a 256x256 texture). However, when I compute the value manually in the shader, doing bilinear interpolation between the 4 values obtained with texelFetch(), I get the correct smooth result.
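A minimal sketch of the manual version (the sampler name and the uv-to-texel mapping here are illustrative assumptions, and edge clamping is omitted for brevity):

```glsl
// Manual bilinear filtering at full float precision.
uniform sampler2D distTex;

float bilinearFetch(vec2 uv)
{
    vec2 size = vec2(textureSize(distTex, 0));
    vec2 st   = uv * size - 0.5;   // continuous texel-space position
    ivec2 i   = ivec2(floor(st));
    vec2 f    = fract(st);         // full-precision weights

    float t00 = texelFetch(distTex, i,               0).r;
    float t10 = texelFetch(distTex, i + ivec2(1, 0), 0).r;
    float t01 = texelFetch(distTex, i + ivec2(0, 1), 0).r;
    float t11 = texelFetch(distTex, i + ivec2(1, 1), 0).r;

    return mix(mix(t00, t10, f.x),
               mix(t01, t11, f.x), f.y);
}
```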

Tested on an NVIDIA GTX 460 card.

The spec does not impose any limit on the magnification factor, but there are almost certainly limitations in the hardware. When you sample a texture, a texture unit (dedicated hardware) services the request and performs the bilinear filtering. It is very fast and highly optimized, and therefore usually faster than computing the bilinear weights in a shader. However, one of the optimizations that makes it so fast is very likely a limit on its precision: the interpolation weights are computed in low-precision fixed point. So what you probably observe is the hardware's precision limit when interpolating the values, whereas when you do it manually in the shader you get full 32-bit float precision.
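If the filtering unit really uses fixed-point weights, the plateaus can be reproduced by quantizing the weight before blending. A sketch (the 8 fractional bits are an assumption; NVIDIA documents that value for CUDA texture filtering, but in GL it is vendor-specific):

```glsl
// Emulate a fixed-point filtering unit: snap the weight to 'bits'
// fractional bits before interpolating. With bits = 8, a 256-texel
// axis yields only 256 * 256 = 65536 distinct output positions,
// beyond which the result stays constant.
float quantizedMix(float a, float b, float w, float bits)
{
    float q = floor(w * exp2(bits)) / exp2(bits);
    return mix(a, b, q);
}
```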

Two “solutions” (well, work-arounds) come to mind.

  1. Do it manually in a shader, as you already did.

  2. Use a higher-resolution texture. Each time you increase the texture size, you also gain interpolation precision.

1) should be slower, but 2) is not a real win, because there will always be some limit (you just push it further out), and on other hardware that limit might be even stricter.

I’d go for 1).

Of course there’s also 3), which means to rethink your whole algorithm.


Thanks Jan.

I’ll probably stick with 1) for now. There is of course also the possibility of rethinking the algorithm, but so far that comes with complications of its own.

I ran into the same problem using CUDA, and it was exactly as Jan says.

If you have access to CUDA, you can get a speedup over your option 1) by interpolating (in the kernel) only at block corners (blocks of, say, MxM pixels), caching the results in shared memory, and then interpolating between those per pixel. This reduces the global-memory bandwidth that 1) requires.
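The same scheme translates to a GL compute shader, where CUDA's shared memory corresponds to a `shared` array per work group. A sketch with 16x16 blocks (`evalExpensive` and the block size are assumptions standing in for the expensive lookup chain):

```glsl
layout(local_size_x = 16, local_size_y = 16) in;

// Hypothetical expensive per-corner evaluation (the chain of
// lookups described above); defined elsewhere in the real shader.
float evalExpensive(uvec2 cornerIndex);

// One evaluation per block corner, cached in shared memory.
shared float corner[2][2];

void main()
{
    uvec2 l = gl_LocalInvocationID.xy;
    if (l.x < 2u && l.y < 2u)
        corner[l.y][l.x] = evalExpensive(gl_WorkGroupID.xy + l);
    barrier();

    vec2 f = vec2(l) / 16.0;   // position within the block, [0, 1)
    float v = mix(mix(corner[0][0], corner[0][1], f.x),
                  mix(corner[1][0], corner[1][1], f.x), f.y);
    // ... v feeds the rest of the per-pixel work ...
}
```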

Also, CUDA has a new function for 2D textures that grabs all 4 neighboring texture pixels (my case was a 3D grid, so that didn’t help me).
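For the 2D case in GL, the analogue appears to be textureGather() (core since OpenGL 4.0), which returns the four texels a bilinear fetch would read:

```glsl
uniform sampler2D distTex;

// 'uv' is the normalized sample coordinate. textureGather fetches
// the 2x2 bilinear footprint in one call; per the GL spec the
// components are .w = lower-left, .x = upper-left, .y = upper-right,
// .z = lower-right texel of the footprint.
vec4 quad = textureGather(distTex, uv, 0);
```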

Not sure how OpenCL would fare, or whether it even has shared memory…

Yeah, in OpenCL it’s called “local” memory (the address spaces are global/constant/local/private). See slide 15 in this presentation, for instance:

Good to know, thanks for pointing that out.
I hope to learn OpenCL sometime soon to avoid the proprietary trap.