order of texture sampling

Been a while since I started a topic on here. Anyhoo…

Would it make any difference when sampling textures in a fragment program if you did them like this:

a) sample all textures
b) do all math

or mixing it around like this:

a) sample a texture or two
b) do math needing those textures
c) sample more textures
d) do more math
e) etc…
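
For concreteness, the two layouts might look something like this in a GLSL fragment shader (the sampler names and lighting math here are just made-up illustration, not anyone's real shader):

```glsl
uniform sampler2D diffuseMap;
uniform sampler2D normalMap;
varying vec2 uv;

void main()
{
    // Layout (a): issue all fetches up front...
    vec4 albedo = texture2D(diffuseMap, uv);
    vec3 normal = texture2D(normalMap, uv).xyz * 2.0 - 1.0;

    // ...then do all the math afterwards.
    float nDotL = max(dot(normalize(normal), vec3(0.0, 0.0, 1.0)), 0.0);
    gl_FragColor = albedo * nDotL;

    // Layout (b) would interleave instead: fetch the normal map,
    // compute nDotL, and only then fetch the diffuse map.
}
```

Either way the same work gets done; the question is just whether the ordering in the source affects how well the hardware hides the fetch latency.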


Usually I sample when it’s needed, as in exhibit B.

I know vertex texture fetches carry a large amount of latency, so it’s The Right Thing to fetch the vertex texture, do math that doesn’t need the texture result, then do the math that uses it. This is done to “cover up” the VTF latency.
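
As a sketch of that VTF pattern (a hypothetical displacement-mapped vertex shader; `heightMap` and the scale uniform are invented for illustration):

```glsl
uniform sampler2D heightMap;     // vertex texture (assumed supported by the hardware)
uniform mat4 modelViewProj;
uniform float displaceScale;
attribute vec2 texCoord;
attribute vec3 normal;

void main()
{
    // Issue the vertex texture fetch as early as possible.
    // Explicit LOD is required for vertex texture lookups.
    vec4 height = texture2DLod(heightMap, texCoord, 0.0);

    // Do independent math while the fetch is (hopefully) in flight:
    vec3 n = normalize(normal);

    // Only now consume the fetch result.
    vec4 displaced = gl_Vertex + vec4(n * height.r * displaceScale, 0.0);
    gl_Position = modelViewProj * displaced;
}
```

The independent `normalize` gives the hardware something to chew on while the fetch completes, instead of stalling immediately on `height`.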

Just wondering if anyone has come across a case where this made a difference or not.


My money is on: if you know what you need to sample, fetch it early; at worst the fetches get queued. The whole thing is pipelined, but it’s the latency of the fetch that hurts the most (plus any unpredictability/unoptimizability, like dependent reads), so anything you can do to minimize that latency might help.

Fetching anything late only works against finishing the shader before you stall it on a fetch. Most of the time I’d bet it makes no difference, and the optimizers will do the right thing even where it could make a difference (either way).

If that’s the NV GLSL compiler, errrm… Cg, then I think they mentioned that operation-reordering trick somewhere. There was also talk about drivers rescheduling operations for fragment/vertex asm sources too (especially for the FX generation), so I’d say don’t bother. ATI, Matrox … dunno …

Yeah, that’s true; the compilers/drivers will probably shift things around to get the best performance, knowing what the architecture likes and dislikes. I remember looking at the ASM output of my HLSL shaders and seeing that the order in which things were done was pretty much completely different from what I programmed.

So I guess it’s safe to say: just access textures whenever you feel the need, and latency/scheduling issues will be taken care of by the compilers/optimizers as best they can.