Slow texture lookup functions

Hello, I’m doing some per-pixel shading & I’m trying to use functions stored in textures to speed things up, but the result is exactly the opposite.

I’m using GL_LUMINANCE16F_ARB for the 1D & 2D textures (not texture rectangles) & GL_RGB16F for the norm cubemap. The cubemap is a little faster than the normalize() function w/ dimension up to 256, but slower after that.

Also I upload the textures once into specified texture units & never upload them again. Is there anything wrong in the above logic?

In case that matters, in my shaders I also use 3 textures (up to 1280x1024) for the scene data.

Have you got any idea why things slow down when I use them?? I use a 7800GS AGP. Thanks in advance!!

Sounds like your code is texture-fetch limited. It’s probably a good idea to do the reverse: try to limit your texture fetches and move more of your algorithm into code.

Thanks for the answer! The fewest fetches I can do is 3, & it goes up to 6. Does it matter if the texture fetches are close to each other (in code)? If you’re doing deferred shading, are you doomed to not using textures for functions?

Hardware is moving toward more and more ALU, while TEX isn’t growing nearly as fast. Replacing a normalize() with a cubemap lookup was a decent idea 3-4 years ago, but today it’s likely to be slower, unless you’re ALU limited. Keep in mind that you can often do 3-4 ALU vector instructions in the same time as a bilinear RGBA8 fetch, not counting mini-ALUs, dual-issue, scalar architectures etc. And normalize() is usually done in three instructions. Texture lookups also get more expensive with more advanced filtering (trilinear multiplies the cost by 2x, aniso multiplies it by the level of anisotropy) and with wider formats (even if they are fully in cache).

Where texture lookups are in code shouldn’t matter too much. It may matter on the hardware level to some degree, but the optimizer will shuffle your lookups around anyway, so there’s not much you can do to help.
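To make the normalize() versus cubemap trade-off concrete, here is a minimal GLSL 1.10 fragment shader sketch that switches between the two paths. The names normCube, lightVec and USE_NORM_CUBEMAP are hypothetical and not from the original posts, and the cubemap is assumed to store signed unit vectors in a float format, so no scale/bias is applied after the fetch.

```glsl
// Hypothetical names: normCube, lightVec and USE_NORM_CUBEMAP are illustrative only.
uniform samplerCube normCube;   // assumed RGB16F cubemap, each texel holds a unit vector
varying vec3 lightVec;          // unnormalized vector interpolated from the vertex shader

void main()
{
#ifdef USE_NORM_CUBEMAP
    // TEX path: one cubemap fetch replaces the math.
    vec3 l = textureCube(normCube, lightVec).rgb;
#else
    // ALU path: usually about three instructions (dot, reciprocal sqrt, multiply).
    vec3 l = normalize(lightVec);
#endif
    // Visualize the result so both paths can be compared on screen.
    gl_FragColor = vec4(l * 0.5 + 0.5, 1.0);
}
```

Timing the same scene with and without USE_NORM_CUBEMAP defined is one way to see which unit the shader is limited by on a given card.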

hmmm…So I guess really complex functions should go into textures & not some simple ones (like I use)… Is there any way/ place/ program to learn about the instruction count of functions / shaders for glsl code?? Thank you very much again!!

I thought that the 7xxx series had a free normalize? Also, floating point textures are more costly than the normal RGBA textures.

To your second question: there are some tools both from ATI and Nvidia that will show how many instructions a shader needs, unfortunately I forgot the names. Look at the Nvidia developer site, you will surely find this program.

free normalize? what do you mean? the 3 instructions humus said? Nvidia has ShaderPerf but the current version doesn’t support glsl shaders (ver. 2.0 Alpha). ATI has one too, I didn’t know…

GF6/7 HW has an additional normalization unit which can operate in parallel with other operations. That unit only supports 16-bit floats, so it is likely not usable without the Nvidia GLSL hack which adds the half3 type.

You can use the Cg compiler. When run with the parameter -oglsl, it compiles GLSL code. You will not get performance numbers; however, you can see how the generated code looks.

If it works for 16-bit floats, then would it work with data fetched from RGBA16F textures?? Nvidia GLSL hack?? Could you be more specific? hmm I’ll check it out about the compiler, thank you very much!!!

It depends on the type of the value. You can experiment with the Cg compiler using the fp40 profile. If an NRMH instruction is generated, that unit is likely to be used.

Nvidia GLSL hack?? Could you be more specific?

You can read about this hack here in the chapter “NVIDIA’s GLSL Enhancements”.

Nvidia GLSL hack?

If you want to use 16-bit floats then put this at beginning of your shader code:

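// __GLSL_CG_DATA_TYPES is the macro NVIDIA's GLSL compiler defines when the Cg types (half, half2, ...) are available
#ifndef __GLSL_CG_DATA_TYPES
// Other compilers (e.g. Radeon): fall back to 32-bit floats
#define hfloat float
#define hvec2 vec2
#define hvec3 vec3
#define hvec4 vec4
#else
// GeForce: map to the 16-bit Cg types
#define hfloat half
#define hvec2 half2
#define hvec3 half3
#define hvec4 half4
#endif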

Now the hvec / hfloat types represent 16-bit floats on GeForce and 32-bit floats on Radeon.

NVIDIA suggests this:

#ifndef __GLSL_CG_DATA_TYPES
#define half float
#define half2 vec2
#define half3 vec3
#define half4 vec4
#endif

This doesn’t work on Radeon since its compiler says that half and half2-4 are reserved words, so you can’t use them in code or redefine them.

Using 16-bit floats is important if you want your shaders to run faster on GeForce FX (on any other GPU 32-bit floats work fast) or if you run into performance problems because of using too much math in your shader.
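As a usage sketch of the defines discussed above, the fragment shader below normalizes a fetched normal through the hvec3 alias. It assumes the NVIDIA compiler defines __GLSL_CG_DATA_TYPES when the Cg half types are available; normalTex is a hypothetical sampler name. Whether the hardware normalization unit is actually used is not guaranteed, so the generated assembly would still need to be inspected.

```glsl
// Assumption: __GLSL_CG_DATA_TYPES is defined only by compilers that accept Cg types.
#ifdef __GLSL_CG_DATA_TYPES
#define hvec3 half3   // 16-bit float vector on GeForce
#else
#define hvec3 vec3    // plain 32-bit fallback elsewhere
#endif

uniform sampler2D normalTex;   // hypothetical texture holding unnormalized normals

void main()
{
    // Explicit constructors keep the conversions valid on both code paths.
    hvec3 n = hvec3(texture2D(normalTex, gl_TexCoord[0].st).xyz);
    n = normalize(n);
    // Convert back to full precision before writing the color.
    gl_FragColor = vec4(vec3(n) * 0.5 + 0.5, 1.0);
}
```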

Ok, thanks for all the information guys, you’ve been very helpful!! I think it’s time to go & try some of these…

I tried it Humus, but it gives me many errors without apparent reason…In shaders that actually compile & work, of course.

One side question (to avoid opening a new topic): How can I obtain the assembly output of the shaders using realtech’s extension viewer or NVemulate for example?? (I hope it doesn’t sound stupid)

EDIT : Ok I found them, silly me

Compile & work on Nvidia or ATI cards? The Nvidia GLSL compiler is by default very forgiving of incorrect GLSL syntax if it would be valid in Cg. You can force stricter checking using NVemulate.

You can also check the syntax of the shaders using the GLSL Validate tool, provided that they are for GLSL version 1.10 or older and do not use GLSL extensions.

On Nvidia. For example:
const vec4 normal = texture2D(normalTex,gl_TexCoord[0].st);
is an error because
" assigning non-constant to ‘const 4-component vector of float’ "
… this is not strict, I really believe it’s absurd…

That code is incorrect. GLSL does not allow you to use the const keyword in the same way you do in C++.

The only allowed uses are:
Compile-time constants

const vec4 foo = vec4( 1.0, 2.0, 4.0, 8.0 ) ;

where the values must be:

Initializers for const declarations must be formed from literal values, other const variables (not including function call parameters), or expressions of these.


Specification of function parameters

vec4 foo( const in vec4 param )
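The two allowed uses listed above can be put together in a short GLSL 1.10 sketch. The names scale, applyScale and normalTex are illustrative only; the point is that the texture fetch result is stored in a non-const local, since a texture fetch is not built from literals or other const variables.

```glsl
uniform sampler2D normalTex;   // hypothetical sampler name

// Use 1: compile-time constant, initialized from literal values only.
const vec4 scale = vec4(1.0, 2.0, 4.0, 8.0);

// Use 2: const qualifier on a function parameter (read-only inside the body).
vec4 applyScale(const in vec4 param)
{
    return param * scale;
}

void main()
{
    // Not a constant expression, so the local is declared without const.
    vec4 normal = texture2D(normalTex, gl_TexCoord[0].st);
    gl_FragColor = applyScale(normal);
}
```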

Yes, ok, GLSL doesn’t allow it, but Nvidia’s compiler will, & I find it logical, since I don’t think it’s conceptually wrong. Actually I find the code more readable this way, & I guess the compiler lets me do it for that reason…