So slow

Peixinho · January 18, 2012, 2:07am

Hi there, I wanted to ask for some help, because this GLSL code turns out to be very slow. Is there another (faster) way to do the same thing? Basically, it loses 200 fps per texture added.


[VERTEX]
uniform mat4 uModelViewProjectionMatrix, uNormalMatrix, uModelMatrix, uViewMatrix;
attribute vec2 aPosition, aTexcoord;
attribute vec3 aNormal;
attribute float aHeight;
attribute float aPaintLayer1, aPaintLayer2, aPaintLayer3, aPaintLayer4, aPaintLayer5, aPaintLayer6, aPaintLayer7, aPaintLayer8;
varying vec3 vNormal;
varying vec2 vTexcoord0;
varying float vPaintLayer1, vPaintLayer2, vPaintLayer3, vPaintLayer4, vPaintLayer5, vPaintLayer6, vPaintLayer7, vPaintLayer8;
void main() {
    gl_Position = uModelViewProjectionMatrix * vec4(aPosition.x,aHeight,aPosition.y,1);
    vNormal = (uNormalMatrix * vec4(aNormal,0)).xyz;
    vTexcoord0 = aTexcoord;
    vPaintLayer1 = aPaintLayer1;
    vPaintLayer2 = aPaintLayer2;
    vPaintLayer3 = aPaintLayer3;
    vPaintLayer4 = aPaintLayer4;
    vPaintLayer5 = aPaintLayer5;
    vPaintLayer6 = aPaintLayer6;
    vPaintLayer7 = aPaintLayer7;
    vPaintLayer8 = aPaintLayer8;
}

[FRAGMENT]
uniform sampler2D uLayerTex0, uLayerTex1, uLayerTex2, uLayerTex3, uLayerTex4, uLayerTex5, uLayerTex6, uLayerTex7;
varying vec3 vNormal;
varying vec2 vTexcoord0;
varying float vPaintLayer1, vPaintLayer2, vPaintLayer3, vPaintLayer4, vPaintLayer5, vPaintLayer6, vPaintLayer7, vPaintLayer8;
void main() {
    vec4 color1 = texture2D(uLayerTex0,vTexcoord0.st)*vPaintLayer1;
    vec4 color2 = texture2D(uLayerTex1,vTexcoord0.st)*vPaintLayer2;
    vec4 color3 = texture2D(uLayerTex2,vTexcoord0.st)*vPaintLayer3;
    vec4 color4 = texture2D(uLayerTex3,vTexcoord0.st)*vPaintLayer4;
    vec4 color5 = texture2D(uLayerTex4,vTexcoord0.st)*vPaintLayer5;
    vec4 color6 = texture2D(uLayerTex5,vTexcoord0.st)*vPaintLayer6;
    vec4 color7 = texture2D(uLayerTex6,vTexcoord0.st)*vPaintLayer7;
    vec4 color8 = texture2D(uLayerTex7,vTexcoord0.st)*vPaintLayer8;
    float nDotL = max(0.0, dot(normalize(vNormal), vec3(0.5,0.5,0)));
    vec4 ambient = vec4(0.1,0.1,0.1,0.1);
    vec4 diffuse = vec4(1,1,1,1) * nDotL;
    gl_FragColor = vec4(color1 + color2 + color3 + color4 + color5 + color6 + color7 + color8) * (ambient + diffuse);
}

Thanks in advance

ZbuffeR · January 18, 2012, 4:40am

it loses 200fps per texture added

Wrong. When benchmarking, use rendering time (in milliseconds), never fps: fps is the reciprocal of frame time, so it is not linear in the amount of work done.
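To make the non-linearity concrete, here is a minimal C sketch that converts frame rates to frame times. The function names and the frame rates in the usage note are made up for illustration; they are not measurements from this thread.

```c
#include <assert.h>

/* Time spent on one frame, in milliseconds, for a given frame rate. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Extra milliseconds per frame implied by a drop from fps_before to fps_after. */
double added_cost_ms(double fps_before, double fps_after)
{
    return frame_time_ms(fps_after) - frame_time_ms(fps_before);
}
```

With these hypothetical numbers, a drop from 1000 to 800 fps costs 0.25 ms per frame, while a drop from 400 to 200 fps costs 2.5 ms: the same "200 fps lost" hides a tenfold difference in actual cost.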

Sampling from a texture has a cost, so either reduce the number of textures sampled, or use simpler sampling (i.e. NEAREST instead of trilinear/aniso etc.).
Or buy better hardware. By the way, what are your video card specs?

You may want to pack several of the float xPaintLayerN values into vec4 PaintLayer variables; it may or may not help.
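On the application side, packing would mean uploading two 4-component attributes per vertex instead of eight 1-component ones. The following is only a sketch of that repacking in C; the struct and function names are hypothetical, and how the weight arrays are stored in the original program is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define LAYER_COUNT 8

/* Hypothetical packed layout: two vec4-sized attributes per vertex. */
typedef struct {
    float layers0[4]; /* weights for paint layers 1..4 */
    float layers1[4]; /* weights for paint layers 5..8 */
} PackedPaint;

/* Interleave eight separate per-vertex weight arrays into the packed layout. */
void pack_paint_layers(const float *layers[LAYER_COUNT], size_t vertex_count,
                       PackedPaint *out)
{
    assert(layers != NULL && out != NULL);
    for (size_t v = 0; v < vertex_count; ++v) {
        for (int i = 0; i < 4; ++i) {
            out[v].layers0[i] = layers[i][v];
            out[v].layers1[i] = layers[i + 4][v];
        }
    }
}
```

The shaders would then declare two vec4 attributes and two vec4 varyings in place of the sixteen float declarations. Whether this changes performance depends on the driver, which may already pack scalars internally.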

Peixinho · January 18, 2012, 4:46am

Hi there, thanks for your answer.
My video card is an Nvidia 8800GT. I know that it’s old and there are much better video cards out there, but I’m trying to make this run on most computers.
I can reduce the number of textures sampled, but this is to build a terrain, so the more textures, the more detail.

ZbuffeR · January 18, 2012, 4:55am

Give your timings in milliseconds, for 1,2…8 textures.

menzel · January 18, 2012, 5:53am

Hello,

The size of the textures and the rendertarget, as well as the sampling method (bi-linear vs. tri-linear), would also be interesting to know.

As you’re mixing textures, I assume the rendertarget is the same size as the textures. I have observed a significant drop in performance when going from 2k to 4k or even 8k on an 8800. (I also did some mixing of texture layers: below 2k the performance drop with increasing texture size was near linear; above a certain size it got worse.)

Peixinho · January 18, 2012, 7:04am

I measured the times in ms, and they are not linear either. Maybe I did something wrong, but here are the results (average values):

1 texture: 10ms
2 textures: 30ms
3 textures: 40ms
…
8 textures: 80ms

The pictures are 256x256 and the sampling method is linear.

I had this problem too when I tried to add multiple lights: with each light, the framerate dropped considerably. It seems that when I use several values in the shader, the performance decreases.

ZbuffeR · January 18, 2012, 7:29am

Fairly linear to me.
If you use vPaintLayer1 for all textures and remove the other vPaintLayerN varyings, do you get the same series of timings?

This will help determine if the cost is mostly due to texture sampling, or varying interpolation.
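One way to check how linear the reported timings are is a least-squares slope through the four data points actually given (1, 2, 3 and 8 textures; the elided values in between are not used). This is a rough sketch with a hypothetical function name, not a rigorous benchmark analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Least-squares slope of frame time against texture count:
   the estimated extra milliseconds per added texture. */
double ms_per_texture(const double *textures, const double *ms, size_t n)
{
    assert(n >= 2);
    double mean_x = 0.0, mean_y = 0.0;
    for (size_t i = 0; i < n; ++i) {
        mean_x += textures[i];
        mean_y += ms[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double dx = textures[i] - mean_x;
        sxy += dx * (ms[i] - mean_y);
        sxx += dx * dx;
    }
    assert(sxx > 0.0); /* needs at least two distinct texture counts */
    return sxy / sxx;
}
```

Fed with the counts {1, 2, 3, 8} and times {10, 30, 40, 80}, the slope comes out a little above 9 ms per texture, which is consistent with a roughly constant cost per added texture sample.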

Peixinho · January 18, 2012, 7:35am

It seems to be exactly the same value

ZbuffeR · January 18, 2012, 7:55am

OK, so it is really the cost of texture sampling that has to be reduced.

Be sure to disable texture anisotropy.
Try GL_NEAREST for MIN and MAG.
Try GL_NEAREST_MIPMAP_NEAREST for MIN (with proper texture mipmaps).
Check driver-forced settings such as aniso, texture sampling quality, antialiasing, etc.
Maybe try texture atlasing, i.e. pack your eight 256^2 textures into a single 512*1024 texture (adapting the texcoords too), and check whether performance is better.
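For the atlas idea, "adapting texcoords" could look like the following C sketch. It assumes the eight 256^2 tiles are laid out in 2 columns by 4 rows inside the 512*1024 atlas and that the incoming texcoords are in [0,1]; the layout, the struct and the function name are all assumptions for illustration.

```c
#include <assert.h>

#define ATLAS_COLS 2 /* 512 / 256 */
#define ATLAS_ROWS 4 /* 1024 / 256 */

typedef struct {
    float s, t;
} TexCoord;

/* Map a [0,1] texcoord inside tile `layer` (0..7) to atlas coordinates. */
TexCoord atlas_texcoord(int layer, float s, float t)
{
    assert(layer >= 0 && layer < ATLAS_COLS * ATLAS_ROWS);
    int col = layer % ATLAS_COLS;
    int row = layer / ATLAS_COLS;

    TexCoord out;
    out.s = ((float)col + s) / (float)ATLAS_COLS;
    out.t = ((float)row + t) / (float)ATLAS_ROWS;
    return out;
}
```

Two caveats apply to this sketch: terrain texcoords that repeat beyond [0,1] would have to be wrapped per tile before the remap, and linear filtering or mipmapping can bleed colors between neighbouring tiles at their edges.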

I think on older cards it will be hard to keep a high framerate with so many textures. What is your desired rendering resolution? Reducing it will go a long way to speed up rendering.

menzel · January 18, 2012, 8:18am

OK, for an 8800 this looks quite slow.
I just looked up my old numbers: I mixed textures with Photoshop-style blend modes, and the rendertarget was of course the same size as the textures. Nearest filtering was used, but afaik sampling on a G80 takes one clock cycle on average for each mode except tri-linear, which takes 2.

So:
4 textures with 256^2 took < 2ms
4 textures with 512^2 took < 3ms
4 textures with 1024^2 took ~ 8ms

More than 4 textures were handled by multiple passes, so I don’t have numbers for 8 textures in one pass, sorry.

2048^2 was ~3 ms for 3 textures but ~11 ms for 4; even 2 layers at 4k^2 already took >45 ms.

With 4 layers per pass and 2 rendertargets in a ping-pong manner, I could mix 14 layers of 1024^2 in 40 ms (5 passes iirc).

Same GPU, Win XP, 640 MB RAM, RGBA, 8 bits per channel.

So if you are also using a desktop GPU, 40 ms for such small textures seems odd to me.

Peixinho · January 18, 2012, 8:28am

Thanks for your answers, I will post the results in a couple of hours

And then I will post my multiple light shader, to get some advice about it too, if you don’t mind.

Peixinho · January 18, 2012, 10:45am

Well, the mipmaps solved it. It’s really fast now, thanks a lot ZbuffeR and menzel.

I will post my multiple light shader to see if you can help me with this case as well:


const int MAX_LIGHTS = 8;
attribute vec3 aPosition, aNormal;
attribute vec2 aTexcoord;
uniform mat4 uModelViewProjectionMatrix, uNormalMatrix, uModelMatrix, uViewMatrix;
uniform float uNumberDirectionalLights;
uniform vec3 uCameraPosition;
uniform vec3 uDirectionalLightsPosition[MAX_LIGHTS];
varying vec4 vPosition;
varying vec3 vNormal, vCameraPosition;
varying float vNumberDirectionalLights;
varying vec3 vDirectionalLightsDirection[MAX_LIGHTS];
void main() {
    vNormal = (uViewMatrix * uModelMatrix * vec4(aNormal,0)).xyz;
    gl_Position = uModelViewProjectionMatrix * vec4(aPosition,1);
    vPosition = uViewMatrix * uModelMatrix * vec4(aPosition,1);
    vCameraPosition = (uViewMatrix * vec4(uCameraPosition,1)).xyz;
    vNumberDirectionalLights = uNumberDirectionalLights;
    for (int i=0;i<MAX_LIGHTS;i++)
    {
        if (i<int(vNumberDirectionalLights)) {
            vDirectionalLightsDirection[i]=(uViewMatrix * vec4(uDirectionalLightsPosition[i],0)).xyz;
        }
    }
}

int round(float number) { return int(sign(number)*floor(abs(number)+0.5)); }
const int MAX_LIGHTS = 8;
varying vec4 vPosition;
varying vec3 vNormal, vCameraPosition;
varying float vNumberDirectionalLights;
varying vec3 vDirectionalLightsDirection[MAX_LIGHTS];
uniform vec4 uAmbientLight;
uniform vec3 uKa, uKe, uKd, uKs;
uniform float uShininess;
uniform vec4 uDirectionalLightsColor[MAX_LIGHTS];
void main() {
    vec3 P = vPosition.xyz;
    vec3 N = normalize(vNormal);
    vec3 ambient = (vec4(uKa,1) * uAmbientLight).xyz;
    vec3 V = normalize(vCameraPosition - P);
    vec4 finalColor;
    // Accumulators must start at zero; uninitialized locals hold undefined values.
    vec3 diffuse = vec3(0.0);
    vec3 specular = vec3(0.0);
    for (int i=0;i<MAX_LIGHTS;i++)
    {
        if (i<round(vNumberDirectionalLights)) {
            vec3 L = normalize(vDirectionalLightsDirection[i]);
            float diffuseLight = max(dot(N,L),0.0);
            vec3 H = normalize(L + V);
            float specularLight = pow(max(dot(N,H),0.0),uShininess);
            diffuse += uKd * uDirectionalLightsColor[i].xyz * diffuseLight;
            specular += uKs * uDirectionalLightsColor[i].xyz * specularLight;
        }
    }
    gl_FragColor=vec4((uKe + ambient + diffuse + specular).xyz,1);
}

It’s a shame that I can only send 8 lights, and I don’t understand why; on my Mac, with a lower-end video card, I can send 16. But the main problem here is, again, the slow performance.

Thanks in advance

ZbuffeR · January 18, 2012, 2:43pm

Ah, mipmapping: indeed, not using it kills performance for any minification. As your textures were very small, I did not think it could make a difference.
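As a rough illustration of what mipmapping buys under minification, the C sketch below counts the levels of a full mipmap chain and estimates which level a given minification ratio would land on. The level estimate is a deliberate simplification (it halves the ratio until it drops below 2); real hardware derives the level from screen-space derivatives plus any LOD bias, so treat both function names and the selection rule as assumptions.

```c
#include <assert.h>

/* Number of levels in a full mipmap chain, from the base size down to 1x1. */
int mip_level_count(int width, int height)
{
    assert(width > 0 && height > 0);
    int size = width > height ? width : height;
    int levels = 1;
    while (size > 1) {
        size /= 2;
        ++levels;
    }
    return levels;
}

/* Simplified estimate of the mip level used when `texels_per_pixel` texels
   of the base level map onto one screen pixel along an axis. */
int approx_mip_level(double texels_per_pixel, int level_count)
{
    assert(level_count > 0);
    int level = 0;
    while (texels_per_pixel >= 2.0 && level < level_count - 1) {
        texels_per_pixel /= 2.0;
        ++level;
    }
    return level;
}
```

A 256x256 texture has 9 levels. When 4 texels map onto each pixel, this estimate picks level 2 (64x64), so neighbouring pixels read neighbouring texels instead of jumping across the base image, which is far friendlier to the texture cache.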

Ilian_Dinev · January 18, 2012, 3:35pm

The limit to 8 or 16 lights is because of:
varying vec3 vDirectionalLightsDirection[MAX_LIGHTS];
Generally, the driver is required to support 16 vec4 varyings; it just happens that the implementation/hw on your Mac supports more. So, make the fragment shader calculate the remaining 8+ lights.
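A back-of-the-envelope count of the varyings in the posted shader can be sketched in C as follows. It assumes the simplest possible rule, one vec4 slot per varying or array element with no packing, and takes the slot limit as a parameter rather than asserting it for any particular card; the function names are hypothetical. The real limit should be queried from the driver (in OpenGL 2.x, GL_MAX_VARYING_FLOATS reports it in individual floats).

```c
#include <assert.h>

/* Non-array varyings in the posted shader: vPosition, vNormal,
   vCameraPosition, vNumberDirectionalLights.
   Assumption: each one occupies a full vec4 slot (no packing). */
#define FIXED_VARYING_SLOTS 4

/* Slots used when vDirectionalLightsDirection has max_lights elements. */
int varying_slots_used(int max_lights)
{
    assert(max_lights >= 0);
    return FIXED_VARYING_SLOTS + max_lights;
}

/* Largest light array that still fits under a given slot limit. */
int max_lights_for_limit(int slot_limit)
{
    int lights = slot_limit - FIXED_VARYING_SLOTS;
    return lights > 0 ? lights : 0;
}
```

Under this counting, MAX_LIGHTS = 8 uses 12 slots and MAX_LIGHTS = 16 would need 20, so a 16-slot limit would reject the larger array. Since directional light directions are the same for every vertex, computing them in the fragment shader from uniforms avoids spending varyings on them at all.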

Peixinho · January 18, 2012, 3:59pm

Thanks for your reply, Ilian Dinev, but the real problem here is the low performance, even with 4 lights. I must have something wrong in here.

Peixinho · January 20, 2012, 9:45am

Any suggestion?

Ilian_Dinev · January 20, 2012, 1:51pm

First, remove the lighting stuff from your shader, starting from:

    for (int i=0;i<MAX_LIGHTS;i++)
    {
        if (i<round(vNumberDirectionalLights)) {
            vec3 L = normalize(vDirectionalLightsDirection[i]);

If there's a significant perf improvement, remove this:

    if (i<round(vNumberDirectionalLights)) {

That round() stuff can be improved a lot, anyway: make vNumberDirectionalLights an int uniform.

Peixinho · January 23, 2012, 2:19am

Hm… it seems to be a little bit faster with your suggestions, but the speed still decreases a lot when a light is added. Maybe there is no way around it.
Thanks
