slow when using uniform array

I have a problem concerning uniforms in a GLSL shader. When I use “uniform transformation[10]” I get 60 fps, but when I use “uniform transformation[100]” I only get 35 fps. The more uniforms I use, the slower it gets. I only update the first element of the array, so bandwidth shouldn't be the problem. I set up the shader only once per frame and then render 500 patches with 2048 triangles each…
Does anyone know what the problem is?

Does it make a difference if you send only one uniform (the one you need to update) and define the other 99 as constants in the shader?

Yes - I want to use instancing. But the problem I described negates any speed improvement. I only transfer one uniform in order to pin the problem down.

I am not 100% sure, but I think the driver might very well send the whole array even if you modify only a single element of it. However, uploading a single uniform array per frame should not cut your framerate that much. The performance penalty must be somewhere else. Maybe the vertex shader address space has to be bigger with larger arrays and that slows it down?
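
To put rough numbers on the claim that one full-array upload per frame is cheap, here is a minimal C sketch. It assumes each array element is a vec4 of four 32-bit floats (the original post does not state the element type), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to upload an array of vec4 uniforms (assumed element type). */
size_t array_upload_bytes(size_t element_count)
{
    /* GL floats are 32-bit; this sketch assumes the host float matches. */
    assert(sizeof(float) == 4);
    return element_count * 4 * sizeof(float);
}

/* Bytes per second if the whole array is re-sent on every upload. */
size_t upload_bytes_per_second(size_t element_count,
                               size_t uploads_per_frame,
                               size_t frames_per_second)
{
    return array_upload_bytes(element_count) * uploads_per_frame
           * frames_per_second;
}
```

Under these assumptions a 100-element array is 1600 bytes, so one full upload per frame at 60 fps is 96000 bytes per second, which is negligible next to typical bus bandwidth. That is consistent with the guess that the slowdown comes from somewhere other than the upload itself.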

If you observe that smaller arrays work better, maybe you can modify your meshes so that each draw call only uses a subset of all your transformations. This would require uploading the array more than once per frame but you might still get an improvement in speed.
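
The batching idea can be sketched on the CPU side as follows. This is only an illustration under assumptions: `InstanceBatch`, `batch_count` and `get_batch` are hypothetical names, and the actual GL calls are left out so that the snippet stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* One draw call's worth of instances: first instance index and count. */
typedef struct {
    size_t first;
    size_t count;
} InstanceBatch;

/* Number of draw calls needed when each call may use at most
 * max_per_batch entries of the uniform array. */
size_t batch_count(size_t total, size_t max_per_batch)
{
    assert(max_per_batch > 0);
    return (total + max_per_batch - 1) / max_per_batch;
}

/* Range of instances covered by batch number `index` (0-based). */
InstanceBatch get_batch(size_t total, size_t max_per_batch, size_t index)
{
    InstanceBatch batch;
    size_t remaining;

    assert(max_per_batch > 0);
    batch.first = index * max_per_batch;
    assert(batch.first < total);

    /* The last batch may be smaller than the others. */
    remaining = total - batch.first;
    batch.count = remaining < max_per_batch ? remaining : max_per_batch;
    return batch;
}
```

For each batch, the application would upload `count` transforms starting at index `first` into the start of the uniform array (for example with glUniform4fv) and then issue one instanced draw call with `count` instances. With 500 patches and an array of 64 entries this gives 8 draw calls, the last one covering 52 instances.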

Unfortunately, using a smaller array and rendering it many times doesn't solve the performance issue.

I have two shaders for two render paths.
Shader with hardware instancing:
#extension GL_EXT_gpu_shader4 : enable

varying vec2 texCoord;

uniform sampler2D Texture1;
uniform vec4 trans[500];

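void main()
{
    vec4 vert = gl_Vertex;

    // Per-instance transform: xy = offset, zw = scale.
    vec4 transform = trans[gl_InstanceID];
    vert.xy *= transform.zw;
    vert.xy += transform.xy;

    // texCoord is a vec2, so take only the xy components.
    texCoord = vert.xy / (1024.0 * 4.0);

    // Displace z by the red channel of Texture1.
    vec4 texel = texture2DLod(Texture1, texCoord.xy, 0.0);
    vert.z += texel.x;

    gl_Position = gl_ModelViewProjectionMatrix * vert;
}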

Shader without instancing:
#extension GL_EXT_gpu_shader4 : enable

varying vec2 texCoord;
uniform sampler2D Texture1;
uniform vec4 trans;

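void main()
{
    vec4 vert = gl_Vertex;

    // Single transform for the whole draw call: xy = offset, zw = scale.
    vert.xy *= trans.zw;
    vert.xy += trans.xy;

    // texCoord is a vec2, so take only the xy components.
    texCoord = vert.xy / (1024.0 * 4.0);

    // Displace z by the red channel of Texture1.
    vec4 texel = texture2DLod(Texture1, texCoord.xy, 0.0);
    vert.z += texel.x;

    gl_Position = gl_ModelViewProjectionMatrix * vert;
}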

After updating my video drivers I get more fps when rendering the patches 450 times without instancing than when rendering them once with instancing… How can that be?

If you mean true geometry instancing à la glDrawElementsInstanced, I believe you typically use a vertex texture for that, paired with PBOs for pipelined CPU->GPU texture updates.

I haven't tried them yet, but I wonder whether you could get past your perf issue using uniform buffer objects, the older bindable uniforms, or possibly the new NVIDIA bindless extension NV_shader_buffer_load.

Personal experience and what I have read here indicate that plain glUniform is pretty slow and CPU intensive, which intuitively makes sense. It's the “immediate mode” of uniform setting.

You might also try the texture buffer object extension (core in 3.1) as an alternative to uniform arrays for generalized indexed lookups in the instancing context. I haven't used it myself yet, so I can't speak to the perf question, but it seems well suited to that sort of operation.
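
As a rough illustration of the buffer-based route, the per-instance data first has to be laid out as a flat array of floats. The C sketch below shows only that CPU-side packing step. `PatchTransform` and `pack_transforms` are hypothetical names, and the xy = offset, zw = scale layout is assumed from the `trans` uniform in the shaders above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-patch data, matching the vec4 layout used by trans:
 * xy = offset, zw = scale. */
typedef struct {
    float offset_x, offset_y;
    float scale_x, scale_y;
} PatchTransform;

/* Packs `count` transforms into dst as four consecutive floats each.
 * Returns the number of bytes written, i.e. the size to upload. */
size_t pack_transforms(const PatchTransform *src, size_t count,
                       float *dst, size_t dst_floats)
{
    size_t i;

    assert(dst_floats >= count * 4);
    for (i = 0; i < count; ++i) {
        dst[4 * i + 0] = src[i].offset_x;
        dst[4 * i + 1] = src[i].offset_y;
        dst[4 * i + 2] = src[i].scale_x;
        dst[4 * i + 3] = src[i].scale_y;
    }
    return count * 4 * sizeof(float);
}
```

A tightly packed array like this has a 16-byte stride per element, which matches both an RGBA32F texture buffer (one texel per instance) and a std140 array of vec4 in a uniform buffer. The packed data would then be handed to glBufferData, and the shader would read element gl_InstanceID from the buffer instead of from a plain uniform array. Whether this is actually faster on a given driver would have to be measured.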