"In a shader program I changed three uniform variables in the fragment shader for three uniform variables in FS and three varying variables to VS" - I honestly don’t know what you mean by that; please try to explain it more clearly.
It is quite possible that performance is better on a PCI-Express card than on an AGP card. However, you cannot conclude that uniforms are the only thing that gets faster with PCI-Express.
When the uniform values vary for each PRIMITIVE (e.g. triangle), then uniforms are definitely the wrong thing to use anyway. Use vertex attributes for such cases; they are MEANT to vary per vertex (and therefore per primitive), whereas uniforms are meant to stay constant for a BIG batch of geometry (like 100 triangles and more).
If your uniforms vary for each primitive, that means you need to send each primitive with its own draw call (or even in immediate mode). If so, you are using the slowest possible way to render things anyway (in such cases PCI-Express might indeed give you a lot more speed, but it will still be slow).
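To make the attribute suggestion a bit more concrete, here is a rough sketch of the CPU-side step, not something taken from your code. It assumes non-indexed triangles and a hypothetical per-triangle RGB value; the function name and the constants are made up for illustration. The idea is to copy each triangle's value onto its three vertices, so the result can be handed to the driver as an ordinary vertex attribute array (e.g. via glVertexAttribPointer) and the whole batch drawn with a single glDrawArrays call.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout for this sketch: 3 floats (RGB) per value. */
#define COMPONENTS 3
#define VERTS_PER_TRIANGLE 3

/* Copies each triangle's value onto all three of its vertices.
 * per_triangle: triangle_count * COMPONENTS floats (input)
 * per_vertex:   triangle_count * VERTS_PER_TRIANGLE * COMPONENTS floats (output)
 * Returns the number of floats written to per_vertex. */
size_t expand_per_triangle(const float *per_triangle,
                           size_t triangle_count,
                           float *per_vertex)
{
    assert(per_triangle != NULL && per_vertex != NULL);

    size_t out = 0;
    for (size_t t = 0; t < triangle_count; ++t) {
        for (int v = 0; v < VERTS_PER_TRIANGLE; ++v) {
            for (int c = 0; c < COMPONENTS; ++c) {
                /* Same source value for every vertex of triangle t. */
                per_vertex[out++] = per_triangle[t * COMPONENTS + c];
            }
        }
    }
    return out;
}
```

This costs some memory because every value is stored three times, and a mesh that shares vertices between triangles would have to be un-shared first, but it removes the need for one draw call per triangle.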
“Because a uniform variable is sent from the CPU to the GPU for each update.”
Yes, that’s true, at least on “modern” GPUs, as Rob mentioned above.
“If this is true, GLSL performance should be much worse than GC since GLSL have a lot of built-in uniform variables and they are loaded independently if they are read or not… do you follow me? or maybe only are updated the built-in uniforms that will be used?”
With GC you mean NVIDIA's Cg, I guess? GLSL compilers (just like the Cg compiler) analyze the code for uniform usage. Uniforms that are never used (or that are used but can be optimized away) will not be provided by the driver (the compilers are pretty smart). You can try that out for yourself: put a uniform into a shader, but don’t access it at all. In your app, query the driver for the uniform's location (glGetUniformLocation). It will return -1 (i.e. “not existing”) because the compiler optimized the uniform away.
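If you want to script that experiment, something along these lines might do. It is only a sketch: the lookup is passed in as a function pointer with the same shape as glGetUniformLocation (but plain C types), so the snippet compiles without GL headers or a context. In a real app you would pass a thin wrapper around glGetUniformLocation; fake_lookup and the uniform names are stand-ins I made up for the demo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as glGetUniformLocation(program, name), using plain C types. */
typedef int (*uniform_lookup_fn)(unsigned int program, const char *name);

/* Returns 1 if the lookup reports a valid location, 0 if it reports -1. */
int uniform_is_active(uniform_lookup_fn lookup,
                      unsigned int program,
                      const char *name)
{
    assert(lookup != NULL && name != NULL);

    int location = lookup(program, name);
    if (location == -1) {
        printf("uniform '%s' is not active (unused or optimized away)\n", name);
        return 0;
    }
    printf("uniform '%s' is active at location %d\n", name, location);
    return 1;
}

/* Stand-in for the driver: pretends that only "lightColor" survived
 * compilation. Hypothetical, for the demo only. */
int fake_lookup(unsigned int program, const char *name)
{
    (void)program; /* unused in the stand-in */
    return strcmp(name, "lightColor") == 0 ? 0 : -1;
}
```

Calling uniform_is_active(fake_lookup, 1u, "unusedScale") reports the uniform as inactive, which is what you should see from a real driver for a uniform the shader never reads.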
Of course, it also matters how many uniforms you use: a shader that accesses 10 uniforms will be faster than one that accesses 50. But this is only a minor problem compared with the time it takes to update those uniforms. If you have a shader that reads 50 uniforms that NEVER change, it might actually perform much faster than a shader that accesses only 10 uniforms that change all the time.
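Since the update cost is the part that hurts, one common trick is to keep a CPU-side copy of what you last sent and skip the call when nothing changed. The sketch below is just one way to do that, under my own assumptions: every uniform is treated as a vec4, locations are small non-negative integers, and the actual upload is a callback (in a real app it would wrap something like glUniform4fv(location, 1, value)). All names here are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_CACHED_UNIFORMS 64

/* Callback that performs the real upload; may be NULL for dry runs. */
typedef void (*upload_fn)(int location, const float value[4]);

typedef struct {
    float value[MAX_CACHED_UNIFORMS][4]; /* last value sent per location */
    int   valid[MAX_CACHED_UNIFORMS];    /* 1 once a value has been sent */
    int   uploads;                       /* number of updates let through */
} uniform_cache;

void cache_init(uniform_cache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* Sends the value only if it differs bitwise from the cached one.
 * Returns 1 if an update was issued, 0 if it was skipped. */
int cache_set(uniform_cache *c, int location,
              const float value[4], upload_fn upload)
{
    assert(c != NULL && value != NULL);

    if (location < 0 || location >= MAX_CACHED_UNIFORMS)
        return 0; /* inactive (-1) or outside the cache: nothing to do */

    if (c->valid[location] &&
        memcmp(c->value[location], value, sizeof c->value[location]) == 0)
        return 0; /* unchanged: skip the redundant update */

    memcpy(c->value[location], value, sizeof c->value[location]);
    c->valid[location] = 1;
    c->uploads++;
    if (upload != NULL)
        upload(location, value);
    return 1;
}
```

Uniform values belong to a program object, so you would keep one such cache per program. Whether this pays off depends on the driver, so measure before relying on it.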
Hope that helps you,
Jan.