NVIDIA instancing demo

I was wondering if I should use instancing for a certain situation I have, so I looked into NVIDIA’s demo.
Fortunately, they have coded both a GLSL and an ARB_vp/fp version in the demo.

The non-instanced GLSL version runs slow as hell.
The non-instanced ARB_vp/fp version is pretty good, approaching the FPS of the instanced GLSL and ARB_vp/fp versions.

The instanced GLSL and ARB_vp/fp versions are close.

So I guess this tells us that some major CPU work is being done when glUniform is called?

And I haven’t benchmarked it, but what if I have an array of uniforms:
uniform vec4 SOMETHING[100];

If I update the entire array in one call, I hope it’s quick!
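For what it’s worth, glUniform4fv takes a count, so a whole `uniform vec4 SOMETHING[100];` array can be uploaded with one call instead of 100. Whether that is actually fast is driver-dependent and not measured here. Below is a minimal sketch of the CPU side only: packing per-instance data into one tightly packed float array. `InstanceData`, `SOMETHING_LEN` and `pack_instance_vec4s` are hypothetical names, and the GL call appears only in a comment because it needs a live context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-instance data: one vec4 (x, y, z, scale) per instance. */
typedef struct {
    float x, y, z, scale;
} InstanceData;

/* Matches the shader declaration "uniform vec4 SOMETHING[100];". */
enum { SOMETHING_LEN = 100 };

/* Pack `count` instances into `out` as 4 floats per instance, so the whole
 * array can go to the driver in a single call, e.g.
 *   glUniform4fv(glGetUniformLocation(program, "SOMETHING"),
 *                (GLsizei)count, out);
 * Returns the number of vec4s written. */
size_t pack_instance_vec4s(const InstanceData *src, size_t count,
                           float out[SOMETHING_LEN * 4])
{
    assert(count <= SOMETHING_LEN); /* must fit in the declared array */
    for (size_t i = 0; i < count; ++i) {
        out[4 * i + 0] = src[i].x;
        out[4 * i + 1] = src[i].y;
        out[4 * i + 2] = src[i].z;
        out[4 * i + 3] = src[i].scale;
    }
    return count;
}
```

The point of packing is simply that the data is already contiguous in the layout glUniform4fv expects, so there is one trip into the driver per frame rather than one per array element.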

Hi V-man,

Would you mind posting a link to the demo? All I could find was an OpenGL pseudo-instancing sample.

N.

I think you were looking at the right thing.
http://http.download.nvidia.com/developer/SDK/Individual_Samples/3dgraphics_samples.html

It’s titled “Pseudo Instancing”
Run it, right-click to get the context menu, and flip between GLSL and ARB vp/fp to see the difference.

Doubled the number of instances and doubled the mesh resolution, 1280x960, GeForce 6800LE:
18.8 fps with pseudo-instancing, GLSL or ARB programs
9 fps with ARB, no instancing
6.5 fps with GLSL, no instancing
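FPS figures are easier to compare as frame times. Here is a small helper that converts the 6800LE numbers above to milliseconds per frame and to a speedup relative to non-instanced GLSL. The function names are just illustrative; the numbers are the ones quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Convert frames per second to milliseconds per frame. */
double frame_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* How many times faster configuration a is than configuration b
 * (ratio of frame times, which equals fps_a / fps_b). */
double speedup(double fps_a, double fps_b)
{
    return frame_ms(fps_b) / frame_ms(fps_a);
}

/* Print the GeForce 6800LE numbers from the post as frame times. */
void print_6800le_numbers(void)
{
    const char *label[] = { "pseudo instancing", "ARB, no instancing",
                            "GLSL, no instancing" };
    const double fps[] = { 18.8, 9.0, 6.5 };
    for (int i = 0; i < 3; ++i)
        printf("%-20s %5.1f fps = %6.1f ms/frame, %.2fx vs GLSL no instancing\n",
               label[i], fps[i], frame_ms(fps[i]), speedup(fps[i], 6.5));
}
```

Worked by hand: 18.8 fps is about 53 ms per frame, 9 fps about 111 ms, and 6.5 fps about 154 ms, so pseudo-instancing is roughly 2.9x non-instanced GLSL and about 2.1x non-instanced ARB on that setup.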

NVIDIA’s glslang implementation is well known for doing stupid things when you modify a uniform. In some cases it has been known to recompile the shader because of it.

I really wish they wouldn’t do that.

I have found that instancing gives about a 30% speed increase over batched vertex buffers, all other things aside.

That’s the thing to notice: non-instanced GLSL is slower than non-instanced ARB vp/fp. This seems to be the case for both vendors.

With glProgramEnvParameter4fARB/glProgramLocalParameter4fARB, the values get to the GPU registers quicker. With glUniform, the driver has to do some work to figure out which registers to write to.
I think this is because GLSL vertex and fragment shaders share uniform space.

On my GeForce 8600 with the 169.21 drivers, the demo won’t run… I get an error telling me it’s unable to find an overloaded function for max(float, int). There is also a warning about a matrix cast… Well, I fixed it, and here are my results:

  • 55 to 57 FPS with instancing and GLSL
  • 55 to 57 FPS with instancing and an ARB program
  • 22.3 FPS without instancing, with a GLSL program
  • 24.7 FPS without instancing, with an ARB program

It seems that pseudo-instancing isn’t really “stable”; it’s a bit more effective with ARB programs.

I don’t think uniform variables share space in GLSL. The size isn’t even the same between the FS and VS: 4096 components in the vertex shader and 2048 components in the fragment shader on my GF8600. Anyway, now with true instancing and texture buffers we should get much higher performance without uniform troubles.
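To put those limits in perspective: the limits are counted in float components, and a vec4 takes four, so 4096 components is 1024 vec4s and 2048 is 512. The `uniform vec4 SOMETHING[100];` array from the first post needs 400 components. The limits themselves can be queried with glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS, …) and GL_MAX_FRAGMENT_UNIFORM_COMPONENTS. The helpers below are a sketch of that arithmetic. The `reserved` parameter is an assumption: other uniforms, including built-in matrices a shader uses, also consume space, and how a driver actually packs them may differ.

```c
#include <assert.h>

/* Each vec4 occupies 4 float components of the uniform budget. */
enum { COMPONENTS_PER_VEC4 = 4 };

/* How many vec4 uniforms fit in a component limit, such as the value
 * returned for GL_MAX_VERTEX_UNIFORM_COMPONENTS. */
int vec4_capacity(int max_components)
{
    assert(max_components >= 0);
    return max_components / COMPONENTS_PER_VEC4;
}

/* Does an array of `len` vec4s fit, leaving `reserved` components for
 * other uniforms? `reserved` is a caller-supplied estimate. */
int vec4_array_fits(int len, int max_components, int reserved)
{
    return len * COMPONENTS_PER_VEC4 + reserved <= max_components;
}
```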

The concept behind GLSL is that you build a program.
Your VS and FS become one.
If you have a uniform called myposition in your VS and the same one in your FS, you get just one uniform location.
So when you set the value of myposition with a call to glUniform, the driver checks whether it exists in the VS, the FS, or both, and does the update accordingly.

I’m assuming that in this case the VS and FS are separate entities on the GPU.
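A toy model of the explanation above, to make the “one location, possibly two destinations” idea concrete. This is purely illustrative and is not how any real driver is implemented. Every name in it (`UniformEntry`, `ToyProgram`, `toy_get_location`, `toy_uniform4fv`, the 16-register files) is hypothetical. It just shows that a single uniform-setting call has to look up which stages use the location before it can write anything.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: NOT how any real driver works. One location per uniform
 * name in the linked program; behind it, each stage may or may not have
 * its own constant register for that uniform. */
enum { NUM_REGS = 16, NOT_USED = -1 };

typedef struct {
    const char *name;
    int vs_reg; /* register in the vertex stage, or NOT_USED */
    int fs_reg; /* register in the fragment stage, or NOT_USED */
} UniformEntry;

typedef struct {
    float vs_regs[NUM_REGS][4];
    float fs_regs[NUM_REGS][4];
    const UniformEntry *table;
    int table_len;
} ToyProgram;

/* Rough analogue of glGetUniformLocation: index into the table, or -1. */
int toy_get_location(const ToyProgram *p, const char *name)
{
    for (int i = 0; i < p->table_len; ++i)
        if (strcmp(p->table[i].name, name) == 0)
            return i;
    return -1;
}

/* Rough analogue of glUniform4fv with count 1: one call, but the "driver"
 * checks each stage and writes to every stage that uses the uniform.
 * Returns the number of register writes performed. */
int toy_uniform4fv(ToyProgram *p, int location, const float v[4])
{
    assert(p != NULL && v != NULL);
    if (location < 0 || location >= p->table_len)
        return 0; /* GL ignores -1; other bad locations are not modeled */
    const UniformEntry *e = &p->table[location];
    int writes = 0;
    if (e->vs_reg != NOT_USED) {
        memcpy(p->vs_regs[e->vs_reg], v, 4 * sizeof(float));
        ++writes;
    }
    if (e->fs_reg != NOT_USED) {
        memcpy(p->fs_regs[e->fs_reg], v, 4 * sizeof(float));
        ++writes;
    }
    return writes;
}
```

By contrast, an ARB program parameter call names one program and one register directly, which is consistent with the intuition in the thread that it has less to figure out. That comparison is the poster’s hypothesis, not something this model proves.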

EXT_draw_instanced is definitely the way to go, if you’re fortunate enough to have a G80 or better.

Now that would be a nice API.

Just add

#extension GL_EXT_gpu_shader4 : enable

and you’re off to the races.
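A rough sketch of what that looks like, with the vertex shader embedded as a C string. With GL_EXT_gpu_shader4 enabled, the built-in gl_InstanceID gives each vertex its instance index, so per-instance data can be read from a uniform array instead of being set between draw calls. The `offsets` array, its size of 100, and the helper `enables_gpu_shader4` are hypothetical. The host-side calls are shown only as a comment since they need a GL context and an EXT_draw_instanced-capable driver.

```c
#include <assert.h>
#include <string.h>

/* Vertex shader sketch for EXT_draw_instanced: gl_InstanceID (from
 * GL_EXT_gpu_shader4) selects this instance's offset. */
static const char *instanced_vs =
    "#extension GL_EXT_gpu_shader4 : enable\n"
    "uniform vec4 offsets[100];\n"
    "void main()\n"
    "{\n"
    "    vec4 p = gl_Vertex + vec4(offsets[gl_InstanceID].xyz, 0.0);\n"
    "    gl_Position = gl_ModelViewProjectionMatrix * p;\n"
    "}\n";

/* Host side (needs a GL context, so only shown as a comment):
 *   glUniform4fv(glGetUniformLocation(prog, "offsets"), n, data);
 *   glDrawArraysInstancedEXT(GL_TRIANGLES, 0, vertexCount, n);
 * draws n instances in one call, with gl_InstanceID running 0..n-1. */

/* Returns nonzero if the shader source enables GL_EXT_gpu_shader4. */
int enables_gpu_shader4(const char *src)
{
    assert(src != NULL);
    return strstr(src, "#extension GL_EXT_gpu_shader4 : enable") != NULL;
}
```

Compared with pseudo-instancing, the per-instance uniforms are uploaded once and a single draw call covers all instances, so there is no per-instance glUniform traffic at all.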