vec4 skinned_position2 = Matrix_Palette[j] * position * 0.5;
gl_Position = Projection * Modelview * skinned_position2;
is still locked at 29.
Thanks for trying to help - but only recommending to disable vsync really isn’t helping me at all…
I realize the additional processing is causing it to miss the vsync and that is why it is 30 fps. My point is that I am doing very little processing in the shader. The fact that an array index causes it to miss the vsync IS the problem. I don’t care that it might cost 10 fps or 30 fps - it is WAY too expensive.
Most importantly, the fact that it only happens on a significantly faster ATI desktop card and not the slower Nvidia mobile chip is further evidence that there is a driver issue.
Thanks again for your help, and let me know if you have any ideas other than disabling vsync.
No, I understand your point just fine. The simple fact is this: until you turn off vsync, all profiling data you get must be considered suspect. That is, nothing you see about how an application performs can be considered reliable if vsync is enabled.
I know it looks like something odd is going on. But unless vsync is turned off, timing data simply cannot be considered a reliable measure of anything that’s actually happening.
This should not be a difficult thing to do. You could probably have done it in the time it took you to compose that last message.
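To see why numbers taken with vsync on say so little, here is a minimal sketch in C of an idealized model. It assumes plain double buffering on a fixed-rate display, with no triple buffering and no driver read-ahead. The function names `vsync_intervals` and `vsync_fps` are made up for illustration. A frame that misses a refresh has to wait for the next one, so the displayed rate snaps to 60, 30, 20, ... on a 60 Hz display, no matter how far over budget the frame actually was.

```c
/* Idealized vsync model (assumption: double buffering, fixed refresh rate). */

/* Number of whole refresh intervals a frame occupies when vsync is on. */
static int vsync_intervals(double render_ms, double refresh_hz) {
    double interval_ms = 1000.0 / refresh_hz;
    int n = (int)(render_ms / interval_ms);   /* full intervals already used */
    if (n * interval_ms < render_ms) {
        n++;                                  /* round up to the next refresh */
    }
    if (n < 1) {
        n = 1;                                /* a frame takes at least one refresh */
    }
    return n;
}

/* Frame rate an fps counter would report under this model. */
static double vsync_fps(double render_ms, double refresh_hz) {
    return refresh_hz / vsync_intervals(render_ms, refresh_hz);
}
```

Under this model, `vsync_fps(17.0, 60.0)` and `vsync_fps(33.0, 60.0)` both give 30, so a reading of about 30 fps cannot tell you whether the shader is 1 ms over budget or 16 ms over.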
Sadly, Apple has not implemented ARB_timer_query (part of OpenGL 3.3), so there aren’t very many options available for profiling OpenGL.
That GPU is capable of computing hundreds of times more indexed_matrix*vector multiplications than you’re doing. (I’ve run vertex-shader skinning like that on an HD2600, 500k vertices @ 60fps.)
So, I’m inclined to think that for whatever reason the driver is doing a software-fallback.
Being able to disable vsync to quickly see vast differences in performance (and thus see sw-fallbacks) is quite important, so do try to find a way.
A quick fix might be to change “const int BoneCount = 86;” to “#define BoneCount 86”. I vaguely remember having some performance issues with such constants on GeForces on some driver version.
Besides frame-timing being useless, subframe timings are totally hosed because of how the GL driver reads ahead for subsequent frames after SwapBuffers, blocking on random calls when some implementation-dependent limits are reached.
Under some drivers you can put a glFinish() after your SwapBuffers call, which will forcibly wait for V-sync. This will make your sub-frame timings more useful, but will still leave you with useless frame timings and is probably driver-dependent behavior. Using NV_fence/ARB_sync is yet another way to do this.