Pass-through vertex shader vs fixed pipeline: SLOW

Probably the older rendering paths are better optimised.

VBO switching is generally considered a bad thing, but you’re not doing nearly enough of it to have this kind of perf impact IMO. All the same there’s definitely huge room for optimization there (especially with the unbind/rebind thing, which is just plain weird), which likely explains the perf difference you’ve noticed between VBOs and Display Lists. In fact everything here really should be going into a single pair of buffer objects rather than creating multiple objects (especially for such tiny data sets - I see many with a mere 3 elements in there).
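
Something along these lines is what I have in mind - just a rough sketch, assuming a GL 1.5+ context, and with MeshRange and every other name below made up for illustration rather than taken from your framework:

/* Pack every small mesh into one shared VBO/IBO pair instead of one
   buffer object per strip/fan. */
typedef struct {
    GLintptr vertexOffset;  /* byte offset of this mesh in the shared VBO */
    GLintptr indexOffset;   /* byte offset of this mesh in the shared IBO */
    GLsizei  indexCount;
} MeshRange;

static void allocateSharedBuffers(GLuint vbo, GLuint ibo,
                                  GLsizeiptr vertexBytes, GLsizeiptr indexBytes)
{
    /* Allocate once; each mesh is then copied into its own sub-range with
       glBufferSubData at the offset recorded in its MeshRange. */
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, vertexBytes, NULL, GL_STATIC_DRAW);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexBytes, NULL, GL_STATIC_DRAW);
}

static void drawAll(GLuint vbo, GLuint ibo, const MeshRange *meshes, int count)
{
    /* Bind the pair once (vertex pointers are set up once with offsets into
       the shared VBO), then one draw per mesh at its byte offset - no
       per-mesh unbind/rebind. */
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    for (int i = 0; i < count; ++i)
        glDrawElements(GL_TRIANGLES, meshes[i].indexCount, GL_UNSIGNED_SHORT,
                       (const GLvoid *)meshes[i].indexOffset);
}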

My psychic hat tells me that this framework originally drew (and likely still does with GL 1.x) using strips and fans and was converted over to VBOs by just replacing each discrete strip or fan with its own VBO. You also tend to see this kind of thing with a certain style of OO code where the developer has implemented everything - down to the finest level of detail - as a class, and has made the design decision that each such class should be totally self contained, walled off, and know nothing at all about the outside world. Not good at all for this kind of use case.

I’m not sure that I understand why 2 SwapBuffers calls appear to be made per frame. That looks highly dubious - unless the first one is inadvertently logged from the previous frame, of course.

If you’ve got a stencil buffer it really should be cleared at the same time as the depth buffer, because it’s most likely combined with the depth buffer as a single 32-bit depth/stencil buffer (in D24S8 format).
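
i.e. something like this at the start of the frame (just the standard combined clear):

/* Clearing depth and stencil together lets the driver fast-clear the
   combined D24S8 surface in one pass. */
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);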

None of this explains the shader performance problem. Maybe try dropping your #version in each shader?

The two SwapBuffers calls are a mistake of mine in the cut/paste. There really is only one SwapBuffers.

The GL trace I showed is for a very small object. The 64/45/22 fps object is a very large one, so your explanation does make sense.

The VBO switch is the killer thing here.

One very important question remains unanswered though: in my original test, I only used display lists. I was comparing the framerate of the raw object and the same object with simple vertex+fragment shaders attached. And my framerate was dropping from 64 fps to 30 fps.

This remains a mystery.

What about taking your #version down to 110, or did you try that already with your GL 2.1 test? I’d be interested in knowing if that has any bearing on performance. Where I’m coming from is trying to rule stuff out and identify the point at which the perf drop-off happens. The renderer is certainly not efficient anyway, but all the same you should definitely not get that kind of thing happening.

Trying progressively more complex models may also be a good idea. In particular, I’m thinking of the crossover point around the 64k-vertex mark.

Omitting the version number in the shader or specifying #version 110 has no effect on the framerate.

I have found the explanation of the huge drop in the framerate. I mean, not the one caused by the VBO switches, but the one that had yet to be explained, where I had passthrough shaders and display lists (30 fps) vs. just plain display lists (64 fps).

The framework I am using is OpenSceneGraph. To use shaders/programs I can either assign them to individual graph leaves or to the top node of the graph. When I assign a unique (application-wide) shader to individual graph leaves, the OpenGL trace shows that OSG is clever enough to avoid unnecessary shader switches, resulting in the rendering just being:

glUseProgram(id)
glCallList(id1)
glCallList(id2)
glCallList(id3)
glCallList(id4)

glUseProgram(0)

However, the work done on the CPU probably changes significantly, because OSG still has to ‘diff’ the state changes across graph leaves.
Specifying the shader on the topmost node shows the exact same GL trace, but with no framerate drop, probably because there is no work on the CPU side.
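
Roughly, putting the program on the topmost node looks like this (‘root’, ‘vertSrc’ and ‘fragSrc’ are just placeholder names for my root node and shader sources):

#include <osg/Program>
#include <osg/Shader>

// One application-wide program on the root node's StateSet, inherited by
// the whole graph instead of being set on each leaf.
osg::ref_ptr<osg::Program> program = new osg::Program;
program->addShader(new osg::Shader(osg::Shader::VERTEX,   vertSrc));
program->addShader(new osg::Shader(osg::Shader::FRAGMENT, fragSrc));
root->getOrCreateStateSet()->setAttributeAndModes(program.get(),
                                                  osg::StateAttribute::ON);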

So my guess is that the CPU was making everything stall.

I was misled by the fact that the GL trace was ‘clean’ when working on individual graph leaves, and assumed the CPU work was almost zero. It wasn’t.

Cheers,
Fred

I realize there has already been some discussion of the performance of a vertex shader vs. fixed functionality. I would like to find out if (or confirm that) vertex shaders are ALWAYS slower. I ran a 30,000 mesh at 240 fps with fixed functionality. Then I attached just a vertex shader, containing only this code:

void main()
{
    gl_Position = ftransform();
}

The result is that fps is cut in half.

If vertex shaders are always slower, then I’ll have to work with the fragment shader alone. This is unfortunate because I can’t pass varying variables to the fragment shader without using a vertex shader (right?). So, to do lighting, I’m passing the normals through the secondaryColor (which is annoying)…
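
Roughly, the client-side part of that workaround looks like this (‘packedNormals’ is just a placeholder array; I remap the normals into [0,1] first since fixed-function colours get clamped):

/* Feed the remapped normals (0.5 * n + 0.5) through the secondary colour
   array so a fragment-only shader can read them back as gl_SecondaryColor. */
glEnableClientState(GL_SECONDARY_COLOR_ARRAY);
glSecondaryColorPointer(3, GL_FLOAT, 0, packedNormals);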

Any advice from an expert would bring me much delight.

In the normal case vertex shaders should actually be faster, especially a simple passthrough vertex shader.

Why? Because on modern hardware the fixed pipeline is emulated through shaders, that’s why. So given that you’re always running a vertex shader anyway, simplifying that shader to one that meets only your specific needs, or to a simple passthrough, should be faster.

The only exception to this rule is if your implementation is emulating vertex shaders in software.

I know people say that FPS counts shouldn’t be used, but at the same time the performance you’re getting seems quite low for the mesh you’re rendering. I can easily hit similar speeds for similar counts on a crappy office PC with a crappy integrated Intel. So something other than just the use of vertex shaders is definitely wrong here.

Interesting. Does this mean that my implementation must be emulating the vertex shader in software? I’m a newbie.

Check my edit - the next obvious question would be: “what hardware have you?”

Thanks. I have an i3. Windows properties reports: WDC WD3200BEVT ATA device. I got the comp last summer… Are you suggesting that on different hardware, the fixed functionality wouldn’t be faster than this?

void main()
{
    gl_Position = ftransform();
}

The “mesh” could be going slower because it’s a terrain in which no triangles are hidden from view; also it includes multi-texturing. But even without texturing, the results are essentially the same.

WDC WD3200BEVT ATA

Hmmmm, that’s a hard disk; don’t think they can do 3D graphics too well. :wink:

What about your display adapter?

I just tested it on another laptop: this one was bought just a few months ago, but it’s also an i3. Same results. Somehow I don’t think that my implementation is emulating vertex shaders in software.

Under display adapter, Device Manager says “Intel® HD Graphics”

Device Manager says “Intel(R) HD Graphics”

OMG: whenever you see Intel and odd/strange GL behaviour we all cringe!
Intel make absolutely terrible OpenGL drivers. I bet with 99% certainty that’s your problem right there. You really, really need to test on an nVidia or AMD card to get a proper idea of how your app is performing.

That’s very helpful. I don’t know if I may ask this here, but do you know if I can add an nVidia card to my laptop? Or must I buy a whole new laptop?

It depends on the laptop, but I really doubt if it’s possible; it’s a long time since I checked out this particular market but laptops with replaceable graphics were never very common. Your options are more or less: (1) buy a new laptop, (2) accept that you’re going to get sucky performance with OpenGL on Intel graphics, or (3) switch to D3D (which still sucks on Intel, just not as much).

There is no way you can replace your GFX card or add a new GFX to a laptop.
The new range of DX11-class h/w has been released recently, so now is a good time to buy a new laptop with a decent onboard video card (AMD or nVidia). I have laptops at home with GeForce 8 and others with the AMD 48xx series - both are excellent.

In the end you get what you pay for. Buy a DX11 / OpenGL 4.1 laptop so you can develop something decent but don’t spend mega bucks as these things do go out of date within a few years!

Thanks for the advice and helpful info. I’ll try to get a hold of an nVidia card for my desktop and test it from there.

I tested it with an nVidia card… and sure enough the vertex shader goes just as fast as the fixed pipeline. So, that’s cool. :slight_smile: