What does the structure of your renderer class look like?

I was wondering how the pros do it.

Here is my solution:

The main idea was to hide the VBO/non-VBO code path from the rest of the program. During initialization, if VBOs are supported, I create two VBOs: one for storing static vertex data and one rendering buffer (explained below).

There are two main functions: RenderBuffer(buffer) and AllocateBuffer(type). With AllocateBuffer(type) you can allocate vertex buffers (wow!). If VBOs are not supported, the buffer is always allocated in system memory (a plain vertex array). If VBOs are supported and the type is SYSTEM, the buffer is also allocated in system memory. If the type is FAST, the buffer is allocated inside the previously allocated static-data VBO.
RenderBuffer(buffer) renders the specified buffer. There are three possible code paths:

  1. If VBOs are not supported, the buffer is in system memory and is rendered as a plain vertex array.
  2. If VBOs are supported and the buffer is in system memory, it is first copied into the rendering VBO, and then the VBO is rendered.
  3. If VBOs are supported and the buffer is in the previously allocated VBO, it is rendered directly from its current location.
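A minimal C++ sketch of the AllocateBuffer/RenderBuffer idea, assuming a hypothetical `Renderer` that only decides which of the three paths to take (the actual GL calls are left out so it runs anywhere). The `BufferType`/`RenderPath` enums, the byte-offset allocator inside the static VBO, and the fallback to system memory when the static VBO is full are all illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of AllocateBuffer/RenderBuffer. Real code would issue
// OpenGL calls; here RenderBuffer only reports which path it would take.
enum class BufferType { SYSTEM, FAST };
enum class RenderPath { VertexArray, CopyToRenderVBO, DirectFromVBO };

struct VertexBuffer {
    bool inVBO = false;          // true if the data lives in the static VBO
    std::size_t vboOffset = 0;   // byte offset inside the static VBO
    std::vector<float> system;   // system-memory storage otherwise
    std::size_t sizeBytes = 0;
};

class Renderer {
public:
    Renderer(bool vboSupported, std::size_t staticVboBytes)
        : vboSupported_(vboSupported), staticCapacity_(staticVboBytes) {}

    VertexBuffer AllocateBuffer(BufferType type, std::size_t floatCount) {
        assert(floatCount > 0);
        VertexBuffer buf;
        buf.sizeBytes = floatCount * sizeof(float);
        // FAST goes into the static VBO when VBOs exist. Falling back to system
        // memory when the static VBO is full is an assumption of this sketch.
        if (vboSupported_ && type == BufferType::FAST &&
            staticUsed_ + buf.sizeBytes <= staticCapacity_) {
            buf.inVBO = true;
            buf.vboOffset = staticUsed_;
            staticUsed_ += buf.sizeBytes;
        } else {
            buf.system.resize(floatCount);  // plain vertex array in system memory
        }
        return buf;
    }

    RenderPath RenderBuffer(const VertexBuffer& buf) const {
        if (!vboSupported_) return RenderPath::VertexArray;  // path 1
        if (!buf.inVBO) return RenderPath::CopyToRenderVBO;  // path 2: copy, then draw the VBO
        return RenderPath::DirectFromVBO;                    // path 3
    }

private:
    bool vboSupported_;
    std::size_t staticCapacity_;
    std::size_t staticUsed_ = 0;
};
```

The caller only ever sees `AllocateBuffer` and `RenderBuffer`; whether VBOs exist is decided once, in the constructor.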

This way, other parts of the program don’t need to know whether VBOs are supported. What’s your solution, and what is your opinion of mine?

If VBOs are not supported, there are faster ways than ordinary vertex arrays, for example display lists. I once read (and my experience confirms this) that even immediate-mode calls are faster than vertex arrays. It really depends on what you are planning to do. If your program is not supposed to run on old hardware, you might simply skip the non-VBO path. In general, fitting OpenGL into a clean object-oriented structure is not easy because of its procedural, state-machine approach.

The “pro” way is to use a scene graph (a frequent topic in this forum), and game programmers like John Carmack are said to write evil chaos code.

I think the speed of standard vertex arrays comes mostly from the relatively small number of function calls needed to render a large number of vertices. As a result, you’re more likely to be limited by bandwidth than by CPU speed.
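To make the call-count argument concrete, here is a rough model (an assumption for illustration, not a benchmark): immediate mode issues one call per attribute per vertex, while vertex arrays issue a fixed number of setup calls plus a single draw.

```cpp
#include <cassert>
#include <cstddef>

// Rough model, not a measurement: count OpenGL entry points called to draw
// `vertices` vertices that each carry `attributes` attributes
// (e.g. position, normal, texture coordinate).

// Immediate mode: glBegin + one call per attribute per vertex + glEnd.
std::size_t ImmediateModeCalls(std::size_t vertices, std::size_t attributes) {
    return 2 + vertices * attributes;
}

// Vertex arrays: one glEnableClientState and one gl*Pointer per attribute,
// plus a single glDrawArrays, independent of the vertex count.
std::size_t VertexArrayCalls(std::size_t /*vertices*/, std::size_t attributes) {
    return 2 * attributes + 1;
}
```

For 1000 vertices with position, normal and texture coordinate, that is 3002 calls versus 7, which is why the bottleneck tends to move from the CPU to bus bandwidth.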

I once changed my program from immediate-mode OpenGL calls to rendering exactly the same thing with vertex arrays, and it got slower. Also, I read in a paper from NVIDIA that vertex arrays are the slowest way to draw something: best is VBO, 2nd display lists, 3rd immediate calls, 4th vertex arrays, if I recall correctly.


What about compiled vertex arrays? There hasn’t been much talk about them since the Quake 3 days, but that’s the rendering primitive Quake 3 uses.

Come to think of it, are we talking about regular vertex arrays or indexed vertex arrays?


Compiled vertex arrays are great on software-transform cards if you need to multi-pass your geometry. Note that some implementations of LockArrays() will ignore the call if the number of vertices is larger than some ludicrously small fixed buffer (like 1024 vertices or so). This is known as “the Quake optimization” :)

DrawElements() isn’t so hot if you have regular malloc() storage, but it’s better than a poke in the eye with a sharp stick, especially for high-poly models.

DrawRangeElements() is likely faster than immediate mode for all but the most degenerate geometry.
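One practical detail: `glDrawRangeElements(mode, start, end, count, type, indices)` needs the smallest and largest index that the index list uses, so the driver knows which vertices to fetch. A small helper (the name `IndexRange` is hypothetical) can compute that once, when the index buffer is built, instead of on every draw:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// glDrawRangeElements requires every index to lie in [start, end].
// Compute that range once for a 16-bit index buffer.
std::pair<std::uint16_t, std::uint16_t>
IndexRange(const std::vector<std::uint16_t>& indices) {
    assert(!indices.empty());
    auto mm = std::minmax_element(indices.begin(), indices.end());
    return {*mm.first, *mm.second};
}
```

The draw call would then look something like `glDrawRangeElements(GL_TRIANGLES, r.first, r.second, count, GL_UNSIGNED_SHORT, indices)`.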

In my opinion, a good renderer abstraction will provide a mechanism that’s a little similar to DirectX vertex and index buffers:

  • allocate vertex buffer
  • lock it for update for X vertices of format Y, return a pointer, optionally discarding what’s there already
  • write data into the pointer provided
  • unlock it when you’re done
  • submit geometry to your renderer using a triplet of:
      • modelview transform(s)
      • shading parameters (“material” including textures)
      • geometry and index buffers

You can also supply usage hints (static, dynamic, streaming, etc.) at the point of creation or at the point of locking.
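Here is one way such an interface might look in C++. It is a sketch with hypothetical names (`IVertexBuffer`, `Lock`, `Unlock`, `DrawCall`), not the actual engine code described here, and only a plain system-memory back-end is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical renderer-side vertex buffer interface (lock / write / unlock).
enum class Usage { Static, Dynamic, Streaming };

struct VertexFormat {
    std::size_t strideBytes;  // size of one vertex, e.g. 32 for position+normal+uv
};

class IVertexBuffer {
public:
    virtual ~IVertexBuffer() = default;
    // Returns writable room for `count` vertices of `fmt`; with `discard`
    // the previous contents may be thrown away.
    virtual void* Lock(std::size_t count, const VertexFormat& fmt, bool discard) = 0;
    virtual void Unlock() = 0;
};

// "Plain" back-end in system memory. Other back-ends would implement the
// same interface and could use the usage hint to decide where data lives.
class SystemVertexBuffer : public IVertexBuffer {
public:
    explicit SystemVertexBuffer(Usage usage) : usage_(usage) {}

    void* Lock(std::size_t count, const VertexFormat& fmt, bool discard) override {
        assert(!locked_);
        const std::size_t bytes = count * fmt.strideBytes;
        if (discard) data_.clear();               // old contents are not needed
        if (data_.size() < bytes) data_.resize(bytes);
        locked_ = true;
        return data_.data();
    }
    void Unlock() override { assert(locked_); locked_ = false; }

    std::size_t SizeBytes() const { return data_.size(); }
    Usage GetUsage() const { return usage_; }

private:
    Usage usage_;
    std::vector<unsigned char> data_;
    bool locked_ = false;
};

// The submission triplet: transform(s), shading parameters, geometry.
struct Material { unsigned textureId = 0; };      // placeholder shading parameters
struct DrawCall {
    float modelview[16];
    const Material* material;
    const IVertexBuffer* vertices;
    const IVertexBuffer* indices;
};
```

Because callers only write through the pointer returned by `Lock`, the renderer is free to hand out system memory, AGP memory or a mapped buffer object behind the same interface.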

We do this in our engine, and have three back-ends that implement the vertex buffer factory: plain, NV_VAR and ARB_VBO. This works very well. (The NV_VAR implementation does a bit of pipelining and fencing to keep data in the same spot in VAR memory as long as possible, but reverts to a streaming mechanism once the application’s working set exceeds the size of the VAR.)
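Choosing among such back-ends at startup could look like the sketch below. It assumes the extension string comes from `glGetString(GL_EXTENSIONS)`, and the preference order (ARB_VBO first, then NV_VAR, then plain) is an assumption, not something stated above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

enum class Backend { Plain, NvVar, ArbVbo };

// Whole-token match in a space-separated extension string; a plain substring
// search could wrongly match a longer extension name that shares a prefix.
bool HasExtension(const std::string& extensions, const std::string& name) {
    std::istringstream in(extensions);
    std::string token;
    while (in >> token) {
        if (token == name) return true;
    }
    return false;
}

// Hypothetical preference order: ARB_VBO, then NV_VAR, then plain memory.
Backend ChooseBackend(const std::string& extensions) {
    if (HasExtension(extensions, "GL_ARB_vertex_buffer_object")) return Backend::ArbVbo;
    if (HasExtension(extensions, "GL_NV_vertex_array_range")) return Backend::NvVar;
    return Backend::Plain;
}
```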

Sorry, I forgot to write that I use indexed arrays…

Although I haven’t tested it, I can’t believe that immediate mode is faster than vertex arrays, at least not with indexed arrays. If several polygons share the same vertex and you use indexed arrays, it must be faster than immediate mode (at least in most cases).
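A quick worked example of why sharing vertices matters, assuming a grid of W x H quads drawn as two triangles each:

```cpp
#include <cassert>
#include <cstddef>

// Non-indexed (immediate mode or glDrawArrays): every triangle corner is sent.
std::size_t NonIndexedVertices(std::size_t w, std::size_t h) {
    return w * h * 6;
}

// Indexed: each grid point is stored once and referenced by index.
std::size_t UniqueVertices(std::size_t w, std::size_t h) {
    return (w + 1) * (h + 1);
}

// Number of indices needed for the same triangles.
std::size_t IndexCount(std::size_t w, std::size_t h) {
    return w * h * 6;
}
```

For a 10 x 10 grid, that is 600 full vertices sent without indexing, versus 121 unique vertices plus 600 small indices with it. On hardware with a post-transform vertex cache, shared vertices may also be transformed fewer times.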

Isn’t the expectation with VBOs that everybody implements them and that they replace all other methods of vertex transfer? Even TNT2s implement VBOs, so they’re pretty widely accepted only six months or so after the spec’s release, and they’re going into the OpenGL 1.5 core. So what precisely is the point of writing non-VBO code anymore, except to support truly legacy hardware like Voodoos and so forth? Even then, I would rather just implement the VBO spec itself using available OpenGL calls than write a full back-end layer.
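A rough sketch of what “implementing the VBO spec with available OpenGL calls” might mean: keep buffers in system memory and translate VBO-style byte offsets back into client pointers that plain vertex arrays accept. `EmulatedBuffers` and its methods are hypothetical stand-ins for `glGenBuffersARB`, `glBindBufferARB`, `glBufferDataARB` and `glMapBufferARB`, not real entry points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical system-memory stand-in for VBOs on drivers without
// ARB_vertex_buffer_object. Method names mirror, but are not, GL entry points.
class EmulatedBuffers {
public:
    unsigned Gen() {                                        // ~ glGenBuffersARB
        unsigned id = next_++;
        buffers_.emplace(id, std::vector<unsigned char>());
        return id;
    }
    void Bind(unsigned id) { bound_ = id; }                 // ~ glBindBufferARB

    void BufferData(std::size_t size, const void* data) {  // ~ glBufferDataARB
        std::vector<unsigned char>& b = buffers_.at(bound_);
        b.assign(size, 0);
        if (data && size > 0) std::memcpy(b.data(), data, size);
    }

    void* Map() { return buffers_.at(bound_).data(); }      // ~ glMapBufferARB

    // With real VBOs, gl*Pointer takes a byte offset into the bound buffer.
    // Here the offset becomes a plain client pointer that an ordinary vertex
    // array call such as glVertexPointer can consume.
    const void* Resolve(std::size_t offset) const {
        const std::vector<unsigned char>& b = buffers_.at(bound_);
        assert(offset <= b.size());
        return b.data() + offset;
    }

private:
    std::unordered_map<unsigned, std::vector<unsigned char>> buffers_;
    unsigned next_ = 1;   // 0 means "no buffer", as in OpenGL
    unsigned bound_ = 0;
};
```

The rest of the program can then be written against the VBO model only, with this shim switched in on old drivers.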

Hmmm. That’s a good point…

Originally posted by JanHH:
…I read in a paper from nvidia that vertex arrays are the slowest way to draw something, best is vbo, 2nd display lists, 3rd immediate calls, 4th vertex arrays, if I recall correctly.

I have personally never seen VAs perform worse than immediate mode, so while I agree on 1st and 2nd place, I disagree on the last two.
An NVIDIA paper I have explains that not all vertex array formats are supported in hardware. The most common ones are, but not all. Maybe this should be taken into account.

I guess you know that with immediate mode, CPU speed is the limiting factor. If CPU pipelines continue to grow in length, the cost will increase for obvious reasons.
If you’re CPU-limited, immediate mode will only make things worse. Some years ago I was testing an app for compatibility on a P133: switching to VAs almost doubled the frame rate.
The same app on a P3-800 gained only marginally from the switch to VAs.

What the paper probably assumes is that you’re not CPU-limited.

Originally posted by jwatte:

  • submit geometry to your renderer using a triplet of:
      • modelview transform(s)
      • shading parameters (“material” including textures)
      • geometry and index buffers

Just curious: how do you pass shader parameters to the renderer? A push or pull method? I mean, when a shader needs the ‘brightest 3 directional lights’, how does it get those?

I completely agree with jwatte’s post.
I implemented this type of renderer. I hid the VBO feature and some other things from the user. My implementation may not be perfect, but over time it has proven very practical to use. If you are interested in seeing how, contact me.

I made it even more abstract by giving the render queue only “render objects”. These cover anything related to the final product, not only physical geometry. Then, at a later stage, the renderer gets all render objects and runs a dedicated code block for each special render feature, such as geometry, shadow maps, reflections, etc.

By just registering a new code block, one can easily insert new functionality into the engine without any rewrites, only additions. This is good for an SO- or DLL-based structure, in case I’d like to update the graphics of a game after it has been released and new cards come out.

This means the renderer is split into several parts (or classes; I don’t generally “over-use” OOP): the “engine”, the potential cache (storing old information), the render list, and finally the registered render features.
There’s a lot more going on, but generally I wanted something that allows very simple modifications through extensions, like a chunk manager that controls and adapts buffers to the current scene, shaders, plugging in a terrain, etc.

In short, I hide the API-dependent code from the engine and plug in code to make things happen (a spider in the web).
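A small sketch of the “register a render feature” idea, with hypothetical names (`RenderQueue`, `RegisterFeature`, `RenderObject`). The integer `order`, used here so a shadow pass can run before the main geometry pass, is an assumption added for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical render queue that only knows "render objects" and hands them
// to registered feature callbacks (geometry, shadow maps, reflections, ...).
struct RenderObject {
    std::string name;
    bool castsShadow = false;
};

using RenderFeature = std::function<void(const std::vector<RenderObject>&)>;

class RenderQueue {
public:
    // New functionality is added by registration, not by editing the engine.
    void RegisterFeature(int order, RenderFeature feature) {
        features_.emplace(order, std::move(feature));
    }

    void Submit(RenderObject obj) { objects_.push_back(std::move(obj)); }

    // Run every registered feature, in ascending `order`, over this frame's objects.
    void Flush() {
        for (auto& entry : features_) entry.second(objects_);
        objects_.clear();
    }

private:
    std::multimap<int, RenderFeature> features_;
    std::vector<RenderObject> objects_;
};
```

A feature loaded from a plug-in would only need to call `RegisterFeature`; the queue itself never changes.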

when shader needs ‘brightest 3 directional lights’ how he gets those?

That needs to be provided as shader parameters, calculated on the CPU. This is part of the material setup for the triplet, at least conceptually. Of course, the details matter in everything :)
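As an illustration of that CPU-side step, here is a hypothetical helper that picks the N brightest directional lights before they are set as shader parameters. Treating a single scalar `intensity` as “brightness” is an assumption; real code might weigh light color as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct DirectionalLight {
    float direction[3];
    float intensity;  // assumed scalar brightness
};

// Returns the n brightest lights, brightest first (fewer if not enough exist).
std::vector<DirectionalLight> BrightestLights(std::vector<DirectionalLight> lights,
                                              std::size_t n) {
    n = std::min<std::size_t>(n, lights.size());
    const auto mid = lights.begin() + static_cast<std::ptrdiff_t>(n);
    // Only the first n elements need to end up sorted.
    std::partial_sort(lights.begin(), mid, lights.end(),
                      [](const DirectionalLight& a, const DirectionalLight& b) {
                          return a.intensity > b.intensity;
                      });
    lights.resize(n);
    return lights;
}
```

The result would then be uploaded as part of the material’s parameters, so the shader never has to search the light list itself.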