Given recent changes in GPUs and anticipated changes in OpenGL, could you guys please critique my idea for state management? My main goal is performance: expensive states should change infrequently, and states should not be thrashed. Another goal is ease of use: the developer should never have to push/pop or query/restore state.
I want to divide state into two categories: publishable state and everything else. An object that renders (e.g. has a virtual Render method) also publishes a state block that should be set before the object is rendered. The scene manager renders in two passes. The first pass determines the visible objects and drops them into buckets based on their state blocks. The second pass iterates over the states, sets each state, and then renders the associated objects.
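A minimal sketch of the two-pass scheme, assuming a simple `Renderable` with an integer state-block key (all names here are hypothetical; a real version would apply the GL state where the comment indicates):

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical renderable: carries the sort key of its published state block.
struct Renderable {
    uint64_t stateKey;
    int id;
};

// Pass 1 drops visible objects into buckets keyed by state block; pass 2
// walks the buckets in ascending key order. Returns the resulting draw order.
std::vector<int> RenderOrder(const std::vector<Renderable>& visible) {
    std::map<uint64_t, std::vector<int>> buckets;
    for (const Renderable& r : visible)
        buckets[r.stateKey].push_back(r.id);    // pass 1: one touch per object
    std::vector<int> order;
    for (auto& [key, ids] : buckets) {
        // ApplyStateBlock(key);  // GL enables/disables would go here, once per group
        for (int id : ids)
            order.push_back(id);                // pass 2: render the group
    }
    return order;
}
```

Each state block is applied once per frame regardless of how many objects share it, which is the whole point of the bucketing.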
Of course, there are a lot of details:
Tom Forsyth’s blog suggests that functional (e.g. enable/disable) states are more expensive than value states (e.g. color). So I’d like to put the functional states in the state block, since the objects will be sorted by these. The objects can then set other states themselves if they need to. States not in the state block will be undefined when the object is rendered. For example, if the state block says the stencil test is enabled, the object will need to set the reference value and compare mask. If the state block has the stencil test disabled, the reference value and compare mask don’t need to be set.
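The stencil example could look roughly like this (the types are hypothetical, the return value exists only to make the split visible, and the actual GL call is left as a comment):

```cpp
// Hypothetical split for the stencil example: the state block carries only the
// on/off ("functional") switch; the object supplies the "value" states itself.
struct StencilBlockState {
    bool testEnabled;  // published in the state block; objects are sorted by this
};

struct MyObject {
    unsigned refValue = 1;       // "value" states, owned per object
    unsigned compareMask = 0xFF;

    // Returns true when the value states were set (for illustration only).
    bool Render(const StencilBlockState& block) {
        if (!block.testEnabled)
            return false;  // test disabled: no need to touch ref/mask at all
        // glStencilFunc(GL_EQUAL, refValue, compareMask);  // real GL call here
        return true;
    }
};
```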
The states in the state block could be represented as a bitmask: cheap states to switch use bits closer to the LSB and expensive states (e.g. shader) use bits closer to the MSB. The bitmask is treated as a number and sorted ascending for rendering to minimize expensive state changes.
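A sketch of building such a key; the particular states and field widths are illustrative assumptions, not a fixed layout:

```cpp
#include <cstdint>

// Hypothetical bit layout: cheap states sit near the LSB and the expensive
// shader id near the MSB, so an ascending sort on the key switches the
// expensive states least often. Field widths are illustrative only.
constexpr uint64_t MakeStateKey(uint32_t shaderId,   // most expensive state
                                uint32_t blendMode,  // mid-range
                                bool depthTest,      // cheap
                                bool stencilTest) {  // cheapest
    return (uint64_t(shaderId)  << 34) |
           (uint64_t(blendMode) << 2)  |
           (uint64_t(depthTest) << 1)  |
            uint64_t(stencilTest);
}
```

With this encoding, any difference in shader id dominates every cheaper field, so a plain numeric sort groups draws by shader first.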
The scene manager
Full stop. OpenGL should not have a “scene manager”, or anything even remotely like one.
Korval, I don’t think Patrick is trying to make a new version of OpenGL! Just a scene graph built on top of OpenGL.
Patrick, I use OpenSceneGraph, which has the ability to sort the scene graph to render in order of state. However, in practice the cost of sorting outweighs the benefit of faster rendering so the default mode is to not sort.
Correct. Thanks for clearing that up.
I’m not suggesting an n log n sort each frame. State sorting would only cost touching each renderable object twice instead of once. In the first pass, the objects deemed visible are dropped into buckets based on their state block. For each visible object this is O(1) (a direct mapping, not expected-O(1) hashing).
In the second pass, the state blocks would be traversed and related objects rendered as I described above. The only time an actual sort takes place is when a new state block needs to be created, which for my application domain would be rare.
So this means that OpenSceneGraph is poorly written (as are most of the open engines…). State sorting does matter and gives a performance boost. You should definitely consider sorting by shaders (GLSL programs). I’ve read that sorting by textures is also important, but in my opinion it’s hard to implement.

I have spent a lot of time designing an OpenGL wrapper that would reduce state changes, and after years of coding came to the conclusion that sorting directly by the raw OpenGL state is not a good idea. It’s better to design a good object layer on top of OpenGL (knowing how DirectX 10 works helped me a lot), and then sort by those objects. For example, you could split the state into objects (rasterizer state object, depth test state object, stencil test state, GLSL program, program environment (bound textures, texture filtering (i.e. shadow PCF))) and then treat the drawing state as a small vector of such state objects. You could also do some preprocessing (hidden, of course) and assign keys to those objects (i.e. detect identical depth state objects during their construction and assign them integer keys, then sort by the keys, not the state values) to speed up sorting. Of course, immutable objects (as in DX10, and as GL 3.0 was meant to be…) help a lot here. Tracking value changes (i.e. shader constants) is probably not a good idea, as Tom Forsyth suggested.
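The interning idea above might be sketched like this for a single state type (all names are hypothetical; identical state values constructed twice share one small-integer key, so sorting compares integers instead of state contents):

```cpp
#include <map>
#include <tuple>

// Hypothetical immutable depth state object.
struct DepthState {
    bool test;
    bool write;
};

inline bool operator<(const DepthState& a, const DepthState& b) {
    return std::tie(a.test, a.write) < std::tie(b.test, b.write);
}

// Interns a state value at construction time and returns a stable integer
// key; identical values always get the same key back.
int InternDepthState(const DepthState& s) {
    static std::map<DepthState, int> table;
    // emplace leaves an existing entry untouched, so the key is stable.
    auto it = table.emplace(s, int(table.size())).first;
    return it->second;
}
```

The per-draw state then becomes a small tuple of such keys, one per state object, and lexicographic comparison of the tuples replaces deep state comparison.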
Did you organize state the same as Direct3D 10? Or did you speculate based on the GL state blocks in the pipeline newsletter? One of my goals is to design a system that can be implemented efficiently with a future version of GL as well as the current one. Perhaps this is impossible. For example, if I put alpha test in the state block and then it is removed, I will have to emulate it with a shader.
Absolutely, when a renderable object is created it would request a state block given the states it wants to render with. That state block would be immutable and would be shared across many objects.
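A sketch of such a registry, under the assumption that a state block is identified by its sort key (names hypothetical): identical requests share one immutable block, and only a genuinely new block pays any insertion cost.

```cpp
#include <cstdint>
#include <map>
#include <memory>

// Immutable published state; the key doubles as the render sort key.
struct StateBlock {
    uint64_t key;
};

class StateBlockRegistry {
public:
    // Called once when a renderable object is created; the returned block is
    // shared by every object that requests the same states.
    std::shared_ptr<const StateBlock> Request(uint64_t key) {
        auto it = blocks_.find(key);
        if (it == blocks_.end())  // rare path: a brand-new state block
            it = blocks_.emplace(key,
                     std::make_shared<const StateBlock>(StateBlock{key})).first;
        return it->second;
    }

private:
    // std::map keeps the blocks sorted by key for the render traversal.
    std::map<uint64_t, std::shared_ptr<const StateBlock>> blocks_;
};
```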
In terms of sorting by texture, an individual object might sort by texture (e.g. a model) but textures would not be sorted across objects.
The problem with sorting by texture is somehow related to texture units. Can I assume that the texture unit a texture is bound to is irrelevant? I mean, say two shaders use the same texture, but one on texture unit 0 and the other on texture unit 1 - when I change the shader and rebind the texture, are the hardware texture sampler units thrashed? My best guess is that it’s vendor-dependent. So I have to make sure the same texture is used on the same texture unit. That’s possible if you design the GLSL program wrapper in such a way that you don’t really use texture units: you just bind a texture to a uniform, and the texture unit is chosen internally. Generally, it seems doable, but it’s a lot of work. It might be worth it, but I haven’t really finished and tested that part…
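The wrapper idea might look something like this (a sketch under the assumption that units are assigned first-come-first-served per program; the class and method names are made up, and the real GL calls are left as a comment):

```cpp
#include <map>
#include <string>

// Hypothetical program wrapper that hides texture units: callers bind a
// texture to a uniform name, and the wrapper picks (and remembers) a unit,
// so the same uniform always maps to the same unit for this program.
class ProgramWrapper {
public:
    // Returns the chosen unit; a real version would also call
    // glActiveTexture(GL_TEXTURE0 + unit), glBindTexture, and glUniform1i.
    int BindTexture(const std::string& uniformName, unsigned textureId) {
        auto it = unitForUniform_.find(uniformName);
        if (it == unitForUniform_.end())
            it = unitForUniform_.emplace(uniformName, nextUnit_++).first;
        boundTexture_[it->second] = textureId;
        return it->second;
    }

private:
    std::map<std::string, int> unitForUniform_;  // stable uniform -> unit
    std::map<int, unsigned> boundTexture_;       // what each unit currently holds
    int nextUnit_ = 0;
};
```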
As for the state objects, I base them mostly on the DirectX 10 design. It’s hard to tell what a future OpenGL might look like, and I also want to be ready for a DirectX port. I don’t use the alpha test here; instead I make legacy state objects wrapping the old fixed-pipeline functionality. So I can have, e.g., a lighting state object, a texture environment object, a texture gen object, etc. This will make it a lot easier to emulate them with shaders when GL 3.1 finally gets rid of the fixed pipeline.
I definitely recommend you and others read this excellent article:
Multiple scenegraphs are the way to go.
Thanks. I’ve read that before and just reread parts of it. I completely agree that there isn’t a one-size-fits-all scenegraph. For example, terrain rendering is completely separate from the system I describe. This system is really for things like models, polygons, lines, and points. Different representations of the scene are fine - in particular, the GUI will want to present the scene to the user differently than I represent it for rendering. But at the end of the day, I need to efficiently render the scene. What else do you suggest besides the ideas discussed thus far: organizing state into blocks, culling in pass 1, and rendering in state-sorted order in pass 2?
I think precomputed PVSs are still potentially a really big win for architectural walk-throughs or pretty much any environment with high, non-trivial depth complexity.
There’s predicated rendering of course, but with that you probably need to visit and test the entire (potentially) visible world each frame - which may or may not be a bad thing, depending on the size and shape of your world…
I agree precomputed PVSs are great, especially since they have so little CPU overhead during rendering. Unfortunately, I can’t precompute visibility in most cases because applications can continuously add and remove objects, and many of the objects will be moving.
My main concern is my suggestion to sort objects by the “functional” states they publish in a state block, and let the primitive itself set the “value” states. I feel it is a good design for today’s, or perhaps yesterday’s, hardware and APIs, but I’m not sure about the future. Perhaps, as long as I stay light on the CPU, I will be fine.
I suggest you also consider topological sorting. You could, for example, combine state sorting with approximate depth sorting, drawing objects in approximate front-to-back order to take advantage of the early depth cull. This could even be possible for transparent objects; just the priorities change (sort by depth, but when objects don’t overlap on the screen, state sorting can be used).
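One possible encoding of that priority scheme (the `DrawItem` layout, field sizes, and depth quantization are assumptions): opaque draws sort by state first with coarse front-to-back depth as a tiebreaker, while transparent draws sort back-to-front with state breaking ties, and all transparent draws land after the opaque ones.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct DrawItem {
    uint32_t stateKey;     // state-block sort key
    uint16_t depthBucket;  // quantized view-space depth (0 = nearest)
    bool transparent;
};

uint64_t SortKey(const DrawItem& d) {
    if (!d.transparent)  // state major, near-to-far minor (early-z friendly)
        return (uint64_t(d.stateKey) << 16) | d.depthBucket;
    // Transparent: top bit puts these after all opaque draws; depth is
    // inverted so larger depths (farther) sort first, i.e. back-to-front.
    return (1ull << 63) | (uint64_t(0xFFFF - d.depthBucket) << 32) | d.stateKey;
}

void SortDraws(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return SortKey(a) < SortKey(b);
              });
}
```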
As for PVSs, they are nice, but static as a rock… And there’s no easy way around that as far as I know; a full HSR engine is required…