Multiple depth buffer for layer effect

Hello everybody. I did read the guide for creating a new topic, I searched on Google to find an answer as well as on this site and I cannot find a clear answer.

Please note that

  1. English is not my native language.
  2. I am writing on a smartphone.
  3. It’s 02:43 am when I write this.

I am not looking for code. In fact, even though I have fairly long experience with OpenGL, I have never written a single line of Vulkan. My question is from a theoretical point of view, almost an open discussion. I asked roughly the same question years ago about OpenGL, and back then the short answer was no.

The question is: is it possible to use two different depth buffers at the same time, in the same scene, during the same pass etc. ?

More precisely, let’s say I have a scene like this: a starfield, with two moons, some clouds and a house in a crop field, with the hand of the player holding a gun and a health bar floating on the top left corner.

In that scene:
The starfield will always be behind everything else;
The moon with the greatest orbit will always be behind everything else, except the starfield;
The second moon will always be behind everything else, except the first moon and the starfield.

You get the idea: the clouds can hide the moons and the starfield, the house can hide the clouds.
The hands can hide the house, because we don’t want them to clip through the house’s walls. They must always be drawn on top of the rest of the scene.
Finally, the HUD is on top of everything else.

You can obtain this effect by drawing each layer one by one and clearing the depth buffer between two passes. But I would like to keep the number of rendering passes to one.

The additional depth buffer would only contain integers (not floats) and would not be interpolated. The vertex shader would define the value for all the fragments, and it would be copied as is.
Such an implementation would be easy (it seems much simpler than the current behavior of the depth buffer) so I think that the problem is not there. Like my teacher used to say “it’s not exactly an NP hard problem”.
The depth test would be done by testing first the value of the integers then, if those values are the same, falling back on the classic depth buffer test.

Like I said, my question is: is it possible to do that in Vulkan?
I have heard that Vulkan is lower level than OpenGL, so maybe we have to reimplement the depth test ourselves and, if that is the case, maybe we can tweak it while doing so.

Such a double test could eliminate primitives even before the fragment stage: for example, if some triangles of the farthest moon are hidden by the closest moon, or if the player is wearing a helmet (that occludes a lot of their vision).

Quite frankly, I find it weird that over the course of years almost every aspect of the pipeline opened up for modification but the depth test remained unchanged. I understand that criticizing the choices of Khronos isn’t going to help me, but it’s more a genuine question than a criticism.

Well in any case thanks for reading and good night.

Why would you need to clear the depth buffer? Why would you even need two passes? There are so many ways to do this without needing any of that. They all amount to rendering these things in order, from farthest to closest, after all of your depth work is done.

  1. Render those objects in the proper order, but with depth testing off.

  2. Render those objects with a shader that forces gl_FragDepth to be the maximum value.

  3. Render those objects with a projection matrix that forces the depth to be the maximum value.

As to the specific question, no, there’s no programmable depth buffer. Well, I mean, you can conditionally discard fragments based on reading values from an image. And with input attachments and pipeline barriers, you can even perform a series of read/test/conditional-write operations, emulating the depth buffer.

But really, you should just do one of the things I suggested. They’re all way faster and way easier to implement.

Rasterization hasn’t been “opened up for modification” either. That’s because these are especially performance-critical steps in the rendering process, still best handled by hardware.

Draw that stuff in that order with no depth buffer. Problem solved. It is called Painter’s Algorithm. By default subsequent draws hide previous draws.

Alternatively, if they are great occluders, then draw them in reverse order with the depth buffer on (without any clearing or whatever). That would allow the GPU to elide some fragment shader invocations.
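
As a rough illustration of the difference between the two orderings, here is a minimal single-pixel simulation in Python. The layer names, the depth values and both function names are made up for the example, and it assumes the hardware rejects hidden fragments before shading them (early depth test), which is common but not guaranteed.

```python
# Hypothetical layers covering one pixel: (name, depth), smaller depth = nearer.
# The depth values are arbitrary illustration values in [0, 1].
LAYERS = [
    ("starfield", 0.9),
    ("far moon", 0.8),
    ("near moon", 0.7),
    ("clouds", 0.6),
    ("house", 0.5),
]


def painter(layers):
    """Back-to-front with no depth test: every fragment is shaded, last one wins."""
    shaded = 0
    color = None
    for name, _depth in sorted(layers, key=lambda layer: layer[1], reverse=True):
        shaded += 1
        color = name
    return color, shaded


def front_to_back(layers):
    """Front-to-back with a LESS depth test: hidden fragments are never shaded."""
    shaded = 0
    color = None
    stored_depth = 1.0  # value of a cleared depth buffer
    for name, depth in sorted(layers, key=lambda layer: layer[1]):
        if depth < stored_depth:
            shaded += 1
            color = name
            stored_depth = depth
    return color, shaded
```

Both functions end with the house visible at that pixel, but `painter(LAYERS)` shades five fragments while `front_to_back(LAYERS)` shades one.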

> The additional depth buffer would only contain integers (not floats) and would not be interpolated.

Almost sounds like stencil buffer.

What Alfonse and krOoze said.

The case you may not be considering is translucency (alpha between 0…1). You may see the effects of obj A, obj B, and obj C in the resulting color of a single pixel (or sample, if MSAA is enabled). Example: Moon glow around moon #1 showing through to the stars/planets underneath.

Depth test doesn’t solve that. Painter’s algorithm does. Or proper multi-layer compositing using premultiplied alpha, but you don’t need that hassle here.

But removing the depth test will bring back the problem removed by the depth test.

Maybe I should clarify the context in which I need that.
I am trying to implement a somewhat optimized “game engine” (which is far too flattering a title, but anyway). From my past experience, I noticed that the bottleneck in my application was the number of rendering passes.

It is a logical consequence of the OpenGL tutorials you can find everywhere: they teach you how to render one object and, as computer scientists, our first reflex is to package that into a function and call it in a for loop. And technically it works. But this approach cannot grow safely: as you add more items to your scene, the frame rate drops linearly with the number of objects, no matter how many primitives they have.

So I decided to do it with one rendering pass, but immediately I had to solve problems that were automatically solved by the previous approach.
I was no longer able to define one transformation matrix per object.
I was no longer able to define one texture per object.
I was no longer able to define one shader per object.
I was no longer able to remove or add objects quickly, or to replace an object with a lower-LOD variation of it.
And I was no longer able to define multiple layers.

For every problem I managed to find solutions, with variations even. Except for the last one. Transformations were stored in a texture (an array in disguise), which opened the door to skinning (by using multiple consecutive indices for bones).
Textures were stored in a dynamic atlas, with the layout of the atlas being in another array-in-disguise texture.
(I had, but didn’t implement, a solution for shader materials, which I won’t talk about here as I didn’t test it.)
Adding, removing and replacing objects were implemented in such a way that the cost of the operation depended on the number of triangles of the object, but was independent of the number of triangles in the whole scene.

For everything, I had one or more solutions, implemented or in reserve. Except for this last part.

I am well aware of this painter’s algorithm, and of the problems of translucency. I also know that you can now have linked list thingy for an arbitrary quantity of information per pixel and, therefore, do not need to sort your triangles in a specific order to obtain it.

I do not plan on creating a game engine for advanced graphics (sorry, I know that you work hard to provide us that and I don’t disregard your work) but to have something that would finally allow an independent developer to just drop their resources into the engine without having to worry too much about performance.
I am/was one step away from that.

About batching (not mentioned here, but it will come up sooner or later): it solves problems, but only to a limited extent. It supposes that objects have the same shader and shader material, and sometimes it also makes assumptions about the vertex shader calculation (basically it presupposes an ftransform(), which may or may not be the actual calculation).
It makes it impossible to move, add or remove objects.

I know that there are plenty of open source game engine projects out there, and all of them advertise amazing graphics and capabilities.
I would like to go in the opposite direction: limited capabilities BUT high performance, or at least acceptable for what is on screen. If I have a bookshelf with twenty books inside, each with its own cover, each with only a handful of triangles to define it, I should not have to give up either the physics (by batching) or the performance.

My question remains, about the possibility of doing what I said. I didn’t exactly ask whether it’s a good idea (even if I appreciate people trying to help me). Vulkan is lower level, but to what extent?

Thanks for the context. That helps. And what you tried to address your other perf concerns makes sense by the way.

Then that is the crux of your question. So the follow-up question to that is: what do you perceive to be lacking with the usual approaches for rendering multiple depth layers in a single scene, which isn’t addressed by depth-tested opaque rendering or back-to-front translucency blending?

Related to that, what need is it you perceive here with single pass rendering that mandates a clearing of the depth buffer in the middle of scene rendering (or multiple depth buffers)? As opposed to just not using the depth buffer for some subset of your scene rendering?

(I can envision a few but I don’t want to load your question.)

Also related to the depth clear…

Before you get the complete wrong idea… Buffer clears are generally extremely cheap, especially on mobile GPUs but even on desktop GPUs, and can speed up your rendering. The driver engineers aren’t typically doing the naive thing a CS 100 student would (literally clearing all the pixels in the depth, stencil, and/or color buffers). They’re using this as an opportunity to “not” clear them but instead store a tiny representation someplace that they have been cleared. This can save not only on avoiding the pixel write bandwidth for the clear itself, but also the needless pixel “read” bandwidth when rendering the subsequent frame of content where driver already knows that those pixels have been cleared, without going to check the actual pixel storage. Moreover, for depth, it uses this opportunity to reset some driver-internal depth test perf optimizations like hierarchical/early Z, which may have been disabled by some of your possibly less-than-optimal rendering command sequences from the last frame. This clear optimization yields big mem B/W savings on mobile GPUs with some significant perf++ on desktop too in some situations (e.g. completely avoiding frag shader executions for tris or parts of tris that cannot be seen).

In other words, don’t think of a mere glClear( GL_DEPTH_BUFFER_BIT ) as this super heavy-weight render pass thing. It can actually speed up your GPU rendering if placed well.

For a similar mem B/W savings at the tail end of frame rendering, check out glInvalidateFramebuffer(). However, I only know for sure that mobile GPU drivers implement this properly. Not too sure about desktop (you can call it, but it’s unclear if it’s just a no-op on some desktop drivers).

It’s difficult to understand what you mean by “one rendering pass.” Vulkan has a concept called a “render pass”, but what you’re talking about doesn’t seem to relate to that. A Vulkan render pass has nothing to do with the number of transformation matrices, textures, shaders, changing objects, or whatever.

Most engines can do that. Unity, Unreal, etc, they try to be as drop-and-go as they can be while offering high performance.

I answered that question:

Well, do you want performance or not? Doing this the way you’re trying to do it will certainly not achieve performance.

Seconded. You can do all of those things in a single render pass in Vulkan, so it is not clear.

Additionally, for context, you should explain how you were defining “multiple layers” beforehand and what prevents it from being done in a single render pass.

Your question screams XY problem, but direct answer is you could also employ stencil test, which basically gives 256 individual layers of sorts (8 bit stencil buffer).

> what do you perceive to be lacking with the usual approaches for rendering multiple depth layers in a single scene which isn’t addressed by depth-tested opaque rendering or back-to-front translucency blending

Thank you for asking this question, because it actually offers me an opportunity to give a mathematical answer.
Or it should have, but I forgot the name. I asked for it this evening on another forum, so I should be able to give it to you tomorrow, but I can describe it ( I kind of already did it ).

It’s a structure in mathematics where you have several sets (S1, S2, S3, …), each of them Z-like, and you have an order relation on top of that, which states that A<B if and only if
either A is in Sn and B is in Sm and n<m;
or A and B are in the same S, and then you use the standard “<” relation (since S is Z-like).

It’s like every element in a given S is infinitely far from any element of a different S. You can compare elements in a given S with subtlety, their comparison is meaningful and precise, but as soon as you compare two elements of different S they are so abysmally different that you don’t even need to look at what the element actually is: its set speaks volumes.
( You may think about it as leagues in chess, if they were a bit more hermetic: comparing champions to champions and rookies to rookies involve looking at the details, but when you compare a champion and a rookie you can just hammer the answer of who is better, without thinking too much.
Yeah, this analogy isn’t great at all)

This isn’t addressed by the classic depth test because we want subtlety within a given S (we want a real depth test for the details of the house, even though they are definitely always farther than the HUD).

The usual depth test gives us an “R” ( the real numbers ) to work in. But usually ( if not always ) a game scene is made of several independent R+R+R+…
It’s not R times R, it’s more R plus R.
And on a computer you would represent it as a pair in N×R, with the first element being the index of the “S” I was talking about (the league in the chess analogy) and the second element being the value inside that S.
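
The ordering described above is a lexicographic order on (layer index, depth) pairs. A minimal Python sketch of such a two-level test, resolving a single pixel in software, could look like the following. The function name, the fragment tuples and the convention that layer 0 is the nearest layer are assumptions made for the example; this is not how any GPU depth test is exposed.

```python
# Key of a "cleared" pixel: farther than any real fragment.
CLEAR_KEY = (float("inf"), float("inf"))


def resolve_pixel(fragments):
    """Return the color of the nearest fragment under the two-level order.

    Each fragment is (layer, depth, color). Layer 0 is the nearest layer
    (a convention chosen for this sketch); within one layer, a smaller
    depth is nearer. Python compares tuples lexicographically, which is
    exactly "layer first, then depth".
    """
    stored_key = CLEAR_KEY
    stored_color = None
    for layer, depth, color in fragments:
        key = (layer, depth)
        if key < stored_key:
            stored_key = key
            stored_color = color
    return stored_color


fragments = [
    (2, 0.10, "house wall"),
    (0, 0.90, "HUD"),
    (2, 0.05, "house door"),
    (1, 0.50, "hand"),
]
```

With these fragments, `resolve_pixel(fragments)` picks the HUD whatever the submission order, and with only the two house fragments it picks the door, since the ordinary depth comparison decides within a layer.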

What I mean by “a single render pass” is: a single call to glDrawArrays, in OpenGL. Which implies, among other things, the impossibility of calling glClear between layers (and everything I mentioned above).

I’ll continue tomorrow, good night

Ah! That was not obvious (or common usage for the “render pass” terminology).

Ok. So for the sake of discussion…

  • You want all layers rendered in one draw call, without any intervening state changes.
  • And you want multiple depth layers, with depth test of the tris within each layer used to resolve their order.

As a strawman, consider this option…

  • For S1,S2,S3, partition the full Z range into 3 sub-ranges Z1, Z2, Z3.
  • Enable depth test+writes.
  • Rasterize S1 tris to relative depths within Z1 (positioned by vertex shader)
  • Rasterize S2 tris to relative depths within Z2 (" " ")
  • Rasterize S3 tris to relative depths within Z3 (" " ")
  • Rasterize all of these tris using a single glDrawArrays() call.
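
The partitioning in the steps above boils down to a small piece of arithmetic. Here is a sketch in Python of what the remapping would compute, not shader code. The function names, the layer-0-is-nearest convention and the 24-bit fixed-point depth buffer are all assumptions made for the example.

```python
DEPTH_BITS = 24  # assumption: 24-bit fixed-point depth buffer


def remap_depth(layer, num_layers, z):
    """Map a per-layer depth z in [0, 1] into that layer's slice of [0, 1].

    Layer 0 gets the slice nearest to 0, so every fragment of layer i
    ends up in front of every fragment of layer i + 1.
    """
    if not 0 <= layer < num_layers:
        raise ValueError("layer index out of range")
    if not 0.0 <= z <= 1.0:
        raise ValueError("z must be in [0, 1]")
    return (layer + z) / num_layers


def steps_per_layer(num_layers, depth_bits=DEPTH_BITS):
    """Number of distinct depth values left to each layer after the split."""
    return (1 << depth_bits) // num_layers
```

For example, `remap_depth(1, 4, 0.5)` gives 0.375, the middle of the second of four slices. Note that z = 1 in one layer lands on the same value as z = 0 in the next, so a real implementation would probably keep a small gap between slices. The price of the scheme is precision: each layer only gets `steps_per_layer(num_layers)` distinct depth values instead of the full range.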

What doesn’t this handle that you seek to accomplish?

Also, from a pragmatic standpoint…

  1. Do you really have so many depth layers that you think you absolutely need to combine them all into one draw call? If not, just use separate draw calls, clear the depth buffer between each, and render them from back-to-front.
  2. Do you really have so few objects that view frustum culling of this geometry is not needed and will yield no real speed-up? Either way, that calls into question whether you really need to (or should) batch everything up into one, single draw call.

OK, use glDepthRange.

Given X number of layers, divide the range from [0, 1] into X ranges. So if X is 4, you have 4 ranges: [0, 0.25), [0.25, 0.5), [0.5, 0.75) and [0.75, 1].

You then sort your scene into layers and draw all of the stuff in each layer. It doesn’t mathematically matter what order you have them in, but performance-wise, you’ll be better off rendering the layers nearest to farthest. When rendering each layer, apply the appropriate glDepthRange for that layer.

Note that each depth range, each layer, only uses a portion of the depth buffer, but this portion is mapped from the entire range that your perspective projection uses. So when you change the depth range, it’s a good idea to change your projection matrix to only include stuff within that range.
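
To make the interaction between the projection and the depth range concrete, here is a small Python sketch of the depth value a point would get. It assumes the standard OpenGL perspective projection (NDC z in [-1, 1]) followed by the glDepthRange mapping; the function name and the numbers used below are made up for the example.

```python
def window_depth(d, z_near, z_far, range_near=0.0, range_far=1.0):
    """Window-space depth of a point at view distance d along the view axis.

    Assumes a standard OpenGL perspective projection with planes z_near and
    z_far, then the glDepthRange(range_near, range_far) mapping.
    """
    # NDC depth: -1 at the near plane, +1 at the far plane.
    z_ndc = (z_far + z_near) / (z_far - z_near) - (
        2.0 * z_far * z_near
    ) / ((z_far - z_near) * d)
    # glDepthRange maps [-1, 1] linearly onto [range_near, range_far].
    return range_near + (z_ndc + 1.0) * 0.5 * (range_far - range_near)
```

Take a layer that is given the depth range [0.25, 0.5] and whose content sits between distances 1 and 100. With a projection fitted to the layer (`z_near=1, z_far=100`), the content spans the whole slice from 0.25 to 0.5. With a scene-wide projection (`z_near=1, z_far=10000`), the same content stops at about 0.4975, leaving part of the slice unused.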

While doing a lot in a single draw call is desirable, you should not take it to such an extreme that you can’t actually do the things you need to do.

The goal should not be to make a single draw call. The goal should be to make the number of draw calls not scale linearly with scene complexity. For example, one draw call per layer means you can put as much stuff within a layer as you want without increasing the number of draw calls.

Also, if you’re dealing with this, look into multidraw functionality and gl_DrawID.

Yes, certainly to reduce CPU time consumption.

One glMultiDraw* caution though. If your subdraws contain very few vertices, this can lead to inefficiency on the GPU-side (more in the thread below). You literally end up with near-zero CPU time consumption for the dispatch (great!) but more GPU time consumption than you’d expect given the triangle and vertex counts.

So profile your frames. If you’re very CPU limited and far from GPU limited, this isn’t an issue and isn’t likely to be. Moreover, if you go glMultiDraw* as a first step, you can massage it into using glDraw* later if you find that you need it.

I thought about this “remapping the distance” thing, but I see two problems to it.
First is certainly manageable, but I think it could create issues with the perspective calculation. I am not sure, though.

The second is much more problematic: as far as I know, the distance in the depth buffer is the distance to the plane of the camera, not the distance to the camera. So, for objects in front of the camera it will be ok, but as soon as objects go to the sides, the distances are compressed (flattened, I don’t know what to call that) into smaller intervals. Subdividing those already small intervals into smaller ones will lead to huge z-fighting. Am I wrong about that?

And I know about the GPU wasting time when it has too little work to do ( due to the SIMD architecture I bet ).

Geometry culling is nice, but I worked with an open source game engine (jMonkeyEngine) that did culling based on FOV and bounding boxes, and it didn’t help (in a perfect world you’ll divide the number of draw calls by 6 - which is great, but it’s a band-aid on a broken leg if you draw objects one by one, and it presupposes what the vertex shader does, for the bounding box to make sense).

And my solution is not incompatible with that: in a nutshell, the program accesses primitives through “tickets”. This allows the buffer manager to reorganize primitives as it wants, without the rest of the program knowing it.
When you remove a primitive, the last primitive of the buffer is moved to fill the hole. No matter where the primitive was in the buffer, it’s constant time.
And of course you don’t need to implement that with a 1-to-1 mapping to OpenGL commands: you can have segments, and nothing prevents you from resending unmodified data, so you can merge segments separated by a small gap.
And here come heuristics, which we all love and hate.
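
The ticket scheme with constant-time removal described above can be sketched in a few lines of Python. The class and method names are invented for the example, a plain list stands in for the vertex buffer, and segment merging and GPU uploads are left out.

```python
class TicketBuffer:
    """Dense primitive storage addressed through stable handles ("tickets")."""

    def __init__(self):
        self._items = []       # dense storage, stands in for the GPU buffer
        self._ticket_at = []   # slot index -> ticket stored in that slot
        self._slot_of = {}     # ticket -> slot index
        self._next_ticket = 0

    def add(self, primitive):
        """Append a primitive and return its ticket."""
        ticket = self._next_ticket
        self._next_ticket += 1
        self._slot_of[ticket] = len(self._items)
        self._items.append(primitive)
        self._ticket_at.append(ticket)
        return ticket

    def get(self, ticket):
        """Look a primitive up by ticket, wherever it currently lives."""
        return self._items[self._slot_of[ticket]]

    def remove(self, ticket):
        """Remove in constant time by moving the last primitive into the hole."""
        slot = self._slot_of.pop(ticket)
        last_item = self._items.pop()
        last_ticket = self._ticket_at.pop()
        if last_ticket != ticket:
            self._items[slot] = last_item
            self._ticket_at[slot] = last_ticket
            self._slot_of[last_ticket] = slot

    def dense(self):
        """Copy of the packed storage, in buffer order."""
        return list(self._items)
```

After adding three primitives A, B and C and removing A, the storage holds C and B in that order, and the tickets for B and C still resolve to the right primitives even though C has moved.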

And when nothing special is performed on the primitives by the rest of the program, the manager can reorganize them.
In that situation an object could be tagged as “not required for the view”, which would lead to a progressive movement of all of its primitives to the end of the buffer, by swapping (and then we draw fewer primitives, so we don’t draw them).

That avoids Diracs :)

2:58 am already, I have to go.

P.S.: the question has been answered, and I thank you for that. I have no objection to repurposing this thread as we (ok, it’s only me) are doing now, but the moderation may not agree.

In any case, thank you all for the time you took reading my gibberish and writing coherent and constructive answers.

Good night :D

You just need to adjust the perspective zNear and zFar values to the distances appropriate to the layer.

That’s not how “distance to the plane of the camera” works. The planar distance is the same whether the object “goes on the sides” or not.

Do you honestly believe that every other engine just throws stuff at the GPU without regard to visibility? Why do you think all those engines have complex visibility schemes like BSPs, portals, etc? These days, some engines are even doing scene visibility tests on the GPU. Do you think they would be doing that if they didn’t have to?

Maybe this is true for your specific workload, but it certainly won’t be true generally.

No, I definitely agree: Unreal, Unity, CryEngine and many other professional engines would destroy me in any possible comparison. I can’t compete, I don’t even pretend to. They are game engines, in the real meaning of the term: editor+optimizations+core.

But I can compete with open source engines, those that are merely the “for loop over the NeHe tutorial” that you can find everywhere. I target low-hanging fruit (PS1), but I would like to be able to do something other than spaceship battles (where you have only a handful of objects and a starfield applied to a cube as a skybox).

If I came across as an arrogant newbie who thinks he has an idea nobody had before him, I apologize. Currently, on that project, I am more like a stray dog that bit into a bone and doesn’t want to let it go. Not because it’s the best bone in the world but because it’s good enough and just so much better than anything else it has had for years.

But I know my place. Every time I start to forget it, I read an article about lighting effects and optimizations in games. It’s a good kick, one that makes the stray dog that I am run away, whimpering.

> That’s not how “distance to the plane of the camera” works. The planar distance is the same whether the object “goes on the sides” or not

Ok, I did not express myself clearly. If you do a fog based on the z-distance, it won’t be a circle around the camera. If you extend your arms in front of you versus to the sides (like a cross), they will be more in the fog in the first case than in the second, even though your arms remain at the same distance from the camera.

And it means that in the second example the z value of your arms will be smaller.
I could provide links about that fog stuff, but I spent the last 30 minutes trying to find a website explaining different kinds of fog, in vain. I remember the scene was a field of dead trees.
It’s the reason why, with some fog, when you turn the camera, objects that reach the border of the screen come out of the fog. It’s cheap because it’s based on the distance to the camera plane, which is already computed, but it’s inaccurate.
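
The difference between the two distances can be checked numerically. Below is a minimal Python sketch with a hypothetical helper: it places a point at a fixed distance from the camera, at some angle away from the view axis, and returns both its planar depth (what z-based fog uses) and its radial distance.

```python
import math


def planar_and_radial(radial_distance, angle_deg):
    """Point at radial_distance from the camera, angle_deg off the view axis.

    Returns (planar_z, radial): the depth measured along the view axis and
    the true Euclidean distance to the camera, in a 2D top-down view.
    """
    angle = math.radians(angle_deg)
    x = radial_distance * math.sin(angle)  # sideways offset
    z = radial_distance * math.cos(angle)  # depth along the view axis
    return z, math.hypot(x, z)
```

At 0 degrees both values are 1 for a point at distance 1. At 60 degrees the radial distance is still 1 but the planar depth drops to about 0.5, so z-based fog treats the point as being much closer. Conversely, two points with the same z get the same planar depth however far apart they are sideways.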

And it’s for a game engine, not for a game. So the number of layers should vary.

Those aren’t game engines; they’re tutorials. Performance isn’t a goal.

Godot is an open-source game engine.

OK, but you said that in connection with wondering about depth fighting. Depth fighting happens when there is insufficient precision in the “z values” to distinguish two objects. My point is that because “z values” are relative to the plane of the camera and not the radial distance, depth fighting will not change based on “objects goes on the sides”.

Nothing gets “compressed” that would cause depth fighting.

Do you even want to constrain users into “layers”? After all, they’re just a function of depth ranges. If you expose depth ranges directly, and allow a series of objects to be rendered into a particular range, then your engine has no need to know anything about “layers”. It’s up to the user to create and manage that concept.

Every decision when building middleware is in tension between providing the greatest capabilities for the user’s needs and providing the most performance you can squeeze out of the system being wrapped. The more options you give the user, the fewer options you have for providing optimizations. That’s just how it is.

If your engine provides “layers” as an engine concept, the more control over them you give the user, the less ability you will have to optimize them (and the greater the chance of a pathological user doing the wrong thing). Neither choice is right, but these are the things you have to think about when making an engine.

This topic was automatically closed 183 days after the last reply. New replies are no longer allowed.