Occlusion Culling the CryEngine way: understanding

Hi All.

Nice to start a new thread in such a warm forum.

I have problems with low FPS in my little project, and it requires some optimization.
While investigating the ‘Frustum Culling’ algorithm I came across a more interesting one: https://www.gamedev.net/articles/programming/graphics/coverage-buffer-as-main-occlusion-culling-technique-r4103/

The main idea of the method is:
1 - Get the depth buffer from the previous frame.
2 - Reproject it into the current frame (see the sketch below).
3 - Software-rasterize the bounding boxes of objects to check whether they can be seen from the camera's perspective or not, and based on this test decide whether to draw them.
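If I understand step 2 correctly, for every sample of the old depth buffer you rebuild its world position and project it with the new camera, something like this (a rough sketch with my own names, assuming JOML-style Matrix4f/Vector4f; this is not code from the article):

  // Reproject one depth sample from the previous frame into the current frame.
  // prevViewProjInverse is the inverse of last frame's view-projection matrix,
  // currViewProj is this frame's; the caller divides the result by w and splats
  // it into the reprojected depth buffer.
  Vector4f reprojectDepthSample(int x, int y, float prevDepth, int width, int height,
                                Matrix4f prevViewProjInverse, Matrix4f currViewProj) {
      float ndcX = (x + 0.5f) / width  * 2.0f - 1.0f;
      float ndcY = (y + 0.5f) / height * 2.0f - 1.0f;
      float ndcZ = prevDepth * 2.0f - 1.0f;             // [0,1] depth -> [-1,1] NDC

      Vector4f world = prevViewProjInverse.transform(new Vector4f(ndcX, ndcY, ndcZ, 1.0f));
      world.div(world.w);                               // perspective divide back to world space

      return currViewProj.transform(world);             // clip-space position in the new frame
  }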

I have some questions related to this approach.

1 - How can new objects be rendered if we always use the depth buffer from the previous frame?
2 - How can I relate the rasterized C-Buffer to all my mesh objects? I mean: how can I accept or reject the current object for rendering using the C-Buffer?

Thank you for your answers. :)

1 - The key idea is that the view changes very little between two frames. There will definitely be artifacts, but they are ‘acceptable’.
2 - Have a look at OpenGL occlusion queries.

Silence, thank you!

Reprojection of the old buffer to the new one would require expensive fragment-level reprojection of depth and would produce anomalies at silhouettes that you would have to treat conservatively. Of course you can render new stuff. Understand that your occlusion would have to be conservative in any scheme. New stuff might be hidden or visible. The real challenge is: what do you do when deleting old stuff? Figure its bounding box and cut a hole in your occlusion buffer?

The real problem here is the work involved in performing these tests; many schemes may cost more performance than they save.

Simple conservative systems like minimally inclusive bounds, portals, etc. can be useful tools for implementing culling.

Also with performance issues an advanced occlusion culling scheme might not be your best first port of call.

To answer your specific question of how to accept or reject an object: the answer is simple. Objects are drawn by default. Only if a conservative test proves an object is not visible can you cull it. This typically involves conservative testing of bounds against an occlusion buffer/structure. Conservative in this instance means erring on the side of visibility, particularly in z, both when generating the occlusion buffer/structure and when testing the bounds of the tested objects.
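For concreteness, a minimal CPU-side sketch of such a conservative test (the depth array, screen-space bounds and 0-near/1-far depth convention are my assumptions, not a particular engine's API):

  // Conservative visibility test of a projected bounding box against a CPU-side
  // coverage/depth buffer. minX..maxY are the box's clamped screen-space pixel bounds,
  // minDepth is the *nearest* depth the box can have (erring toward visibility).
  boolean isOccluded(float[] coverageDepth, int bufWidth,
                     int minX, int minY, int maxX, int maxY, float minDepth) {
      for (int y = minY; y <= maxY; y++) {
          for (int x = minX; x <= maxX; x++) {
              // If any covered pixel of the buffer is at or behind the box's nearest
              // depth, the object might show through there: keep drawing it.
              if (coverageDepth[y * bufWidth + x] >= minDepth) {
                  return false;
              }
          }
      }
      return true;  // every covered pixel has nearer geometry -> safe to cull
  }

Objects for which this returns false stay in the draw list; only a proven-hidden object is skipped.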

Thank you, Dorbie!

I have already implemented frustum culling for the scene and for all shadow maps (I mean CSM and shadow mapping for point lights and spot lights).
It speeds up rendering from 35 FPS to 120 FPS. A very good result, I think.

And now I want to start implementing the “Coverage Buffer”.

I will do it step by step.

I have several technical questions about the first step: ‘get the depth buffer’.

  1. How can I get the depth buffer from the previous frame from the GPU to the CPU?
  2. Is there a way to downscale the depth buffer's resolution on the GPU before transferring it to the CPU side?

1. glBindBuffer(GL_PIXEL_PACK_BUFFER)
2. glReadPixels() or glGetTexImage()
3. Wait
4. glMapBuffer() or glGetBufferSubData()

Step 2 copies the data from the framebuffer (or the texture attached to the framebuffer) to GPU memory, step 4 copies the data from GPU memory to CPU memory. Decoupling these steps avoids stalling the CPU until the commands which generate the frame have completed.
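A minimal sketch of that sequence in LWJGL-style Java (width, height and the GL_FLOAT read-back format are placeholders/assumptions; the point is that step 2 returns immediately because the destination is the bound PBO):

  int pbo = glGenBuffers();
  int size = width * height * 4;                        // one GL_FLOAT per depth sample

  // 1. Bind a pixel-pack buffer and allocate storage for the read-back.
  glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
  glBufferData(GL_PIXEL_PACK_BUFFER, size, GL_STREAM_READ);

  // 2. Read the depth buffer into the PBO. The last argument is an offset into the
  //    bound PBO, not a client-side buffer, so the call does not stall the CPU.
  glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, 0L);

  // 3. "Wait" by doing useful work: submit other draw calls, run game logic, etc.

  // 4. As late as possible (ideally next frame), copy the finished data to the CPU.
  ByteBuffer cpuDepth = BufferUtils.createByteBuffer(size);
  glGetBufferSubData(GL_PIXEL_PACK_BUFFER, 0, cpuDepth);
  glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);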

Render a quad with the source depth buffer bound to a texture image unit and the destination depth buffer attached to the framebuffer.

[QUOTE=GClements;1289909]
Step 2 copies the data from the framebuffer (or the texture attached to the framebuffer) to GPU memory, step 4 copies the data from GPU memory to CPU memory. Decoupling these steps avoids stalling the CPU until the commands which generate the frame have completed.[/QUOTE]

What do you mean?
Can I work with the GPU in a multithreaded way?

[QUOTE=nimelord;1289910]What do you mean?
Can I work with the GPU in a multithreaded way?[/QUOTE]

OpenGL “runs” in parallel with your CPU (that is, on the GPU), and you can avoid synchronising CPU <-> GPU by doing a read-back into an unused buffer (GL_PIXEL_PACK_BUFFER) while your CPU does other things INSTEAD OF waiting for synchronisation + data download (from GPU to CPU).

https://www.khronos.org/opengl/wiki/Pixel_Buffer_Object#Downloads

Thank you guys!

This discussion makes me ask another question:
Can I render several frames (for example, shadow maps for all scene lights) at the same time?

Most OpenGL functions simply enqueue a command for execution by the GPU; they don’t wait until the command has completed. The drivers will take care of sequencing commands, i.e. ensuring that commands which modify data have completed before executing subsequent commands which use that data.

However, functions which read data back to CPU memory have to wait until any commands which have modified that data have completed. So reading data back to CPU memory should be avoided wherever possible. Where it is unavoidable, the commands which generate the data should occur as early as possible while the copying of data to CPU memory should occur as late as possible (i.e. you should try to interleave other commands between these stages). Copying data between various OpenGL objects (buffers, textures, framebuffers, etc) doesn’t inherently require copying data back to CPU memory (e.g. glReadPixels() + glTexImage() can be done entirely on the GPU).
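For example, the depth buffer can be copied into a texture without ever leaving the GPU (a sketch; sceneFbo, depthCopyFbo, width and height are placeholders, and both FBOs are assumed to have same-format depth attachments):

  // GPU-to-GPU copy: blit the depth attachment of the scene FBO into a second FBO
  // whose depth attachment is the texture we will sample later. No CPU read-back.
  glBindFramebuffer(GL_READ_FRAMEBUFFER, sceneFbo);
  glBindFramebuffer(GL_DRAW_FRAMEBUFFER, depthCopyFbo);
  glBlitFramebuffer(0, 0, width, height,                // source rectangle
                    0, 0, width, height,                // destination rectangle
                    GL_DEPTH_BUFFER_BIT, GL_NEAREST);   // depth blits must use GL_NEAREST
  glBindFramebuffer(GL_FRAMEBUFFER, 0);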

Using multiple CPU threads for submitting rendering commands is likely to be a net loss, particularly on consumer hardware, and should be avoided. Using additional threads for uploading data (with rendering confined to a single thread) is less of an issue.

[QUOTE=GClements;1289909]
Render a quad with the source depth buffer bound to a texture image unit and the destination depth buffer attached to the framebuffer.[/QUOTE]

Should I bind a VBO to a texture at screen resolution and at the same time bind a VBO at low (downscaled) resolution to the framebuffer?
Can I render to two destinations at the same time?

Also I have another problem.
If I use the following code, then the buffer is filled OK:


  ByteBuffer buffer = BufferUtils.createByteBuffer(display.getWidth() * display.getHeight() * 4);  // create buffer on CPU side
  glReadPixels(0, 0, display.getWidth(), display.getHeight(), GL_DEPTH_COMPONENT, GL_FLOAT, buffer);  // reading depth data directly from framebuffer to buffer on CPU side.
  // have here filled buffer.

but if I try to do it as GClements wrote, then the buffer is empty:


  int devicePBO = glGenBuffers();  // init buffer on GPU
  glBindBuffer(GL_PIXEL_PACK_BUFFER, devicePBO);  // bind buffer
  glReadPixels(0, 0, display.getWidth(), display.getHeight(), GL_DEPTH_COMPONENT, GL_FLOAT, devicePBO);  // reading depth data from framebuffer to GPU buffer.
  ByteBuffer buffer = BufferUtils.createByteBuffer(display.getWidth() * display.getHeight() * 4);  // create buffer on CPU side
  glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE, buffer);  // map GPU buffer to CPU buffer
  // have here empty data in CPU buffer.

Maybe I’m completely wrong with my interpretation of:

[QUOTE=nimelord;1289931]Should I bind a VBO to a texture at screen resolution and at the same time bind a VBO at low (downscaled) resolution to the framebuffer?
Can I render to two destinations at the same time?
[/QUOTE]
You’re simply rendering a quad (two triangles). The destination FBO will have the low-resolution depth texture as its depth attachment (and no colour attachment). The high-resolution depth texture will just be a texture. The fragment shader will just sample that texture and write the result to gl_FragDepth.
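A sketch of what that fragment shader could look like (GLSL embedded as an LWJGL-style Java string; uHighResDepth and vTexCoord are my names, and you would draw the quad with depth writes enabled and e.g. glDepthFunc(GL_ALWAYS) so every fragment lands in the low-resolution depth attachment):

  String downscaleFragmentShader =
      "#version 330 core\n" +
      "uniform sampler2D uHighResDepth;\n" +   // full-resolution depth texture
      "in vec2 vTexCoord;\n" +                 // from the full-screen quad's vertex shader
      "void main() {\n" +
      "    // Point-sample the high-res depth; for conservative occlusion culling you\n" +
      "    // may prefer the maximum depth of the covered high-res texels instead.\n" +
      "    gl_FragDepth = texture(uHighResDepth, vTexCoord).r;\n" +
      "}\n";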

glMapBuffer takes two arguments, and returns a pointer to the mapped region. If you want a copy in CPU memory, you’d memcpy() (or similar) from the mapped region into the CPU-side buffer. But in that case you may as well use glGetBufferSubData().
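Applied to your snippet, the corrected read-back could look roughly like this (a sketch; note the PBO also needs storage allocated with glBufferData before the glReadPixels, and the last glReadPixels argument is an offset into the bound PBO, not the buffer name):

  glBindBuffer(GL_PIXEL_PACK_BUFFER, devicePBO);
  glReadPixels(0, 0, display.getWidth(), display.getHeight(),
               GL_DEPTH_COMPONENT, GL_FLOAT, 0L);                            // 0L = offset into the PBO
  ByteBuffer mapped = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY, null); // use the returned buffer
  // ... read depth values from 'mapped' ...
  glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
  glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);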

It seems I understood what you mean.
So the algorithm is:

1 - Render the scene into a high-resolution texture instead of the framebuffer.
2 - Render the resulting texture as a quad into the low-resolution texture that has a depth attachment only.
3 - Copy (or render?) the high-resolution texture to the framebuffer to show it. (I’m not sure here how to show the prepared texture on the display.)
4 - Use the low-resolution depth texture for “Coverage buffering”.

Is this correct?

The next step will be “Coverage buffering” itself. But here I need to investigate the details further.

You wrote:

I read before that “Coverage buffering” requires software rasterization.
I thought the depth texture must be located in CPU memory to do that, because I need to keep the relation between the rasterized meshes and whether they are visible. (Or maybe there is some way to do this process on the GPU side?)