glFramebufferTextureLayer + glDrawElements = massive CPU slowdown?

I’m using a depth texture array with 4 depth layers for cascaded shadow mapping, with each cascade stored in a separate layer:


// Pseudo Code - Initialization
frameBuffer = new OpenGL FrameBuffer
depthTexture = new OpenGL Texture
glBindTexture(GL_TEXTURE_2D_ARRAY,depthTexture)
glTexImage3D(
	GL_TEXTURE_2D_ARRAY,0,GL_DEPTH_COMPONENT24,
	texWidth,texHeight,cascadeCount,0,GL_DEPTH_COMPONENT,GL_FLOAT,NULL // border must be 0
)

I then render all shadow casters into each of the cascades, by binding the respective layer to a frame buffer via glFramebufferTextureLayer, then using glDrawElements to render the meshes:


// Pseudo Code - Rendering
frameBuffer.bind()
depthTexture.bind()
foreach shadowCaster {
    shadowCaster.bindVertexBuffer()
    shadowCaster.bindIndexBuffer()
    for i = 0,cascadeCount - 1 {
        bindCascadeMatrix()
        // FPS drop
        glFramebufferTextureLayer(GL_FRAMEBUFFER,GL_DEPTH_ATTACHMENT,depthTexture,0,i)
        glDrawElements(GL_TRIANGLES,shadowCaster.triangleCount() * 3,GL_UNSIGNED_INT,(void*)0) // count is the number of indices
        //
    }
}

The combination of these calls causes my FPS to go from 190 to 80. If I disable either call, it goes back to 190.

My assumption is that because each glFramebufferTextureLayer-call accesses the same texture, it blocks and waits for the previous rendering process to finish. Is that correct? If so, what can I do to avoid it?

I’ve tried generating a separate framebuffer for each layer and only calling glFramebufferTextureLayer once for each framebuffer during initialization, then binding the framebuffer during the rendering process, but the result is the same slowdown.

[QUOTE]The combination of these calls causes my FPS to go from 190 to 80. If I disable either call, it goes back to 190.[/QUOTE]

First, please stop measuring your frametime in “framerate”. A far more useful measurement of frame time is actual time. That is, the time it takes to render a frame, generally in milliseconds. For example, 190FPS translates to 5.26ms per frame, while 80FPS translates to 12.5ms.

Second, if you “disable” your glDrawElements call, then you aren’t drawing anything. And if you don’t have a depth attachment in your framebuffer, then your FBO is probably empty. So you’re also not drawing anything.

If you don’t render anything, your framerate goes up. Also, water is wet :wink:

However, if you want to know why you’re getting a roughly 2.4x frame time increase, read on:

[QUOTE]My assumption is that because each glFramebufferTextureLayer-call accesses the same texture, it blocks and waits for the previous rendering process to finish. Is that correct?[/QUOTE]

Quite frankly, you have a far bigger problem. It’s right here:

foreach shadowCaster

Your render call is changing the FBO state 4 times for each mesh that generates a shadow.

FBO state changes are easily the most painful of any state change, performance-wise. You should only do them when absolutely necessary, and only the absolute minimum number of times you need to do your work. Since you’re rendering to 4 different depth maps, you should change your FBO state exactly 4 times.

So you set the FBO to use one cascade map, then render all of your meshes, set the FBO to the next, render all of the meshes, and so on.
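The effect of swapping the two loops can be sketched without a GL context by simply counting the calls each ordering would issue. This is an illustrative C++ sketch, not real rendering code: ShadowCaster, CallCounts and both functions are hypothetical names, and each counter stands in for the GL call named in its comment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mesh; a real one would own vertex/index buffer handles.
struct ShadowCaster {
    int indexCount;
};

// Tallies the calls a loop order would issue, so no GL context is needed.
struct CallCounts {
    std::size_t attachmentChanges = 0; // stands in for glFramebufferTextureLayer
    std::size_t bufferBinds = 0;       // stands in for the vertex/index buffer binds
    std::size_t drawCalls = 0;         // stands in for glDrawElements
};

// Order used in the question: meshes in the outer loop, cascades in the inner loop.
CallCounts countMeshMajor(const std::vector<ShadowCaster>& casters, int cascadeCount) {
    CallCounts counts;
    for (std::size_t m = 0; m < casters.size(); ++m) {
        ++counts.bufferBinds;
        for (int layer = 0; layer < cascadeCount; ++layer) {
            ++counts.attachmentChanges; // FBO state changes once per mesh per cascade
            ++counts.drawCalls;
        }
    }
    return counts;
}

// Suggested order: cascades in the outer loop, meshes in the inner loop.
CallCounts countCascadeMajor(const std::vector<ShadowCaster>& casters, int cascadeCount) {
    CallCounts counts;
    for (int layer = 0; layer < cascadeCount; ++layer) {
        ++counts.attachmentChanges; // FBO state changes once per cascade
        for (std::size_t m = 0; m < casters.size(); ++m) {
            ++counts.bufferBinds;
            ++counts.drawCalls;
        }
    }
    return counts;
}
```

For 100 shadow casters and 4 cascades, the first ordering changes the attachment 400 times and the second only 4 times, with the same 400 draw calls in both. The price is that each mesh's buffers get bound once per cascade instead of once overall, which is normally a much cheaper kind of state change than touching the FBO.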

Also, it’s not a good idea to bind the depth texture that you’re rendering to while you’re rendering to it.

Thanks, that helps a lot!

[QUOTE=Alfonse Reinheart;1278997]
Your render call is changing the FBO state 4 times for each mesh that generates a shadow.

FBO state changes are easily the most painful of any state change, performance-wise. You should only do them when absolutely necessary, and only the absolute minimum number of times you need to do your work. Since you’re rendering to 4 different depth maps, you should change your FBO state exactly 4 times.[/QUOTE]
Does glBindFramebuffer count as a state change? How do I know which OpenGL calls are expensive to use, and which are not, without doing tests for every single one?

I’ll give that a try, thanks.

[QUOTE=Alfonse Reinheart;1278997]
Also, it’s not a good idea to bind the depth texture that you’re rendering to while you’re rendering to it.[/QUOTE]
So, should I use one framebuffer per layer instead? Then bind those instead of using glFramebufferTextureLayer?

See this slide from NVIDIA. It should give you a rough idea. It’s from the Beyond Porting presentation.

[QUOTE]Does glBindFramebuffer count as a state change? How do I know which OpenGL calls are expensive to use, and which are not, without doing tests for every single one?[/QUOTE]

That’s the nature of performance testing: you generally can’t just look at a function and know that it’ll be slow. You have to know something about hardware, how things are implemented, and so forth. Or you do performance testing.
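If you do go the testing route, a small timing helper is usually enough to compare two variants of a render path. The following is a minimal C++ sketch using only the standard library; averageFrameTimeMs is a made-up name, and it assumes the callable you pass renders one complete frame and waits for it to finish (for example by ending with a buffer swap or glFinish). OpenGL commands execute asynchronously, so without such a wait you would mostly be measuring the cost of submitting commands.

```cpp
#include <chrono>

// Average wall-clock time per call of `frame`, in milliseconds, over `frameCount` calls.
// Assumption: `frame` renders one frame and blocks until that frame is complete.
template <typename Frame>
double averageFrameTimeMs(Frame&& frame, int frameCount) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < frameCount; ++i) {
        frame();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / frameCount;
}
```

Run it once with the change and once without, over a few hundred frames each, and compare the two averages in milliseconds. For GPU-side timings, a GL_TIME_ELAPSED timer query is likely the better tool, since it measures how long the GPU spent on the commands rather than how long the CPU waited.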

Alternatively, you can search the Internet and come up with the results of someone else’s research, as IonutCava demonstrates :wink:

But either way, the API alone won’t just tell you.

Oh, and FYI about that slide: ROP represents much of the per-sample processing state like blending, logic ops, and similar constructs.

[QUOTE]So, should I use one framebuffer per layer instead? Then bind those instead of using glFramebufferTextureLayer?[/QUOTE]

Either way would work more or less the same, probably with identical performance.