I’m running into a strange performance problem and have run out of angles to attack it from. Maybe somebody has an idea what could be causing it.
I’ve got a simple sequence of deferred rendering shaders:
- Shaders A: render depth buffer
- Shaders B: render material buffers
- Shaders C: render SSAO (consumes A, B and writes to material buffer)
- Shaders D: some lighting stuff
- Shaders E: render sky shadow (consumes A)
- Shader F: render sky light (consumes A, B, C and E)
So far nothing special. I did some optimization of the “SSAO” (C) and cut its time down by ~1.2ms. Measuring with RenderDoc, the render time of the entire frame also improves by ~1.2ms, which is what I hoped for.
I then tested this in VR, where render sizes are a lot larger than on a regular PC monitor. In this situation my SSAO optimization shaves off ~7.2ms, which is quite substantial. But now comes what I didn’t expect: the entire frame render time as measured in RenderDoc is pretty much unchanged, maybe marginally faster.
Examining the measurements in detail, I noticed that shader F clocked in at 260us before the optimization. After the optimization the same shader F suddenly clocked in at 2.6ms(!), even though I did not change it at all. Some other shaders later in the frame also suddenly exploded in time, eating up all the improvement I made.
After some testing I noticed that if I artificially make shaders C expensive, the duration of shader F goes back to the original value. It thus looks as if shortening shaders C causes shader F to take longer. How can this be?
Shader F consumes the depth buffer from A and the materials buffers from B and C.
I know the GPU can only start rendering once all the input textures have finished being rendered to. But no matter whether shaders C are long or short, the latest texture consumed by shader F was written by shaders C. Shaders C also consume the depth from A and the material buffers from B. If this were the problem, shaders C would have to be delayed too, but this is not the case.
Do you have any idea what kind of GPU design might cause problems here? How can I debug this further? RenderDoc cannot help here anymore, and AMD no longer provides an OpenGL performance tool.
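
One way to cross-check RenderDoc’s numbers would be to time the passes directly with OpenGL timestamp queries. Below is a minimal sketch of how the raw timestamps could be turned into per-pass durations and idle gaps. It assumes each pass is bracketed by `glQueryCounter(id, GL_TIMESTAMP)` and the values are read back with `glGetQueryObjectui64v(id, GL_QUERY_RESULT, &value)`. The names `PassTiming`, `PassReport`, `buildReport` and `frameTimeMs` are made up for illustration, and the GL calls themselves are left out so the snippet compiles without a context.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One timed pass. beginNs/endNs are assumed to be GPU timestamps in
// nanoseconds, taken with glQueryCounter(..., GL_TIMESTAMP) right before
// and right after the draw calls of the pass (hypothetical setup).
struct PassTiming {
    std::string name;
    std::uint64_t beginNs;
    std::uint64_t endNs;
};

// Derived numbers for one pass.
struct PassReport {
    std::string name;
    double durationMs;   // end minus begin of this pass
    double gapBeforeMs;  // time between the previous pass ending and this one beginning
};

// Converts raw timestamps (in submission order) into durations and gaps.
std::vector<PassReport> buildReport(const std::vector<PassTiming>& passes) {
    std::vector<PassReport> report;
    report.reserve(passes.size());
    for (std::size_t i = 0; i < passes.size(); ++i) {
        const PassTiming& p = passes[i];
        const double durationMs =
            static_cast<double>(p.endNs - p.beginNs) / 1e6;
        double gapMs = 0.0;
        // The first pass has no predecessor, so its gap stays 0.
        if (i > 0 && p.beginNs > passes[i - 1].endNs) {
            gapMs = static_cast<double>(p.beginNs - passes[i - 1].endNs) / 1e6;
        }
        report.push_back({p.name, durationMs, gapMs});
    }
    return report;
}

// Time from the first pass beginning to the last pass ending.
double frameTimeMs(const std::vector<PassTiming>& passes) {
    if (passes.empty()) {
        return 0.0;
    }
    return static_cast<double>(passes.back().endNs - passes.front().beginNs) / 1e6;
}
```

Comparing the duration of shader F and the gap before it, with and without the SSAO optimization, might show whether the extra time is really spent inside F or is waiting time that only gets attributed to it.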