Optimal post-process design - Compute shader + imageStore or full-screen rectangle?

I’m currently working on blurring for variance shadow maps, which is really the same thing as a post-processing effect. I expected to get the best performance from a compute shader that uses imageStore() to write into the target texture. However, I’m seeing performance drops that seem far out of proportion to the work involved, which is just blurring a single 512x512 cube map. I’ve tried adjusting the number of dispatched work groups to find the optimal count, and it’s still very slow. (This uses one horizontal pass followed by one vertical pass.)
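To put a rough number on the workload, here is a minimal sketch of how the dispatch size for one blur pass over the cube map could be computed. The 16x16 local size and the helper name `work_group_counts` are assumptions for illustration only; the local size actually used isn't stated above.

```python
def work_group_counts(width, height, layers, local_x, local_y):
    """Work groups per axis (the three values that would be passed to
    glDispatchCompute), rounding up so that partial tiles are still covered."""
    groups_x = (width + local_x - 1) // local_x
    groups_y = (height + local_y - 1) // local_y
    return (groups_x, groups_y, layers)


# Hypothetical 16x16 local size; 6 layers, one per cube map face.
LOCAL_X, LOCAL_Y = 16, 16
groups = work_group_counts(512, 512, 6, LOCAL_X, LOCAL_Y)

# Total shader invocations for a single pass (horizontal or vertical).
invocations_per_pass = groups[0] * LOCAL_X * groups[1] * LOCAL_Y * groups[2]
print(groups, invocations_per_pass)
```

Under those assumptions each pass is about 1.5 million invocations, so the two passes together are a fairly small amount of work for a GPU.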

Is it generally the case that compute shaders using imageStore() on PC hardware are slower than writing to a render target with a fragment shader? It would be nice to save some time and not have to write both implementations just to find out which is faster.
