Simple blending shader using image load/store produces false results

Hey everyone.

While reasoning about the problem in this thread I started fiddling around with image load/store. As a first exercise I went for simple compositing of two images inside a shader, storing the result in a third image. Aside from not having to use a third image at all when using image load/store, I came across the following behavior, where the first image is the correct result after the first frame and the second is what I get when rendering the second, third, up to the n-th frame:


It’s standard stuff - rendering a fullscreen quad with a pass-through vertex shader (plus tex coord) and the following fragment shader:

#version 420

layout(binding = 0, rgba8) uniform image2D ImageA;
layout(binding = 1, rgba8) uniform image2D ImageB;
layout(binding = 2, rgba8) uniform image2D ImageResult;
layout(binding = 2) uniform sampler2D ImageResultTex;
layout(location = 0) out vec4 FragColor;

in vec2 InterpTexCoord;
const int Size  = 512;
const float Alpha = 0.5;

void main()
{
    // Map the interpolated texture coordinate to an integer texel coordinate
    ivec2 ImageTexCoord = ivec2(InterpTexCoord * Size);

    // Image A and B are bound read-only (GL_READ_ONLY)
    vec4  ColorA        = imageLoad(ImageA, ImageTexCoord);
    vec4  ColorB        = imageLoad(ImageB, ImageTexCoord);
    vec4  ColorResult   = ColorA * (1.0 - Alpha) + ColorB * Alpha;

    // ImageResult is bound read-write (GL_READ_WRITE)
    imageStore(ImageResult, ImageTexCoord, ColorResult);

    // false results after the first frame
    FragColor = imageLoad(ImageResult, ImageTexCoord);
    // regular texture lookup is always correct
    //FragColor = texture(ImageResultTex, InterpTexCoord);
}

The image unit setup is done as follows:

glBindImageTexture(0, tbo_a_, 0, GL_FALSE, 0, GL_READ_ONLY, GL_RGBA8);    
glBindImageTexture(1, tbo_b_, 0, GL_FALSE, 0, GL_READ_ONLY, GL_RGBA8);        
glBindImageTexture(2, tbo_result_, 0, GL_FALSE, 0, GL_READ_WRITE, GL_RGBA8);        

The texture objects are set up correctly as well.

As can be seen in the above code, only image loads return the wrong value after the first frame. Doing a normal lookup with the corresponding sampler always succeeds.

I assumed the above should succeed because as I read the specs (GL and GLSL), memory transactions in a single invocation of the fragment shader are well-defined and need not be synchronized using a coherent qualifier or a memoryBarrier(). Correct?

I get no errors at any time. The GPU is Radeon HD 6350 with Catalyst 12.5 reporting 8 image units.

Does anyone spot what’s going wrong?

Have you tried putting a memoryBarrier() call between the imageStore() and imageLoad() to ImageResult? I’m pretty sure the imageLoad() won’t wait for the store to complete otherwise.

Yes, I did - with no effect. As I stated, I don’t think the barrier is necessary there because I’m altering one texel per invocation and so no other invocation is dependent on the result. If I’m not mistaken memoryBarrier() is only needed if you have other invocations needing to see the results of memory transactions of the current invocation.

With all the layers of memory and cache on a GPU, I would expect that you’d need some sort of read/write synchronization, even on the same texel. But if the barrier doesn’t work and coherent doesn’t work, perhaps it’s a driver bug.

You didn’t declare your image values as coherent. You also didn’t put a proper memory barrier between the write and the read. You must do both in order for a write followed by a read to work.

Hmm, just checked on my home machine with the exact same code, but an HD 6780 with Catalyst 12.6 installed and it works. I’m confused.

That’s because it’s undefined behavior. It may work and it may not.

If you make your variables coherent and use a proper memory barrier, then it will work everywhere.
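For reference, a minimal sketch of what that suggestion could look like when applied to the shader from the first post. It is untested and only illustrates where the qualifier and the barrier would go; whether it changes the result on the affected driver is not established here. The readonly qualifiers on ImageA and ImageB are an assumption added to match the GL_READ_ONLY bindings, since the original declarations carry no memory qualifiers.

```glsl
#version 420

// Sketch only: the original shader with ImageResult marked coherent and a
// memoryBarrier() between the store and the load, as suggested above.
// The readonly qualifiers are an assumption, not part of the original shader.
layout(binding = 0, rgba8) readonly uniform image2D ImageA;
layout(binding = 1, rgba8) readonly uniform image2D ImageB;
layout(binding = 2, rgba8) coherent uniform image2D ImageResult;
layout(location = 0) out vec4 FragColor;

in vec2 InterpTexCoord;
const int Size = 512;
const float Alpha = 0.5;

void main()
{
    ivec2 ImageTexCoord = ivec2(InterpTexCoord * Size);

    vec4 ColorA      = imageLoad(ImageA, ImageTexCoord);
    vec4 ColorB      = imageLoad(ImageB, ImageTexCoord);
    vec4 ColorResult = ColorA * (1.0 - Alpha) + ColorB * Alpha;

    imageStore(ImageResult, ImageTexCoord, ColorResult);

    // Wait for this invocation's earlier image accesses (the store above)
    // to complete before the load below is issued.
    memoryBarrier();

    FragColor = imageLoad(ImageResult, ImageTexCoord);
}
```

The image unit setup with glBindImageTexture stays the same as in the first post; only the shader-side declaration of ImageResult and the added barrier differ.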

But how does that correlate with what the spec is saying?


This perplexes me. Say the fragment shader is executed exactly once per texel and each invocation operates on exactly that texel. Why would I need to declare the variables coherent in this case? I swap buffers immediately after rendering the quad and do no transactions in the meantime. I inserted a memoryBarrier() after the store and it had no effect but, to be fair, I didn’t declare the variables as coherent.

Again, if there is no interaction between invocations, why is it a problem for a single invocation if “the order of reads and writes within a single shader invocation is well-defined”? Although they are subsequent operations in a single invocation, may they still be executed out of order? I have no trouble grasping that you need to synchronize when an invocation depends on data written by another, but during the same single invocation? How is it undefined if it’s well-defined?

Maybe it’s just me. :(

Having just tested the same thing on the Radeon HD 6350 with Catalyst 12.6, I can state that it’s a bug in Catalyst 12.5.

Anyway, I’d still like to hear from you guys on the above matter.

In my opinion you need neither a memoryBarrier() call nor the ‘coherent’ qualifier, and your shader code is correct. This is because each shader invocation accesses a unique memory location and no interaction between invocations occurs.
