When do atomic operations on non-coherent memory make sense?

I’ve spent quite some time trying to figure out the proper usage of the coherent qualifier, especially with regard to atomic operations (e.g. atomicAdd) in compute shaders. See https://stackoverflow.com/a/60434348/1030527 for the details.

From what I read in the wiki, it seems that coherent is required when using atomics to manipulate a memory location in an SSBO from multiple shader invocations (e.g. a “number of alive particles” counter when emitting particles in a compute shader). This was made even clearer in this recent OpenGL wiki edit.

However, my understanding is that, broadly, atomic operations are meant as a way to do an undivided operation (e.g. read + add + store) from the viewpoint of any observer (i.e. any shader invocation). If coherent is indeed required for atomic operations to actually work this way, why would one ever use an atomic operation without coherent? If there is no such valid case, why isn’t coherent implied for all atomic operations?
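
For concreteness, here is a minimal sketch of the particle-counter scenario mentioned above. Every name in it (Counters, aliveCount, Particles, maxParticles, emitCount) is made up for illustration and is not taken from any real code base. The counter block is deliberately declared without coherent, since whether that is sufficient is exactly what is being asked.

```glsl
#version 430

// Hypothetical particle-emission compute shader; all names are illustrative.
layout(local_size_x = 64) in;

struct Particle {
    vec4 position;
    vec4 velocity;
};

// Counter of live particles, shared by every invocation of the dispatch.
// No `coherent` qualifier here: whether one is needed is the open question.
layout(std430, binding = 0) buffer Counters {
    uint aliveCount;
};

layout(std430, binding = 1) buffer Particles {
    Particle particles[];
};

uniform uint maxParticles; // assumed capacity of particles[]
uniform uint emitCount;    // assumed number of particles to emit per dispatch

void main() {
    // Surplus invocations in the last work group do nothing.
    if (gl_GlobalInvocationID.x >= emitCount) {
        return;
    }

    // atomicAdd returns the value stored before the addition; the intent
    // is that every invocation ends up with a distinct slot index.
    uint slot = atomicAdd(aliveCount, 1u);

    // aliveCount can overshoot maxParticles in this sketch; a real shader
    // would have to clamp or correct it afterwards.
    if (slot < maxParticles) {
        particles[slot].position = vec4(0.0, 0.0, 0.0, 1.0);
        particles[slot].velocity = vec4(0.0, 1.0, 0.0, 0.0);
    }
}
```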

I can’t wrap my head around it!

Then you misunderstood that change. You (and the page you link to) are talking about atomic operations on SSBO variables. The wiki edit you cite is specifically about atomic counter variables, which cannot have the coherent qualifier. These are different things.

As stated in the Wiki article on Incoherent Memory Access, coherent is about the visibility of the operation to dependent shader invocations. If you’re doing atomic operations to the same variable in different stages, then you need the coherent qualifier so that the caches from the different stages won’t interfere with one another. If you’re only doing atomic operations from one stage, then you don’t need coherent.
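
As a rough sketch of the cross-stage case described above, a fragment shader that bumps a counter also touched by another stage might declare the block like this. The block name, member name, and binding point are assumptions for illustration only; the same coherent declaration would have to appear in every shader that accesses the counter.

```glsl
#version 430

// Hypothetical counter block. `coherent` is placed on the block so that
// accesses from different stages are meant to be visible to one another.
layout(std430, binding = 0) coherent buffer Counters {
    uint aliveCount;
};

out vec4 fragColor;

void main() {
    // The add itself is atomic with or without the qualifier.
    uint previous = atomicAdd(aliveCount, 1u);

    // Use the result so the operation is not optimized away.
    fragColor = vec4(float(previous & 0xFFu) / 255.0);
}
```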

Looking at it again, I see I misread the diff report. I thought the whole paragraph starting with “However, if memory has been modified in an incoherent fashion…” was added.

In any case, I re-re-re-reread that Wiki page and I don’t get to the conclusion that

I’m hoping I’m just dense :)

Here’s how I read it:

There are a number of advanced operations that perform what we call “incoherent memory accesses”:

That’s me

However, if memory has been modified in an incoherent fashion, any subsequent reads from that memory are not automatically guaranteed to see these changes. These reads could be from any OpenGL operation that reads from the memory. This includes, but is not limited to:

  • SSBO reading operations.

I expect atomics are included in this

Furthermore, any subsequent writes to those memory locations are not guaranteed to overwrite the value written incoherently.

That’s my scenario.

First, within a single shader invocation, if you perform an incoherent memory write, the value written will always be visible for reading. But only through that particular variable and only within the shader invocation that issued the write. You need not do anything special to make this happen. However, it is possible that, between writing and reading, another invocation may have stomped on that value. So long as that is not the case, reading it will produce the value you have written.

With a work group size > 1, there are multiple compute shader invocations running in parallel. If these invocations all do atomicAdds, they can stomp on each other, right?

Internal visibility
After ensuring ordering, the other element that is needed for visibility is special GLSL syntax. The image or buffer variable being written to and read from must be qualified by the coherent qualifier. Note that both the writer and reader shaders must qualify their variables properly; otherwise, nothing is guaranteed.

My scenario is within a single rendering command so this part seems to apply as well.

Like I said, I don’t think atomics make much sense if they’re not implicitly coherent inside a single rendering command so I definitely want to agree with what you said! I’m just looking for confirmation.

I thought I might have the wrong definition of a shader invocation, but Shader, Execution and invocations defines invocations in the case of compute shaders as:

Compute Shaders: The number of invocations is defined by the number of work groups requested by the dispatch operation multiplied by the compute shader’s local size. Compute shader invocations within a work group have some limited intercommunication functionality.

So that would confirm that there are multiple invocations whenever the work group size is > 1 or the number of work groups is > 1.
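
As a sanity check on that definition, the invocation count can be computed inside the shader from the built-in variables. This is only a sketch: the Result block and its totalInvocations member are hypothetical names, and the 8x8x1 local size is an arbitrary choice.

```glsl
#version 430

// Arbitrary local size for illustration: 8 * 8 * 1 = 64 invocations per group.
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

// Hypothetical output block used to read the result back on the CPU.
layout(std430, binding = 0) buffer Result {
    uint totalInvocations;
};

void main() {
    // Work groups requested by the dispatch, times the local size,
    // component by component.
    uvec3 perAxis = gl_NumWorkGroups * gl_WorkGroupSize;

    // Only one invocation writes, so no atomic is needed for this store.
    if (gl_GlobalInvocationID == uvec3(0u)) {
        totalInvocations = perAxis.x * perAxis.y * perAxis.z;
    }
}
```

With glDispatchCompute(4, 2, 1), for example, that is 8 work groups of 64 invocations each, i.e. 512 invocations in total.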

I’m really curious about the source for this statement:

It matches what I’ve seen in practice, but I work with a very limited set of OpenGL implementations.
