How to use glClearBufferSubData to clear part of an SSBO

ceisserer · March 19, 2021, 6:42pm

Hi,

I am using persistent+coherent mapped SSBOs to pass data to the GPU. The SSBO is divided into 3 areas, each guarded by a fence.

What I typically do is:

1. Wait for the fence of the particular region, to make sure the GPU isn't using this region anymore.
2. Write to the area, which is known to be currently unused; only values != 0 are written.
3. Queue the draw commands that consume the data just written to the SSBO.
4. Re-initialize the SSBO area with zeros for the next write: glClearBufferSubData(GL_SHADER_STORAGE_BUFFER, GL_R32UI, regionNumber * MASK_BUFFER_REGION_SIZE, MASK_BUFFER_REGION_SIZE, GL_RED_INTEGER, GL_UNSIGNED_INT, NULL), where MASK_BUFFER_REGION_SIZE is the size of a single region in bytes.

My assumption is that this glClearBufferSubData call will only start after the previously issued draw calls have been executed.
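The region bookkeeping in the steps above can be sketched in plain C. This is a minimal sketch, not the poster's code: it assumes the three regions are used round-robin and uses a hypothetical MASK_BUFFER_REGION_SIZE of 64 KiB, since the real value isn't given. The helper region_range is also hypothetical. It makes no OpenGL calls; it only computes the offset/size pair that step 4 passes to glClearBufferSubData and checks that both are multiples of 4, as GL_R32UI requires.

```c
#include <assert.h>
#include <stddef.h>

/* The SSBO has 3 regions; the region size here is a hypothetical value. */
#define NUM_REGIONS 3u
#define MASK_BUFFER_REGION_SIZE (64u * 1024u) /* bytes per region */

/* Byte range of one region: the offset/size arguments for glClearBufferSubData. */
typedef struct {
    size_t offset;
    size_t size;
} RegionRange;

/* Frames use the regions round-robin: 0, 1, 2, 0, 1, 2, ... */
static RegionRange region_range(unsigned long frame)
{
    size_t region_number = (size_t)(frame % NUM_REGIONS);
    RegionRange r;
    r.offset = region_number * MASK_BUFFER_REGION_SIZE;
    r.size = MASK_BUFFER_REGION_SIZE;
    /* GL_R32UI elements are 4 bytes, so both values must be multiples of 4. */
    assert(r.offset % 4 == 0 && r.size % 4 == 0);
    return r;
}
```

In the real loop, step 1 would wait on the fence stored for that frame's region, and step 4 would pass r.offset and r.size straight to glClearBufferSubData.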

Despite careful fencing, I see sporadic corruption when using glClearBufferSubData; it goes away when I write the full data region from the CPU instead.

What I wonder is: am I using glClearBufferSubData correctly? There are several parameters I am unsure about (GL_RED_INTEGER, GL_R32UI, GL_UNSIGNED_INT); I just want to set a range of bytes to zero. Also, the spec talks about "basic machine units" without further explanation. Are those bytes?
Is my use of basic machine units valid after all?

Thanks, Clemens

GClements · March 20, 2021, 12:11am

“Basic machine units” are bytes.

glClearBufferSubData is like memset for buffers, except that it fills the region with a value that is typically larger than a byte. The internalformat, format and type parameters are interpreted as for glTexImage* etc. A single value is read from client memory at the location specified by data, converted from the external format/type to the internal format, then repeated to fill the region from offset to offset+size. The offset and size parameters must be multiples of the size of the internal format (e.g. multiples of 4 for GL_R32UI).
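To make the description above concrete, here is a CPU-side model of what the call does for GL_R32UI with GL_RED_INTEGER/GL_UNSIGNED_INT. It is an illustration only, not an OpenGL call: the function name model_clear_r32ui is hypothetical and a plain byte array stands in for the buffer object's storage. It follows the documented rules: offset and size must be multiples of 4 and lie within the buffer, a single 4-byte value is repeated over the range, and a NULL data pointer means the range is filled with zeros (which is why the NULL in the original call clears to 0).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/*
 * Hypothetical CPU model of glClearBufferSubData(target, GL_R32UI, offset,
 * size, GL_RED_INTEGER, GL_UNSIGNED_INT, data). 'buffer' stands in for the
 * buffer object's storage. Returns 0 on success, -1 where GL would raise
 * GL_INVALID_VALUE.
 */
static int model_clear_r32ui(uint8_t *buffer, size_t buffer_size,
                             size_t offset, size_t size, const uint32_t *data)
{
    const size_t elem = sizeof(uint32_t); /* GL_R32UI: 4 bytes per element */

    /* offset/size must be element-aligned and stay inside the buffer. */
    if (offset % elem != 0 || size % elem != 0 ||
        offset > buffer_size || size > buffer_size - offset)
        return -1;

    /* One value is read from 'data'; NULL means "fill with zeros".
     * GL_UNSIGNED_INT -> GL_R32UI needs no conversion. */
    uint32_t value = data ? *data : 0u;

    /* Repeat the value over [offset, offset + size). */
    for (size_t pos = offset; pos < offset + size; pos += elem)
        memcpy(buffer + pos, &value, elem);
    return 0;
}
```

With data == NULL this is effectively memset(buffer + offset, 0, size), plus the alignment check.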

ceisserer · March 20, 2021, 7:19am

Thanks yet again, GClements!

So from what you say, I conclude that my use of glClearBufferSubData is correct (the cleared area is aligned), so the problem must be related to synchronization.

My sequence of operations is:
while (true) {
  1. wait for the region using glClientWaitSync
  2. write data to the region using direct CPU writes
  3. perform draw calls, which read from the SSBO
  4. glClearBufferSubData to clear the data used by the draw calls
  5. place a fence, so the next wait can be sure the data has been read by the GPU (draw calls) and written to (glClearBufferSubData)
}

Do I have to place a server-side fence between the draw calls and glClearBufferSubData? I assumed this is taken care of implicitly.
Once I have waited for a fence using glClientWaitSync (step 1), can I be sure that the buffer clear queued in step 4 has finished and that the CPU can write to the SSBO again?
The SSBO is mapped persistent + coherent. I don't synchronize on the buffer itself, only via the command stream (glClientWaitSync). The CPU only writes; the GPU reads (consumes) and writes (zero fill).

When I replace glClearBufferSubData with a memset inserted between steps 1 and 2, everything works as expected.
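A minimal sketch of that CPU-side workaround, assuming the whole buffer is persistently mapped once and that `mapped` is the pointer returned by glMapBufferRange for it. The helper zero_region and the 64 KiB region size are hypothetical. It relies on the fence wait in step 1 having already confirmed that the GPU is done with the region, so the CPU can zero it and then write the new non-zero values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 3 regions as in the post; the region size is a hypothetical value. */
#define NUM_REGIONS 3u
#define MASK_BUFFER_REGION_SIZE (64u * 1024u)

/*
 * Zero one region of the persistently mapped SSBO from the CPU.
 * Call after glClientWaitSync on the region's fence (step 1) and
 * before writing new data (step 2).
 */
static void zero_region(void *mapped, unsigned region_number)
{
    assert(region_number < NUM_REGIONS);
    uint8_t *base = (uint8_t *)mapped
                  + (size_t)region_number * MASK_BUFFER_REGION_SIZE;
    memset(base, 0, MASK_BUFFER_REGION_SIZE);
}
```

This trades GPU work for CPU writes through the mapping, but it keeps all writes to the region on one side, ordered by the same fence wait.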

I see the same behavior with the proprietary NVIDIA driver as well as with the open-source AMD Linux driver, so I guess this is a bug in my code and not a driver bug…

Thanks, Clemens

GClements · March 20, 2021, 5:46pm

My knowledge of the memory model is somewhat incomplete, but the main issue I see is that GL_MAP_COHERENT_BIT only means that data written by one side (CPU or GPU) will become visible to the other side “eventually”. A client-side mapping of the buffer isn't necessarily the actual memory seen by the GPU; there may be shadow copies and/or caches involved.

Hopefully someone with more concrete knowledge will chime in.