I’m implementing a rendering system that uses the mentioned extension which, if I understand correctly, has been core since OpenGL 4.3. AMD drivers as of Catalyst 14.4 claim to be GL 4.4 compliant, so I’m wondering whether the following behavior is a driver bug or something I’m failing to grasp.
I’m afraid I cannot provide source code right now, so I’ll simply explain the problem. I have narrowed this down to the bare bones, so you’ll notice the actual process serves no purpose at all.
EDIT: Please also read the follow-up if you are interested in this topic. It narrows the problem down.
- Shader storage buffers are bound to binding points 0 and 1. Buffer 0 contains the heads of a chained list; buffer 1 contains the pool of nodes. An atomic counter is used to access the pool and claim nodes. Both buffers are smaller than the reported maximum: the pool is under 6 MB.
- The atomic counter is reset.
- The data in buffer 0 (the small heads buffer) is set to the terminating token value.
- The list of items is constructed in this pass.
- Three shader storage buffers are bound, including the previous two, which remain at binding points 0 and 1.
- The third buffer is bound to binding point 2 and is very small (under 1 KB). I have given all buffers a fixed size that is a multiple of 256, which is the offset alignment the driver reports (I understand this shouldn’t be needed unless I use BindBufferRange, but I did it anyway to rule out obscure bugs).
- Three tested scenarios:
- Data is read from the third buffer only, via a simple constant (position 0 of the buffer): the read fails
- Data is read from all three buffers (regardless of whether constants or the actual list data are used): reads from buffers 0 and 1 work, reads from buffer 2 fail
What I mean by “read fails”: one of two things happens. Either the driver crashes, as if a buffer were being read out of bounds, or the data read from the third buffer is corrupted.
In fact, I’ve been able to observe a situation where the third buffer is readable (with the expected data), yet as the bigger buffer (the 6 MB one) is populated further (for example, when pointing the camera at an area that generates more items), the data from the third buffer becomes corrupted. It looks very much as though the binding points themselves are being corrupted. I have moved the binding points around and I get different results that make little sense (sometimes a buffer works at one binding point but not at another, etc.), but the behavior does change.
Because of this, I’ve run several tests (hours of work here) and found a “workaround” that has the stench of a driver bug. The complete application (the actual app uses 7 different buffers) works as expected if I move the big buffer (the 6 MB one) to binding point 0 and I unbind and then rebind the buffers before activating a new Program object (with UseProgram). Otherwise the GPU either crashes or keeps reading corrupted data. This is what “fixes” the problem:
```cpp
// Remove anything bound to binding point 0
gl::BindBufferBase(gl::SHADER_STORAGE_BUFFER, 0, 0);
// And bind the big buffer again to the same point
gl::BindBufferBase(gl::SHADER_STORAGE_BUFFER, 0, nodesBO);
```
I reiterate: this has to be done before each draw call that uses the buffers.
I am 100% certain that the data is being sent correctly, is correctly padded, and holds reasonable values. This (the complete app, actually) works both on Intel HD 4000 or higher (yes, imagine that) and on NVIDIA’s latest drivers. The integrity of the list data is also correct: visualizing the data in false color (rendering shades as density) shows lists that are much smaller than the space allocated for the nodes.
I have read the extension’s specification ( https://www.opengl.org/registry/specs/ARB/shader_storage_buffer_object.txt ), but I’m not finding anything related to this anomaly.
EDIT: I forgot to mention that I also suspected a synchronization problem, so I gave this command a try
I use it before writing to the list buffers and again right after I’m done, which, if I understand the documentation, should ensure the data is visible. It doesn’t make any difference.
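In case the snippet got eaten by the forum: what I mean is the standard barrier for incoherent shader storage writes (written here in my wrapper’s style; this obviously needs a live GL context, and the exact placement is from my description above, not verbatim code):

```cpp
// ...after the pass that writes the linked-list buffers, make those
// shader storage writes visible to shader reads in subsequent passes:
gl::MemoryBarrier(gl::SHADER_STORAGE_BARRIER_BIT);
// ...then issue the draw that reads buffers 0, 1 and 2
```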
So, did I hit a driver bug?
I’m considering reimplementing access to the smaller data buffers with samplerBuffer (texture buffer) lookups instead of shader storage buffers, since all the data in those is 4-float aligned anyway. Should I expect any significant performance difference between these two approaches?
I’m sorry for the big post, and thank you in advance for reading.