Buffer orphaning on AMD

Silence · May 2, 2019, 12:46am

Hi all,

For some specific use cases, I am orphaning the element buffer object (EBO). In outline, this is what I do:

Orphan the EBO (glBufferData(GL_ELEMENT_ARRAY_BUFFER, size, nullptr, usage))
For each object group
    For each object in group
        glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, offset, index_size, data);
    glDrawElements(...)

This works wonderfully on nVidia graphics cards.
Unfortunately, on AMD cards (Linux, free drivers) the screen flickers: every 2 or 3 frames, it looks as if only part of the data I send is drawn.

Does anyone know whether buffer orphaning is something we can rely on with AMD and Intel? Or whether the free AMD drivers on Linux support it?
According to this, it looks to be implementation-dependent (but the information might be outdated):

One issue with this method is that it is implementation dependent. Just because an implementation has the freedom to do something does not mean that it will.

I would actually prefer to avoid mapping, since it does not fit our current architecture well: we would have to copy our temporary buffers into the mapped buffer, and then unmap that buffer so the data is sent to the GPU. In addition, many threads are doing the calculations, and the number of indices for each object differs from frame to frame.

Other questions: I haven’t tested this yet, since it would use more memory, but would writing to and drawing from the buffer in a circular fashion be worth trying? What is the current state of the art for streaming?

For information, this was tested on:
Linux, nVidia GeForce GTX 1060
Linux, AMD RX 580, free drivers (POLARIS10, DRM 3.27.0, 4.19.0-1-amd64, LLVM 7.0.1)

Thank you in advance. Regards.

Alfonse_Reinheart · May 2, 2019, 4:57am

That’s not really evidence of a problem with buffer updates. Also, it’d be a good idea to transfer all of the data in a single call, if possible, rather than in a large number of sporadic calls.
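As a minimal sketch of the single-transfer idea, the per-object index arrays could first be packed into one contiguous staging array, with the byte offset of each object recorded for the later draw calls. The names `IndexRange`, `PackedIndices` and `packIndices` are hypothetical, 32-bit indices are an assumption, and the GL calls appear only in comments because they need a current context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of where one object's indices landed in the staging
// array. These names are illustrative and not taken from the thread.
struct IndexRange {
    std::size_t byteOffset; // offset to pass to glDrawElements
    std::size_t count;      // number of indices for this object
};

struct PackedIndices {
    std::vector<std::uint32_t> staging; // all indices, back to back
    std::vector<IndexRange> ranges;     // one entry per object
};

// Concatenate the per-object index arrays into one contiguous block.
inline PackedIndices packIndices(
    const std::vector<std::vector<std::uint32_t>>& perObject) {
    PackedIndices out;
    std::size_t total = 0;
    for (const auto& obj : perObject) total += obj.size();
    out.staging.reserve(total);
    out.ranges.reserve(perObject.size());
    for (const auto& obj : perObject) {
        // The object starts where the staging array currently ends.
        out.ranges.push_back(
            {out.staging.size() * sizeof(std::uint32_t), obj.size()});
        out.staging.insert(out.staging.end(), obj.begin(), obj.end());
    }
    assert(out.staging.size() == total);
    return out;
}

// With a GL context current and the EBO bound, the upload is then one call:
//   glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, 0,
//                   packed.staging.size() * sizeof(std::uint32_t),
//                   packed.staging.data());
// followed by the draws, each using ranges[i].byteOffset and ranges[i].count.
```

All copying then happens before the first draw call, so uploads and rendering are no longer interleaved.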

Do you have specific knowledge that your implementation implements mapped buffers through a copy that way? It seems your real problem is that you interleave the act of copying with the act of rendering. You really shouldn’t.

If the data is being generated on other threads, then it is those other threads that should be writing data to the mapped pointer.
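One way this could look, as a sketch only: each worker reserves a disjoint range with an atomic counter and copies its indices straight into the destination. `MappedIndexWriter` is a hypothetical helper, and a plain array stands in for the mapped pointer (for example the result of `glMapBufferRange`); 32-bit indices are assumed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical writer over memory that stands in for the mapped pointer.
// Threads reserve disjoint ranges with an atomic counter, then copy into
// their own range without any further locking.
class MappedIndexWriter {
public:
    MappedIndexWriter(std::uint32_t* dst, std::size_t capacity)
        : dst_(dst), capacity_(capacity), used_(0) {
        assert(dst != nullptr);
    }

    // Reserves room for `count` indices and copies them in. On success,
    // stores the first index position of the range in `firstOut`.
    // Returns false if the range does not fit; the writer then stays
    // full until reset() is called.
    bool write(const std::uint32_t* src, std::size_t count,
               std::size_t& firstOut) {
        const std::size_t first =
            used_.fetch_add(count, std::memory_order_relaxed);
        if (first > capacity_ || count > capacity_ - first) return false;
        std::memcpy(dst_ + first, src, count * sizeof(std::uint32_t));
        firstOut = first;
        return true;
    }

    // Call once per frame, after all workers have finished and before
    // they start writing the next frame.
    void reset() { used_.store(0, std::memory_order_relaxed); }

private:
    std::uint32_t* dst_;
    std::size_t capacity_;
    std::atomic<std::size_t> used_;
};
```

Ranges are handed out in arrival order, so each object has to remember the offset it was given. The render thread also has to wait for the workers (for example by joining them) before issuing the draw calls.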

That doesn’t sound like a good case for orphaning. When you orphan a buffer, that page is very clear that the size is supposed to stay the same:

the exact same size and usage hints it had before

So you shouldn’t be changing it.

I would say that it’s using persistent mapping with multiple buffers/regions of the same buffer.
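A rough sketch of the "multiple regions of the same buffer" part, with the GL calls left as comments because they need a current context and OpenGL 4.4 or ARB_buffer_storage. `RegionRing` is a hypothetical helper, and the three regions used in the usage comment are an assumption, not a recommendation from this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: splits one persistently mapped buffer into
// `regionCount` equally sized regions and selects one per frame.
class RegionRing {
public:
    RegionRing(std::size_t regionBytes, std::size_t regionCount)
        : regionBytes_(regionBytes), regionCount_(regionCount) {
        assert(regionBytes > 0 && regionCount > 0);
    }

    // Total size to allocate once for the whole buffer.
    std::size_t totalBytes() const { return regionBytes_ * regionCount_; }

    // Byte offset of the region to write to and draw from in this frame.
    std::size_t offsetForFrame(std::size_t frameIndex) const {
        return (frameIndex % regionCount_) * regionBytes_;
    }

    // True if `bytes` of index data fit into a single region.
    bool fits(std::size_t bytes) const { return bytes <= regionBytes_; }

private:
    std::size_t regionBytes_;
    std::size_t regionCount_;
};

// Intended use with a current GL context (sketch, not executed here):
//   RegionRing ring(maxIndexBytesPerFrame, 3);
//   glBufferStorage(GL_ELEMENT_ARRAY_BUFFER, ring.totalBytes(), nullptr,
//       GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
//   ptr = glMapBufferRange(GL_ELEMENT_ARRAY_BUFFER, 0, ring.totalBytes(),
//       GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
// Each frame:
//   1. glClientWaitSync on the fence stored for this frame's region, if any.
//   2. Write the indices at ptr + ring.offsetForFrame(frame).
//   3. Draw using that byte offset as the base.
//   4. Store glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0) for the region.
```

Each region is sized for the largest frame, so the number of indices actually written can vary from frame to frame while the buffer itself is never reallocated.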

Silence · May 2, 2019, 11:23am

Thank you for your answer.

Any suggestions on what the cause might be?

I am aware of this and had it in mind, but in practice it isn’t possible. I’ll look into it further, though, since from your answer there is no obvious reason why it shouldn’t work.

OK, I understand that point. It was simply easier to do it that way, but I’ll give it a try.

I had a similar idea. But since each object may generate a different number of indices each frame, the segments of the buffer do not all have the same size. Deferring the upload until rendering time is ugly (and, as you pointed out, might be the cause of the issues), but it allows me to greatly reduce the number of draw calls. Note that I currently have no mapped buffer.

Yes, it is clear. This is why I use the exact same size (the total number of indices), and it stays the same throughout.

OK.