Buffer orphaning question (Streaming VBOs)

Is there any reason why you shouldn’t be able to:
1. upload a vertex attribs block to the buffer
2. set vtx attribs
3. orphan the buffer
4. upload vertex indices to the same buffer
5. draw

The NVidia driver really hates it when I do that, rendering garbage for the batch. However, if I ensure that a batch's attribs and indices never straddle an orphan, it seems to work.

I thought an orphan basically just detaches the block, queues any unsent portions for GPU upload, and gives you a fresh block. And I latched the unsent portions of the first block to vtx attribs before orphaning it. Further, in the general case, attribs and indices are in separate buffer blocks anyway. So I don’t fully understand why this should be invalid.
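
For concreteness, by "orphan" here I mean one of the usual idioms (a sketch, using STREAM_VBO_SIZE from the init code below):

glBufferData( GL_ARRAY_BUFFER, STREAM_VBO_SIZE, NULL, GL_DYNAMIC_DRAW );  // respecify storage with NULL data

// ...or equivalently, map the whole store with the invalidate-buffer flag:
GLvoid* fresh = glMapBufferRange( GL_ARRAY_BUFFER, 0, STREAM_VBO_SIZE,
                                  ( GL_MAP_WRITE_BIT |
                                    GL_MAP_INVALIDATE_BUFFER_BIT ) );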

Any thoughts?

If that made no sense and you need more detail: I'm trying out Rob Barris' "Streaming VBO" technique, described in this thread (VBOs strangely slow?).

So, in code:

Init:


GLuint vbo;
const unsigned STREAM_VBO_SIZE = 1*(1<<20);     // i.e. 0x100000 (1 MB)

glGenBuffers( 1, &vbo );
glBindBuffer( GL_ARRAY_BUFFER        , vbo );
glBufferData( GL_ARRAY_BUFFER        , STREAM_VBO_SIZE, NULL, GL_DYNAMIC_DRAW );
glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, vbo );

Draw Batch:


GLvoid* ptr;

glBindBuffer    ( GL_ARRAY_BUFFER        , vbo );
ptr = glMapBufferRange( GL_ARRAY_BUFFER, 0xFFF80, 128,
                        ( GL_MAP_WRITE_BIT |
                          GL_MAP_UNSYNCHRONIZED_BIT |
                          GL_MAP_INVALIDATE_RANGE_BIT ) );
<< fill buffer through ptr >>
glUnmapBuffer   ( GL_ARRAY_BUFFER );
glVertexAttribPointer( 0, 3, GL_FLOAT, GL_FALSE, 23, (GLvoid*)0xFFF80 );
glVertexAttribPointer( 2, 3, GL_BYTE , GL_TRUE , 23, (GLvoid*)0xFFF8C );
glVertexAttribPointer( 8, 2, GL_FLOAT, GL_FALSE, 23, (GLvoid*)0xFFF8F );

glBindBuffer    ( GL_ELEMENT_ARRAY_BUFFER, vbo );
ptr = glMapBufferRange( GL_ELEMENT_ARRAY_BUFFER, 0, 64,
                        ( GL_MAP_WRITE_BIT |
                          GL_MAP_INVALIDATE_BUFFER_BIT ) );
<< fill buffer through ptr >>
glUnmapBuffer   ( GL_ELEMENT_ARRAY_BUFFER );
glDrawRangeElements( GL_TRIANGLES, 0, 4, 6, GL_UNSIGNED_SHORT, (GLvoid*)0 );

(Redundant glBindBuffer calls added to the Draw Batch code just for illustration clarity.)

As you can see, to force the scenario in question, I’ve carefully uploaded the vertex attrib block to the end of the VBO and latched the attribs from there. So now the VBO is “full” and we have to orphan. Then we upload the indices to the beginning of the fresh VBO and latch them with a draw call.

As I said, this results in a bonkers draw on NVidia (doesn’t crash, but yields garbage vertex positions/attribs). But if I don’t separate the vtx attribs and indices by an orphan, it works.

Anybody see something I'm missing here? My bug? NVidia's? Or are we off in "unspecified behavior" land?

Also, if anyone sees why this should perform poorly (the above, but carefully avoiding splitting a batch's vertex attribs and indices across an orphan), please let me know. I haven't dug into it yet, but with a fixed eyepoint this benches pretty close to client arrays (which is pretty good); when I start moving the eyepoint it seems to slog, whereas client arrays don't.

My only guess right now is that maybe my "fill buffer" operations are invalidating cache lines, and whatever NVidia uses internally for client arrays doesn't (?). But that's a stretch.

Would appreciate tips from someone who's done this.

The second glMapBufferRange() call contains GL_MAP_INVALIDATE_BUFFER_BIT. That tells the driver "you may invalidate the whole buffer", which destroys the vertices you've just uploaded and results in garbage rendering.

Surprisingly, your first glMapBufferRange() contains only GL_MAP_INVALIDATE_RANGE_BIT, which doesn’t orphan the buffer.

So my guess is, you've just mixed up the two. :slight_smile:

And I latched the unsent portions of the first block to vtx attribs before orphaning it.

Calling glVertexAttribPointer() on a buffer does not mark it as used, nor does it prevent you from destroying it! Only calls that actually use the buffer (drawing to/from a VBO, using a PBO for reading pixels or uploading textures) 'pin' the VBO, at least for as long as the operation lasts.

My tip would be to not use two mapping operations, but instead upload both types of data in one go, right behind each other. Yes, you can upload index data into a VBO that is bound and mapped as GL_ARRAY_BUFFER. The only thing you should be careful with is to align the start of the vertex and index data to offsets that are at least multiples of 4.
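
For instance, something like this (a sketch; numVerts, vertexStride, baseOffset, vertices, and indices are illustrative names, not from the code above):

GLsizeiptr vtxBytes        = numVerts * vertexStride;
GLsizeiptr vtxBytesAligned = (vtxBytes + 3) & ~(GLsizeiptr)3;   // pad index start to a 4-byte boundary
GLsizeiptr idxBytes        = numIndices * sizeof(GLushort);

glBindBuffer( GL_ARRAY_BUFFER, vbo );
char* dst = (char*)glMapBufferRange( GL_ARRAY_BUFFER, baseOffset,
                                     vtxBytesAligned + idxBytes,
                                     ( GL_MAP_WRITE_BIT |
                                       GL_MAP_UNSYNCHRONIZED_BIT |
                                       GL_MAP_INVALIDATE_RANGE_BIT ) );
memcpy( dst,                   vertices, vtxBytes );            // attribs first...
memcpy( dst + vtxBytesAligned, indices,  idxBytes );            // ...indices right behind
glUnmapBuffer( GL_ARRAY_BUFFER );

glVertexAttribPointer( 0, 3, GL_FLOAT, GL_FALSE, vertexStride, (GLvoid*)baseOffset );
glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, vbo );                   // same buffer object
glDrawRangeElements( GL_TRIANGLES, 0, numVerts - 1, numIndices, GL_UNSIGNED_SHORT,
                     (GLvoid*)(baseOffset + vtxBytesAligned) );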

Thanks for the reply. There’s a catch to that though. Otherwise, the technique wouldn’t work at all.

An orphaned buffer can’t be destroyed immediately. It has to be queued for destruction/recycling (along with draw commands at least). That way previously issued draw commands work fine if they reference this orphaned data, so long as they were queued prior to queuing the orphan event.
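
That is (an illustrative sequence, not spec language):

glDrawRangeElements( GL_TRIANGLES, 0, 4, 6, GL_UNSIGNED_SHORT, (GLvoid*)0 );
// ^ queued; references the buffer's current storage
glBufferData( GL_ARRAY_BUFFER, STREAM_VBO_SIZE, NULL, GL_DYNAMIC_DRAW );
// ^ orphan: the driver must keep the old storage alive until the
//   previously queued draw has consumed it, then recycle/free it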

My faulty thinking was that vertex attrib sets fell into this category too (queued alongside buffer events when called). At the core of my question is: in what buffer context is vertex attrib state "latched" (i.e. queued alongside buffer events)? The buffer binding active at the "set" calls, or the one active at the "draw" call? Behavior (and in hindsight, logic) says the latter.

Surprisingly, your first glMapBufferRange() contains only GL_MAP_INVALIDATE_RANGE_BIT, which doesn’t orphan the buffer.

Like I said, I was carefully splitting the batch across an orphan, so the vertex attribs were copied in via the Map equivalent of a subload (glMapBufferRange with INVALIDATE_RANGE / UNSYNCHRONIZED). After that we orphan, before uploading the indices, to produce the problem.

Calling glVertexAttribPointer() on a buffer does not mark it as used, nor does that prevent you from destroying it! Only calls that actually use the buffer (drawing to/from VBO…

Ok, thanks. Is there some language on this “mark as used” that I missed in the specs? Guess I was assuming it was all an issue of what order various events are queued in the command queue (and “when” – immediately or later with draw call).

Given my previous interpretation of how buffer orphan and upload events are queued, I assumed that the attrib pointer set calls were queued immediately when called, alongside buffer events in the queue. But if instead the attrib pointer state is merely saved in the client-side API layer and not latched and queued for GPU send until the draw call, then your interpretation and the observed behavior definitely make more sense. (Now that I think about it, that makes more sense anyway: the driver doesn't know the enables yet at "set" time, and the attrib pointer calls don't even need to be made each batch if the params don't change, so there may not even be a "set" to queue.)

So I guess in summary: vertex attrib buffer state is actually latched and used in the buffer context active at the draw call, NOT the buffer context active at the attrib set calls. So it's impossible to split a batch across a buffer orphan event.
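
In code terms (a sketch; streamOffset and the byte-count names are hypothetical, not from my real code), the orphan check has to come before any of the batch's uploads:

GLsizeiptr needed = vtxBytesAligned + idxBytes;   // the whole batch
if ( streamOffset + needed > STREAM_VBO_SIZE )
{
    // Orphan *between* batches, never between a batch's attribs
    // and its indices.
    glBufferData( GL_ARRAY_BUFFER, STREAM_VBO_SIZE, NULL, GL_DYNAMIC_DRAW );
    streamOffset = 0;
}
// ...then map/write attribs at streamOffset, indices right behind them
// (4-byte aligned), set the attrib pointers, and draw.
streamOffset += needed;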

My tip would be to not use two mapping operations, but instead upload both type of data in one go, right behind each other.

Thanks. Yeah, I was thinking about that. This was just a first-cut prototype to play with this technique. Haven’t done any refactoring for max perf yet.

Is there some language on this “mark as used” that I missed in the specs?

Not that I know of. Sometimes there is language that describes when the real destruction of an object is artificially deferred (like when it's bound in a second context at the time you destroy it).

For VBOs, I would say that only operations that actually depend on the contents of the buffer at the time they are called will mark it as "in use".

Ok, thanks.