Regarding mapping multiple buffer objects

mobeen · November 20, 2011, 8:57pm

Hi all,
The tips and tricks section of the OpenGL wiki (http://www.opengl.org/wiki/Vertex_Buffer_Object) says that multiple buffers can be mapped simultaneously. How is this possible?
Take my example with buf0 and buf1: do I call them this way?


glBindBuffer(GL_ARRAY_BUFFER,buf0);
GLfloat* pBuf0 = (GLfloat*)glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE);
glBindBuffer(GL_ARRAY_BUFFER,buf1);
GLfloat* pBuf1 = (GLfloat*)glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE);
//stuff using pBuf0 and pBuf1
...
//at the end of modification
glUnmapBuffer(GL_ARRAY_BUFFER);

Is the above sequence of calls OK? If I unmap, which buffer will be unmapped (the last one, right?), and how do I then unmap the first buffer?

Alfonse_Reinheart · November 20, 2011, 10:12pm

do I call them this way?

Remember that the “GL_ARRAY_BUFFER” part means “the buffer object that was last bound with glBindBuffer(GL_ARRAY_BUFFER, ...)”. If you want to map a buffer, you bind it to a buffer binding point (it doesn’t have to be GL_ARRAY_BUFFER), then map it. You are then free to bind a different buffer and do whatever you want.

If you want to unmap a buffer, you must first bind it, because glUnmapBuffer only works on buffers that are bound.

Therefore, what your code does depends on what happens during the … part. If buf1 is still bound, then it will be unmapped. If something else was bound in the meantime, then you’ll probably get an error.

mobeen · November 20, 2011, 11:12pm

Thanks for the response, Alfonse. So this means at any point I can only work with one buffer object bound to an ARRAY BUFFER binding, i.e. the last bound ARRAY BUFFER is the one that will be read from or written to? If this is the case, is there a way (an extension, etc.) to read from and write to two ARRAY BUFFERs simultaneously?

My application currently copies the result on the CPU (it maps the first VBO and memcpys it, then attaches the second VBO, maps it, memcpys it, and so on). I want to remove this copy altogether to speed up the process.

Another question: can I mix buffer binding types? Let’s say I start with two ARRAY BUFFERs. For copying, can I bind the first to the ARRAY BUFFER binding and the second to the GL_PIXEL_UNPACK_BUFFER binding? This would allow me to use both simultaneously. Is there a tutorial covering such a thing?

Alfonse_Reinheart · November 21, 2011, 12:07am

So this means at any point I can only work with one buffer object bound to an ARRAY BUFFER binding?

No, that is the opposite of what I said. Take the following sequence of commands:


glBindBuffer(GL_ARRAY_BUFFER, buffer);
glBufferData(GL_ARRAY_BUFFER, byteSize, pData, GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);

This binds a buffer, allocates it with a particular piece of data and then unbinds it. So… what is stored in the buffer after it is unbound?

pData. That is, exactly what was stored in it by those commands. Unbinding a buffer does not change its contents. It does not affect its state. The buffer does not become empty because it is unbound. It does not become smaller or anything.

The only reason you bound it at all is because glBufferData only works on a buffer that was bound. You bound it in order to give it data.

Now, what do you suppose happens if we substitute glMapBuffer for glBufferData?


glBindBuffer(GL_ARRAY_BUFFER, buffer);
glMapBuffer(GL_ARRAY_BUFFER, ...);
glBindBuffer(GL_ARRAY_BUFFER, 0);

What happens after the buffer is unbound?

Nothing. It’s still mapped, because being mapped is part of the buffer object’s state. Just like its contents. Just like its size. And so forth. That doesn’t go away just because you unbound it.

Just as with glBufferData, the only reason you bound it is because glMapBuffer only works on a buffer that is bound.

Mapping is not a global construct. It is per-object state. Unmapping is likewise per object. Therefore, if you have a mapped buffer object and you want to unmap it, you must ensure that it is bound so that glUnmapBuffer knows where to find it.
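To make the per-object point concrete, here is a minimal toy model in plain C. It is not OpenGL: `ToyBuffer`, `toy_bind`, `toy_map`, `toy_unmap` and `toy_demo` are hypothetical names invented for this illustration, standing in for a buffer object, glBindBuffer, glMapBuffer and glUnmapBuffer on a single binding point. It only assumes the behavior described above: the mapped flag lives on the object, and map/unmap act on whatever is currently bound.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a buffer object: the "mapped" flag is stored on the object. */
typedef struct {
    float data[4];
    int   mapped;   /* 1 while the buffer is mapped, 0 otherwise */
} ToyBuffer;

/* Toy stand-in for one binding point such as GL_ARRAY_BUFFER. */
ToyBuffer *toy_bound = NULL;

/* Models glBindBuffer(target, buf): only changes what the target refers to. */
void toy_bind(ToyBuffer *buf) { toy_bound = buf; }

/* Models glMapBuffer(target, ...): acts on the currently bound buffer.
   Returns NULL where real GL would report an error. */
float *toy_map(void) {
    if (toy_bound == NULL || toy_bound->mapped) return NULL;
    toy_bound->mapped = 1;
    return toy_bound->data;
}

/* Models glUnmapBuffer(target): returns 1 on success, 0 where real GL
   would report an error (nothing bound, or bound buffer not mapped). */
int toy_unmap(void) {
    if (toy_bound == NULL || !toy_bound->mapped) return 0;
    toy_bound->mapped = 0;
    return 1;
}

/* Maps two buffers, uses both pointers, then unmaps each by rebinding it.
   Returns 0 if every step behaved as expected. */
int toy_demo(void) {
    ToyBuffer buf0 = {{0}, 0}, buf1 = {{0}, 0};
    int errors = 0;

    toy_bind(&buf0);
    float *p0 = toy_map();
    toy_bind(&buf1);
    float *p1 = toy_map();
    toy_bind(NULL);                      /* unbinding does not unmap anything */
    if (p0 == NULL || p1 == NULL) return -1;
    assert(buf0.mapped && buf1.mapped);  /* both are still mapped */

    p0[0] = 1.0f;                        /* both pointers are usable together */
    p1[0] = p0[0] + 1.0f;

    toy_bind(&buf0);                     /* rebind so unmap can find buf0 */
    if (!toy_unmap()) errors++;
    toy_bind(&buf1);                     /* rebind so unmap can find buf1 */
    if (!toy_unmap()) errors++;
    toy_bind(NULL);

    return errors + buf0.mapped + buf1.mapped;
}
```

Calling `toy_demo()` from a `main` walks through the same order of operations the original question needs: bind and map buf0, bind and map buf1, work with both pointers, then bind and unmap each buffer in turn.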

mobeen · November 21, 2011, 11:23pm

OK, thanks for the clarifications. Would I get any benefit from using bindless graphics? As I read elsewhere, it gives me the GPU address of the buffer, which I could then read from and write to directly.

Dark_Photon · November 22, 2011, 4:51am

The GPU address is only good for the GPU pipeline to access your buffer with. So if you have the GPU do something with your buffer, then yes. But it’s not going to help you on the CPU side (AFAICR).

If you’re just trying to avoid binding a buffer to a bind point to map it, don’t forget DSA (direct state access, EXT_direct_state_access):


  void *p = glMapNamedBufferRangeEXT( handle, offset, size, flags );

  memcpy( p, ... );
  glUnmapNamedBufferEXT( handle );

I use that even when I’m filling a buffer object that I render on the GPU with bindless.

But if I could back up a second and maybe address the larger question…

My application currently copies the result on the CPU (it maps the first VBO and memcpys it, then attaches the second VBO, maps it, memcpys it, and so on). I want to remove this copy altogether to speed up the process.

Are you doing a readback and copy on the CPU? Can you do it on the GPU instead?

mobeen · November 22, 2011, 11:13pm

Hi Dark Photon,
Well, ideally I would like to do all the work on the GPU, but I can’t. Let me tell you what I am doing.

I am trying to do meshless FEM on the GPU. I already have a CPU implementation and am now trying to port it to the GPU. There are two parts to this algorithm: force calculation and integration.

I have already developed an integration pipeline using transform feedback (TF) and it runs super fast. The only problem is that I can’t put the force calculation in the integration shader, since it needs to scatter the resultant force into the neighborhood. I reformulated the scatter as a gather; however, that requires storing a vector of N 3x3 matrices (N is the number of neighbors).

My TF vertex shader outputs the current position, velocity and acceleration. I tried doing the force calculation in the same vertex shader, but it is painfully slow because of the large number of texture fetches needed for the neighbor variables. So I thought I would do the force calculation on the CPU and the integration on the GPU using TF. To make that work, in each iteration I need to copy the current position, velocity and acceleration from the CPU vector to the VBO, do the integration on the GPU using TF, and finally copy the new position, velocity and acceleration from the VBO back to the CPU vector so that the new force can be calculated from them. Unfortunately, this requires 6 memcpys per integration iteration. I was thinking of finding a way to use the VBO memory directly for calculating forces, but I don’t know if that is possible. What do you suggest I do?

Dark_Photon · November 23, 2011, 6:01am

I see what you’re saying. Not super-simple to just toss on the CPU. Given the shared locality, sounds like maybe a job for OpenCL/CUDA, or possibly ARB_shader_image_load_store. (BTW, I think I might have helped review your chapter. If not, let me just plug the OpenGL Insights book – worth buying a copy when it comes out.)

mobeen · November 23, 2011, 7:12am

Hmm, yeah, CUDA/OpenCL was the last option I had in mind. Ideally I want to make it work in shaders, since the rest of the downstream stuff is already done there.

I don’t know much about ARB_shader_image_load_store; I will have a look into it. By the way, do you know of a nice sample/demo using this extension which might help?

Wow it was you Dark Photon. I must say I was utterly impressed by your intriguing questions and insights. Thanks for the review and general feedback on it. By the way are you contributing too in this awesome book?

Dark_Photon · November 23, 2011, 4:57pm

Right. I agree, definitely preferable if you can make the perf work out there. OpenCL/CUDA definitely has a learning curve; not just to do “something”, but to do it safely and efficiently. Very different world than GLSL.

I don’t know much about ARB_shader_image_load_store; I will have a look into it.

Same here. I really want to sink my teeth into it over vacation when I’ll have time. Having done a little highly optimized OpenCL/CUDA work, seeing some of the concepts like thread cooperation, sync barriers, and side-effects appear in GLSL definitely got my attention. Not to mention some of the demos…

By the way, do you know of a nice sample/demo using this extension which might help?

There are a few I know about (probably more that I don’t):

Fast and Accurate Single-Pass A-Buffer using OpenGL 4.0+ (Crassin, 6/10)
OpenGL 4.0+ ABuffer V2.0: Linked lists of fragment pages (Crassin, 7/10)
Oit And Indirect Illumination Using Dx11 Linked Lists (GDC’10 PPSX link)

Thanks for the review and general feedback on it. By the way are you contributing too in this awesome book?

Sure thing. Hope some of it was helpful. No, no content in there from me. I work for a big company that owns what I do while I work for them and has little interest in seeing their employees published, so figured this was the best way to help out.

mobeen · November 23, 2011, 11:06pm

OK, ARB_shader_image_load_store seems to be for OpenGL 4 and above. My hardware is an NVIDIA Quadro FX 5800 (OpenGL 3.3), so I think I will have to go with NV_shader_buffer_load for the moment.