OpenGL VBO Performance issue

I am currently working on rendering vector images. Earlier I was using display lists, which gave me good performance. As part of an OpenGL migration, we are now using VBOs, and performance with them is really bad. I don't know where I am going wrong. Please let me know what is wrong with my rendering function and how I can improve VBO performance in the code below.

Below is my rendering function

struct DisplayIndexID {

    int idx;
    DrawStateT drawState;
    //Every display Index ID has its own draw models.
    std::vector<std::unique_ptr<vertexModel>> readytoDrawModels;
};

void drawDisplayLists(std::vector<DisplayIndexID>& v)
{
    for (int i = 0; i < v.size(); i++)
    {
        ...
        ///***********PRINT AREA***********************/
        for (auto& vModel : v[i].readytoDrawModels)
        {
            if (vModel)
            {
                glBindVertexArray(geomVAO);
                glBindBuffer(GL_ARRAY_BUFFER, geomVBO);
                std::vector<QVector3D> vecToDraw = vModel->getVertices();
                glBufferData(GL_ARRAY_BUFFER, sizeof(QVector3D) * vecToDraw.size(), /*&vecToDraw[0]*/ NULL, GL_DYNAMIC_DRAW);
                glEnableVertexAttribArray(0);
                glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(QVector3D), nullptr);
                glBindBuffer(GL_ARRAY_BUFFER, 0);
                glLineWidth(vModel->getLineWidth());
            }

            switch (vModel->getDrawMode())
            {
            case 0: //GL_POINTS
                glDrawArrays(GL_POINTS, 0, vModel->getVertices().size());
                break;

            case 1: //GL_LINES
                glDrawArrays(GL_LINES, 0, vModel->getVertices().size());
                break;

            case 2: //GL_LINE_LOOP
                glDrawArrays(GL_LINE_LOOP, 0, vModel->getVertices().size());
                break;

            case 3: //GL_LINE_STRIP
                glDrawArrays(GL_LINE_STRIP, 0, vModel->getVertices().size());
                break;

            case 4: //GL_TRIANGLES
                if (vModel->getTextureId() <= 0)
                {
                    glDrawArrays(GL_TRIANGLES, 0, vModel->getVertices().size());
                }
            }
        }
    }
}

You’re copying the data into the buffer and setting up the VAO each frame. For static data, populating buffers and configuring the attribute arrays should be done once, during initialisation. Rendering an object should just be glBindVertexArray and glDraw* (plus setting any uniform state, e.g. line width, textures).

Thanks for the reply. My data is dynamic and stored in a structure of vectors. Each vector element has its own vertices and texture data. How can I achieve this with a single glBufferData call?

Can I do something like this during initialisation, copying all the data with glBufferData() to populate the buffer?

void initVbo(std::vector<DisplayIndexID> & v)
{
	
	for (int i = 0; i < v.size(); i++) 
		for (auto& vModel : v[i].readytoDrawModels)
		{
			if (vModel) {
				glBindBuffer(GL_ARRAY_BUFFER, geomVBO);
				std::vector<QVector3D> vecToDraw = vModel->getVertices();
				glBufferData(GL_ARRAY_BUFFER, sizeof(QVector3D) * vecToDraw.size(), &vecToDraw[0], GL_STATIC_DRAW);
			}
		}

}

Wouldn’t that just be overwriting the previous model’s data with the current one? Why doesn’t whatever vModel already have its data in a buffer? Or better still, have all models (of similar vertex formats) share the same buffer.

Thank you. I will try to make it as one buffer and upload it.

Thanks for the reply, I really appreciate it. I have tried copying all the vModels into one buffer, but it is not working: I see a distorted image. This is what I did. I merged all the vModels' data into one buffer.

void initVbo(std::vector<DisplayIndexID> & v)
{
	std::vector<QVector3D> finalVecToDraw;
	finalVecToDraw.resize(1);
	for (int i = 0; i < v.size(); i++) 
		for (auto& vModel : v[i].readytoDrawModels)
		{
			if (vModel) {
				std::vector<QVector3D> vecToDraw = vModel->getVertices();
				finalVecToDraw.insert(finalVecToDraw.end() , vecToDraw.begin(), vecToDraw.end());
			}
		}
	    glBindVertexArray(geomVAO);
	    glBindBuffer(GL_ARRAY_BUFFER, geomVBO);
	    glBufferData(GL_ARRAY_BUFFER, sizeof(QVector3D) * finalVecToDraw.size(), &finalVecToDraw[0], GL_STATIC_DRAW);
}

Actually, every vModel has its own draw mode (GL_POINTS, GL_LINES, GL_TRIANGLES, and so on). So I was copying one vModel at a time to the buffer with glBufferData in a loop and drawing it based on its draw mode. This works.

But my question is: how can I decide which draw mode applies to which vModel's data if I put all the vModels' data in one buffer?

The code below works, but performance is very poor.

for (int i = 0; i < v.size(); i++)
{
    ///***********PRINT AREA***********************/
    for (auto& vModel : v[i].readytoDrawModels)
    {
        if (vModel)
        {
            glBindVertexArray(geomVAO);
            glBindBuffer(GL_ARRAY_BUFFER, geomVBO);
            std::vector<QVector3D> vecToDraw = vModel->getVertices();
            glBufferData(GL_ARRAY_BUFFER, sizeof(QVector3D) * vecToDraw.size(), &vecToDraw[0], GL_DYNAMIC_DRAW);
            glEnableVertexAttribArray(0);
            glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(QVector3D), nullptr);
            glBindBuffer(GL_ARRAY_BUFFER, 0);
            glLineWidth(vModel->getLineWidth());
        }
        switch (vModel->getDrawMode())
        {
        case 0: //GL_POINTS
            glDrawArrays(GL_POINTS, 0, vModel->getVertices().size());
            break;
        case 1: //GL_LINES
            glDrawArrays(GL_LINES, 0, vModel->getVertices().size());
            break;
        ...
        }
    }
}

I am kind of stuck.

If you don’t need to re-upload your vertex data every frame, pre-upload it before your draw loop.

If you do need to, see:

I now call glBufferData with NULL and upload the data with glBufferSubData(). I see some performance gain, but not much. Is there a way to improve glBufferSubData() performance?

Is there a way to cache the data if it remains the same across frames? Is there any OpenGL VBO API for that?

What do you mean “cache”? Buffer objects are not temporary. They’re allocations of GPU-accessible memory, and the stuff you stick into them stays there until you change it. That’s why we’re asking why your code is changing it all the time.
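A minimal sketch of what "upload only when the data changes" can look like on the CPU side. Everything here is an assumption for illustration: `CachedVertexData`, `setVertices` and `syncToGpu` are hypothetical names, `Vec3` stands in for QVector3D so the example builds without Qt, and the `upload` callback stands in for the real GL upload (binding the VBO and calling glBufferData or glBufferSubData), so no GL context is needed to try it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for QVector3D (assumption: three tightly packed floats).
struct Vec3 { float x, y, z; };

// Hypothetical helper: remembers whether the CPU-side vertices have changed
// since the last upload, so the per-frame path can skip the copy.
class CachedVertexData {
public:
    // `upload` stands in for the real GL upload; it receives a pointer to
    // the vertices and the size of the data in bytes.
    explicit CachedVertexData(std::function<void(const Vec3*, std::size_t)> upload)
        : upload_(std::move(upload))
    {
        assert(upload_ && "an upload callback is required");
    }

    // Replace the vertices and mark the GPU copy as stale.
    void setVertices(std::vector<Vec3> v) {
        vertices_ = std::move(v);
        dirty_ = true;
    }

    // Call once per frame before drawing; uploads only if the data changed.
    void syncToGpu() {
        if (!dirty_) return;
        upload_(vertices_.data(), vertices_.size() * sizeof(Vec3));
        dirty_ = false;
    }

private:
    std::function<void(const Vec3*, std::size_t)> upload_;
    std::vector<Vec3> vertices_;
    bool dirty_ = false;
};
```

With data that never changes after loading, `syncToGpu()` performs a single upload no matter how many frames are drawn; the buffer object itself is the cache.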

Thank you Alfonse, I get it now. My actual problem is this: I read the CGM file using a library and store all the information (vertices, textures and attributes) in a structure of vectors. When it comes to rendering, I fix the buffer size using glBufferData() and then, on every frame, loop over all the vector elements (vModels) and upload each vModel's data (a chunk) using glBufferSubData. As you suggested, I copied all the vModel vertices into a single buffer once, before entering the draw loop, but this did not work for me. Do you think uploading the data to the GPU buffer in chunks is the right approach? My goal is to match the speed of display lists (glCallList), which are deprecated. Sorry for asking a lot of questions; I am new to OpenGL.

Ok, so static content that doesn’t change.

Why? If it doesn’t work right, I’d dig into that and fix it.

Why do you say “chunks”? Where’s that coming from?

Display lists have some potentially expensive compilation associated with them, but after that the actual rendering can be fairly efficient on some drivers.

If your goal is to match the performance of display lists with VBOs, then you definitely want to start with not re-uploading the data to the GPU when it does not change, and if possible pre-uploading it before you even get into your draw loop.

Also, when issuing draw calls with those buffer objects, you’ll want to either use NVIDIA bindless extensions (if you’re specifically targeting NVIDIA GPUs) or at least VAOs (if not). With NV bindless + static pre-uploaded buffer objects, I’ve matched display list performance on NVIDIA GPUs.

Thanks @Dark_Photon. That was very clear. I have now pre-populated the data and it works. The issue was with the draw call: I was not passing the offset. Every time, I was drawing from the beginning of the buffer by passing 0 as the first-vertex argument of glDrawArrays. Performance seems better now, but it is still not up to the mark.
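A minimal sketch of the bookkeeping this fix implies, under stated assumptions: while the models are concatenated into one array, each model's starting vertex index and vertex count are recorded, and those values later become the `first` and `count` arguments of glDrawArrays. `Vec3`, `Model`, `DrawRange` and `packModels` are hypothetical names that do not appear in the original code; `Vec3` stands in for QVector3D, and the GL calls are shown only in comments so the example builds without a GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for QVector3D (assumption: three tightly packed floats).
struct Vec3 { float x, y, z; };

// Stand-in for one vModel: its vertices plus the draw-mode code used in the
// switch statement in the thread (0 = points, 1 = lines, ..., 4 = triangles).
struct Model {
    std::vector<Vec3> vertices;
    int drawMode;
};

// Where one model lives inside the shared buffer.
struct DrawRange {
    int drawMode;  // same code as Model::drawMode
    int first;     // index of the model's first vertex in the shared buffer
    int count;     // number of vertices that belong to the model
};

// Concatenates all model vertices into `packed` and returns one range per
// non-empty model, in the original order.
std::vector<DrawRange> packModels(const std::vector<Model>& models,
                                  std::vector<Vec3>& packed)
{
    std::vector<DrawRange> ranges;
    packed.clear();
    for (const Model& m : models) {
        if (m.vertices.empty()) continue;
        DrawRange r;
        r.drawMode = m.drawMode;
        r.first = static_cast<int>(packed.size());
        r.count = static_cast<int>(m.vertices.size());
        packed.insert(packed.end(), m.vertices.begin(), m.vertices.end());
        assert(packed.size() == static_cast<std::size_t>(r.first + r.count));
        ranges.push_back(r);
    }
    return ranges;
}

// Once, at initialisation (not compiled here):
//   glBufferData(GL_ARRAY_BUFFER, packed.size() * sizeof(Vec3),
//                packed.data(), GL_STATIC_DRAW);
// Per frame, for each range r, with `mode` chosen from r.drawMode:
//   glDrawArrays(mode, r.first, r.count);
```

Passing 0 instead of `r.first` would draw every model from the start of the shared buffer, which matches the distorted output described earlier in the thread.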

As you suggested, I will look into the NVIDIA bindless extensions. What are they, basically? Are they supported in OpenGL ES 2.0? Can I add those APIs to my code? Can I get a sample code snippet? I now have all the data ready and pre-populated, and I need some more performance gain.

I assume that he’s referring to the NV_bindless_multi_draw_indirect extension. In short, it allows you to perform a multi-draw command, passing in handles to the vertex buffers as parameters, eliminating the need to explicitly configure VAOs and allowing multiple sets of vertex buffers to be used for a single command.

If you aren’t getting the desired performance, look into whether you can reduce the overhead of each drawing operation. Are you using a single set of vertex buffers for all objects, or are you using a separate set for each object? The former will eliminate the need to switch buffers between draw calls and may allow multiple objects to be drawn with a single call.
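One way to draw several objects from a shared buffer with a single call is glMultiDrawArrays, which takes an array of first-vertex indices and an array of vertex counts for one primitive mode. The sketch below only builds those arrays and rests on assumptions: `DrawRange`, `MultiDrawBatch` and `groupByMode` are hypothetical names, every model in a group is assumed to share the same line width and other state, and grouping by mode changes the draw order, so it applies only where the order between models does not affect the image. The GL call itself appears only in a comment.

```cpp
#include <cassert>
#include <map>
#include <vector>

// One model's slice of the shared vertex buffer (hypothetical bookkeeping).
struct DrawRange {
    int drawMode;  // draw-mode code from the thread's switch (0 = points, 1 = lines, ...)
    int first;     // index of the first vertex in the shared buffer
    int count;     // number of vertices
};

// Per-mode argument arrays in the layout glMultiDrawArrays expects.
struct MultiDrawBatch {
    std::vector<int> first;  // GLint in real GL code
    std::vector<int> count;  // GLsizei in real GL code
};

// Groups ranges by draw mode, keeping the original order within each mode.
std::map<int, MultiDrawBatch> groupByMode(const std::vector<DrawRange>& ranges)
{
    std::map<int, MultiDrawBatch> batches;
    for (const DrawRange& r : ranges) {
        assert(r.count > 0);
        MultiDrawBatch& b = batches[r.drawMode];
        b.first.push_back(r.first);
        b.count.push_back(r.count);
    }
    return batches;
}

// Per frame, for each (modeCode, batch) pair, with `mode` chosen from modeCode
// (not compiled here):
//   glMultiDrawArrays(mode, batch.first.data(), batch.count.data(),
//                     static_cast<GLsizei>(batch.first.size()));
```

Because each entry keeps its own first and count, this also works for GL_LINE_STRIP and GL_LINE_LOOP, where simply merging adjacent ranges into one glDrawArrays call would join separate shapes together.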

No. ES 2 doesn’t support VAOs either. But it probably wouldn’t help anyhow; on ES2-class hardware, the limiting factor is likely to be the lack of raw performance from the GPU. VAOs or bindless extensions are important when the GPU is so fast that the time taken just setting up the draw call is significant compared to the time taken to execute it. I.e. for AMD/Nvidia desktop GPUs.

Actually, I was referring to the original NV bindless support from 11 years ago that that extension builds on. Namely:

See the first link for more detail, but (relevant to your use case) these two GL extensions allow you to provide GPU addresses into buffer objects, rather than mere buffer handles and offsets, to OpenGL when issuing draw calls, and to direct the driver to pre-stage buffer objects so that the GPU can read from them directly. This allows it to cut out quite a bit of the overhead that was previously slowing submission of lots of draw calls, particularly small ones. They provide other capabilities as well, such as being able to access GPU memory in shaders via pointers.

Since then, there have been other GL bindless extensions that have extended this support to other use cases within OpenGL, including the one that GClements mentioned.

Re using bindless (or VAOs) for vertex attribute and index lists… the larger you make your draw calls (without just wasting cycles), the less you’ll benefit from bindless (or VAOs) because draw call submission performance becomes a smaller percentage of your total time. However, with many small draw calls they can speed things up a bunch! But in any given program run, use one or the other, not both. I’d use NV bindless when you can, and fall back to VAOs if that’s not an option.