OpenGL VBO Indexing (How to compute the index array)

We are migrating our OpenGL code to a newer version (ES 2.0). The application renders vector images (i.e. CGM files). I have successfully rendered the graphics from vertices stored in a VBO, but the problem is performance: display lists are much faster than the VBO path. So I am thinking of using VBO indexing. How do I come up with the index array? Will indexing improve performance? Please find my code below.

//This is my data structure

struct DisplayIndexID {
        int idx;
        DrawStateT drawState;

        //Every display Index ID has its own draw models.
        std::vector<std::unique_ptr<vertexModel>> readytoDrawModels;
    };

//Initializing the VBO 
void initVbo(std::vector<DisplayIndexID> & v)

{
    glBindVertexArray(geomVAO);
    glBindBuffer(GL_ARRAY_BUFFER, geomVBO);
    std::vector<QVector3D> vecToDraw;
    std::vector<QVector3D> finalVecToDraw;
    for (int j = 0; j < v.size(); j++)
        for (auto& vModel : v[j].readytoDrawModels)
        {
            if (vModel) {
                vecToDraw = vModel->getVertices();
                finalVecToDraw.insert(finalVecToDraw.end(), vecToDraw.begin(), vecToDraw.end());

            }
        }

    glBufferData(GL_ARRAY_BUFFER, sizeof(QVector3D) * finalVecToDraw.size(), &finalVecToDraw[0], GL_STATIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}

//Rendering function 
void drawDisplayLists(std::vector<DisplayIndexID> & v)
{
    GLintptr offset = 0;

    initVbo(v);

    for (int i = 0; i < v.size(); i++)
    {

        //***********PRINT AREA***********************/
        for (auto& vModel : v[i].readytoDrawModels)
        {
           glBindBuffer(GL_ARRAY_BUFFER, geomVBO);
           glEnableVertexAttribArray(0);
           glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(QVector3D), (GLvoid*)offset); 
           switch (vModel->getDrawMode())
            {
                case 0: //GL_POINTS
                    glDrawArrays(GL_POINTS, 0, vModel->getVertices().size());
                    break;

                case 1: //GL_LINES
                    glDrawArrays(GL_LINES, 0, vModel->getVertices().size());
                    break;
                case 3: // GL_TRIANGLE
                     ...
            }
            offset += sizeof(QVector3D) * vModel->getVertices().size();
        }
    }
}

You should try to find out why display lists are faster. I’d assume that the VBO path is doing more work than it strictly needs to. E.g. in the above code, you’re calling glVertexAttribPointer for each model, with the same buffer but different offsets. You could presumably set the attribute pointer once and just change the second parameter of glDrawArrays (first) to get the same result.
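A minimal sketch of that suggestion, assuming the vertices of all models sit back to back in one VBO in the order initVbo appends them. The names DrawRange and computeDrawRanges are hypothetical, and the GL calls appear only as comments because they need a current context. Note also that the posted drawDisplayLists calls initVbo every time it runs, so the static data is re-uploaded on each call; uploading once at load time is probably worth trying as well.

```cpp
#include <cstddef>
#include <vector>

// One glDrawArrays range inside a single shared vertex buffer.
struct DrawRange {
    int first;  // index of the first vertex (the `first` argument of glDrawArrays)
    int count;  // number of vertices (the `count` argument)
};

// Hypothetical helper: turn per-model vertex counts into (first, count) ranges,
// in the same order the vertices were appended to the VBO.
std::vector<DrawRange> computeDrawRanges(const std::vector<std::size_t>& vertexCounts)
{
    std::vector<DrawRange> ranges;
    ranges.reserve(vertexCounts.size());
    std::size_t first = 0;
    for (std::size_t count : vertexCounts) {
        ranges.push_back({static_cast<int>(first), static_cast<int>(count)});
        first += count;  // the next model starts where this one ends
    }
    return ranges;
}

// Sketch of the draw loop (requires a GL context, so shown as comments only):
//   glBindBuffer(GL_ARRAY_BUFFER, geomVBO);
//   glEnableVertexAttribArray(0);
//   glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(QVector3D), nullptr);  // once
//   for each model i:
//       glDrawArrays(modeOfModel[i], ranges[i].first, ranges[i].count);
```

The ranges only depend on the vertex counts, so they can be computed once when the buffer is filled and reused every frame.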

Indexing will only help if you’re getting a performance hit due to duplicated vertices. Draw calls cache the vertex shader outputs, so repeated occurrences of a particular index will re-use the cached data rather than executing the vertex shader again (assuming that the data is still in the cache). This is more significant when using separate lines and triangles than for strips/loops/fans (which have implicit sharing).
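As for how to come up with the index array: one common approach is to deduplicate the vertex list and record, for every original vertex, the position of its unique copy. The sketch below is only an illustration under some assumptions: Vec3 stands in for QVector3D, IndexedMesh and buildIndexedMesh are hypothetical names, vertices are matched by exact float equality (no tolerance, no NaN values), and indices are 16-bit because unextended ES 2.0 accepts only GL_UNSIGNED_BYTE and GL_UNSIGNED_SHORT in glDrawElements.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Vec3 { float x, y, z; };  // stand-in for QVector3D

struct IndexedMesh {
    std::vector<Vec3> vertices;          // unique vertices, for GL_ARRAY_BUFFER
    std::vector<std::uint16_t> indices;  // one per input vertex, for GL_ELEMENT_ARRAY_BUFFER
};

// Hypothetical helper: collapse exact duplicates and emit an index per input vertex.
IndexedMesh buildIndexedMesh(const std::vector<Vec3>& input)
{
    IndexedMesh mesh;
    std::map<std::tuple<float, float, float>, std::uint16_t> seen;
    for (const Vec3& v : input) {
        const auto key = std::make_tuple(v.x, v.y, v.z);
        auto it = seen.find(key);
        if (it == seen.end()) {
            // 16-bit indices can address at most 65536 unique vertices.
            assert(mesh.vertices.size() < 65536);
            const auto index = static_cast<std::uint16_t>(mesh.vertices.size());
            it = seen.emplace(key, index).first;
            mesh.vertices.push_back(v);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}

// Drawing then becomes (requires a GL context, so shown as comments only):
//   glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuffer);
//   glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, nullptr);
```

For two triangles that share an edge, six input vertices collapse to four unique ones plus six indices. If the data has few shared vertices, the index buffer adds memory and bandwidth without saving any vertex shader work.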

Which GPU and OS?
And is the content of your VBOs static or dynamic?

The answers have a big bearing on why you’re seeing slow performance and what solutions are available to you.

GPU is Nvidia Quadro M2200 and OS Windows 10 64 bit.
VBO content is static.

Ok, so a low-end laptop Quadro from 4-6 years ago.

I too have experienced how good NVIDIA display list performance is in their GL drivers, and I’ve definitely seen the slowdown you’re experiencing with lots of tiny batches (draw calls) using vertex/index data in “bound” buffer objects (albeit on NVIDIA desktop GPUs, not NVIDIA laptop GPUs, so you’re probably seeing it worse than I was).

Based on my experience, your best bets to improve perf in this situation toward display list perf are:

  1. NVIDIA Bindless Buffers extensions
  2. VAOs
  3. Increase your batch sizes (reduce the number of draw calls and buffer binds)
  4. Client arrays (if the amount of batch data is small)

With #1, I’ve matched display list performance without rebatching (assuming display list = batch), but of course this is NV-only. If #1 isn’t an option, consider #2. It will improve perf over using the standard VAO-less bind buffers + draw paradigm, but not as much as NV bindless + draw. For best perf don’t use both #1 and #2 at the same time. Pick one.

#3 you can do alone, or in combination with #1 or #2, and it will help. Larger batch sizes yield fewer draw calls and buffer binds, so the per-buffer bind and per-draw call overhead becomes a smaller percentage of the total time.
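One way #3 could look for the code in the question is to merge all models that use the same draw mode into one contiguous range, so each mode needs a single glDrawArrays call. This is a sketch under assumptions, not a drop-in fix: Model, Batch, BatchedGeometry and batchByDrawMode are hypothetical names, the merge is only valid for independent primitives (points, separate lines, separate triangles, not strips, loops or fans), it assumes the merged models share the same draw state, and it changes the draw order, which may matter for overlapping 2D vector graphics.

```cpp
#include <cstddef>
#include <map>
#include <vector>

struct Vec3 { float x, y, z; };  // stand-in for QVector3D

struct Model {                   // hypothetical stand-in for vertexModel
    int drawMode;                // e.g. 0 = points, 1 = lines, as in the question's switch
    std::vector<Vec3> vertices;
};

struct Batch {
    int drawMode;
    int first;  // first vertex of the batch in the merged buffer
    int count;  // number of vertices in the batch
};

struct BatchedGeometry {
    std::vector<Vec3> vertices;  // upload once to the VBO
    std::vector<Batch> batches;  // one glDrawArrays call per entry
};

// Hypothetical helper: group vertices by draw mode, then lay the groups out back to back.
BatchedGeometry batchByDrawMode(const std::vector<Model>& models)
{
    std::map<int, std::vector<Vec3>> perMode;
    for (const Model& m : models) {
        std::vector<Vec3>& bucket = perMode[m.drawMode];
        bucket.insert(bucket.end(), m.vertices.begin(), m.vertices.end());
    }

    BatchedGeometry out;
    for (const auto& entry : perMode) {
        out.batches.push_back({entry.first,
                               static_cast<int>(out.vertices.size()),
                               static_cast<int>(entry.second.size())});
        out.vertices.insert(out.vertices.end(), entry.second.begin(), entry.second.end());
    }
    return out;
}
```

With this layout the number of draw calls depends on the number of distinct draw modes rather than on the number of models.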

As a last-ditch fallback, #4 (client arrays) is very efficient when the data is dynamic or the amount of batch data is small. The driver uses a very fast streaming method hand-tuned by the driver devs, so it’s tough to beat for dynamic data. Of course, for static data you pay an extra cost of re-uploading the content each time, which you can optimize away with smart use of buffer objects (#1 or #2, and possibly #3 as well).
