Problem with VBOs

Hello all,

I’m learning how to use VBOs but I have a problem. I have a model that is about 10k triangles and I can render it using vertex arrays or VBOs. With vertex arrays I get about 75fps but if I switch to VBOs I get about 0.003fps… so I must be doing something wrong.

My code looks like this

 

void InitializeVBOs()
{
    int 
        totalVerticesSize = 0,
        totalNormalsSize = 0,
        totalTxCoordsSize = 0,
        numTextures = 0,
        numVertices = 0,
        totalVBOSize = 0;

    if( 
        ( m_VBOsSupported ) && 
        ( m_pGeometry[ 0 ] != NULL ) &&
        ( m_pGeometry[ 0 ]->Get_NumVertices() > 0 )
        )
    {
        numVertices = m_pGeometry[ 0 ]->Get_NumVertices();
        numTextures = m_pGeometry[ 0 ]->GetNumTextures();

        totalVerticesSize = ( sizeof(float) * 3 ) * numVertices; // for vertices
        totalNormalsSize = ( sizeof(float) * 3 ) * numVertices; // for normals
        totalTxCoordsSize = 
            ( sizeof(float) * 3 ) * numVertices * numTextures; // for tx coords
        totalVBOSize = totalVerticesSize + totalNormalsSize + totalTxCoordsSize;

        glGenBuffersARB( 1, &m_GPUVertexBufferID );
        glGenBuffersARB( 1, &m_GPUIndexBufferID );

        // Bind the vertex data vbo.
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, m_GPUVertexBufferID );
        
        // Create a data store big enough for vertices, normals, and texture coords.
        glBufferDataARB( GL_ARRAY_BUFFER_ARB, totalVBOSize, NULL, GL_STATIC_DRAW_ARB );

        // Load vertices
        glBufferSubDataARB( 
            GL_ARRAY_BUFFER_ARB,// target
            0,                  // start from
            totalVerticesSize,  // total bytes of data
            m_pGeometry[0]->m_vertices // data
            );

        // Load normals
        glBufferSubDataARB(
            GL_ARRAY_BUFFER_ARB,// target
            totalVerticesSize,  // start from
            totalNormalsSize,   // total bytes of data
            m_pGeometry[0]->m_vertexNormals // data
            );

        // Load texture coordinates
        glBufferSubDataARB(
            GL_ARRAY_BUFFER_ARB,// target
            totalVerticesSize + totalNormalsSize, // start from
            totalTxCoordsSize,  // total bytes of data
            m_pGeometry[0]->m_txCoords // data
            );

        // Load the indices
        glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, m_GPUIndexBufferID );
            
        glBufferDataARB(
            GL_ELEMENT_ARRAY_BUFFER_ARB, // target
            m_numIndices * sizeof(unsigned short), // total size
            m_pGeometry[0]->m_indices, // data
            GL_STATIC_DRAW_ARB 
            );
    }
}

// ---

void Render()
{	
    int 
        totalVerticesSize = 0,
        totalNormalsSize = 0,
        numvertices = 0,
        numtextures = 0;
    
	//...
	//...	

	if( m_pGeometry[0] != NULL ) 
	{
            numvertices = m_pGeometry[0]->Get_NumVertices(); 
            numtextures = m_pGeometry[0]->GetNumTextures(); 
            totalVerticesSize = ( sizeof(float) * 3 ) * numvertices; 
            totalNormalsSize = ( sizeof(float) * 3 ) * numvertices; 
            
            // If VBOs are supported and we have valid buffers
            if( 
                m_VBOsSupported && 
                ( m_GPUVertexBufferID > 0 ) &&
                ( m_GPUIndexBufferID > 0 )
                )
            { 
                // Bind the data buffer
                glBindBufferARB( GL_ARRAY_BUFFER_ARB, m_GPUVertexBufferID );

                // Bind the element buffer
                glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, m_GPUIndexBufferID );

                // Setup pointers

                glVertexPointer( 
                    3, // coords
                    GL_FLOAT, // floats
                    0, // stride is 0, vertices are contiguous
                    0  // offset is 0, vertices are at the start of the buffer
                    ); 
                    
                glNormalPointer( 
                    GL_FLOAT, // floats
                    0, // stride is 0, normals are contiguous
                    BUFFER_OFFSET( totalVerticesSize ) // offset is (total size of vertices)
                    );
  
                /*
                glTexCoordPointer(
                    2, // 2 coordinates
                    GL_FLOAT, // floats 
                    sizeof(float), // stride is one float
                    BUFFER_OFFSET( totalVerticesSize + totalNormalsSize ) // offset into the buffer
                );
                */
            } 
            else 
            { 
                // SET VERTEX AND NORMAL POINTER 		    		

                glVertexPointer(3, GL_FLOAT, 0, m_pGeometry[0]->m_vertices); 
                glNormalPointer(GL_FLOAT, 0, m_pGeometry[0]->m_vertexNormals); 
                /*glTexCoordPointer( 3, GL_FLOAT, sizeof(float), m_pGeometry[0]->m_txCoords ); */

            } 
                
	    groupIndices = m_pGeometry[0]->Get_NumIndices();

            if( m_VBOsSupported )
            {           
                start_time = timeGetTime() / 1000.0f;

                glDrawElements( 
                    GL_TRIANGLES, 
                    groupIndices,
                    GL_UNSIGNED_SHORT, 
                    0
                    );  

                end_time = timeGetTime() / 1000.0f;
                sprintf(tmpStr, "VBO CALL = %f", end_time - start_time);
                MessageBox(NULL, tmpStr, "VBO CALL", MB_OK );

                glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );
                glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, 0 );
            }
            else 
            { 
                start_time = timeGetTime() / 1000.0f;

  	        glDrawElements(	
                    GL_TRIANGLES, 
		    groupIndices, 
	  	    GL_UNSIGNED_SHORT, 
		    m_pGeometry[0]->m_indices 
                    );

                end_time = timeGetTime() / 1000.0f;
                sprintf(tmpStr, "VA CALL = %f", end_time - start_time);
                MessageBox(NULL, tmpStr, "VA CALL", MB_OK );
            } 

	} 
}

 

With the message box calls I could see the problem seems to be with glDrawElements. For VBOs, the call takes about 1.3 seconds.
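
A side note on the measurement itself: timeGetTime() can be fairly coarse, and GL calls may return before the GPU has actually finished, so the numbers above are only a rough guide. Below is a minimal sketch of a timing helper built on std::chrono. TimeCallMs is a hypothetical name, not part of the code above, and in a real test the callable would need to end with glFinish() so that the draw has completed before the clock is read.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the original code): runs `work` once and
// returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double TimeCallMs(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();

    // steady_clock is monotonic, so the end time can never precede the start.
    assert(end >= start);

    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Intended use inside Render() (assumes a current GL context):
//   double ms = TimeCallMs([&] {
//       glDrawElements(GL_TRIANGLES, groupIndices, GL_UNSIGNED_SHORT, 0);
//       glFinish(); // wait until the draw has really been executed
//   });
```

Logging the result to a file or the debugger output instead of a MessageBox keeps the frame loop from being interrupted on every draw.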

Any ideas what could be wrong ?? :frowning:

Also, I downloaded a demo from Delphi3D that renders a terrain using VBOs. The demo renders about 2 million triangles and I get about 9fps. The only difference is that the demo uses an interleaved array for vertices/colors.
Is there anything wrong with the way I setup the data in the VBOs?

Thanks for any help!

EDIT: Because the slowdown is so big, it is possible that the driver is falling back to software processing because of limitations of the hardware (initially I assumed this might be caused by bad interaction with the GPU caches, but the number of triangles is low). IMHO the most likely reason is the size of the offsets inside the VBO for the normal and texture coordinate pointers: either an offset is too big, or the difference between the individual inputs is too big. Try using a separate VBO for each vertex attribute, or interleave them in a single VBO.

Thanks for your reply Komat. I tried using one VBO for normals and another for TexCoords, but the result is still the same. If I interleave them in a single VBO, how can I use multitexturing? Would I need a separate interleaved VBO for each texture?

Originally posted by Count Duckula:
Thanks for your reply Komat. I tried using one VBO for normals and another for TexCoords, but the result is still the same.

That is strange. What happens if you use only the position array, without normals or texture coordinates?


If I interleave them in a single VBO, how can I use multitexturing? Would I need a separate interleaved VBO for each texture?

No. I did not mean to use the InterleavedArrays API, which is old and limited; you can interleave manually. The VBO will contain an array of structures similar to the following:

struct VertexStruct {
float x, y, z ;
float nx, ny, nz ;
float u0, v0 ;
float u1, v1 ;
float u2, v2 ;
};

Then you set the array pointers so that the stride is sizeof( VertexStruct ) and each offset is the offset of the corresponding field within this structure (e.g. the offset of u2 for the third texture coordinate array).
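
To make the stride/offset idea concrete, here is a minimal sketch that computes the values with offsetof. The constant names (kStride, kPositionOff, kNormalOff, kTexCoordOff) are made up for illustration, BUFFER_OFFSET is the macro already used in the code from the first post, and the GL calls are shown only as comments because they need a current context and the interleaved VBO bound.

```cpp
#include <cassert>
#include <cstddef> // offsetof, std::size_t

// Same field layout as the struct above: position, normal, three UV sets.
struct VertexStruct {
    float x, y, z;
    float nx, ny, nz;
    float u0, v0;
    float u1, v1;
    float u2, v2;
};

// All members are floats, so no padding is expected: 12 floats per vertex.
static_assert(sizeof(VertexStruct) == 12 * sizeof(float), "unexpected padding");

// Hypothetical names: byte stride and byte offsets for the gl*Pointer calls.
const std::size_t kStride      = sizeof(VertexStruct);
const std::size_t kPositionOff = offsetof(VertexStruct, x);
const std::size_t kNormalOff   = offsetof(VertexStruct, nx);
const std::size_t kTexCoordOff[3] = {
    offsetof(VertexStruct, u0),
    offsetof(VertexStruct, u1),
    offsetof(VertexStruct, u2)
};

// With the interleaved VBO bound to GL_ARRAY_BUFFER_ARB, the setup would be
// roughly:
//   glVertexPointer(3, GL_FLOAT, kStride, BUFFER_OFFSET(kPositionOff));
//   glNormalPointer(GL_FLOAT, kStride, BUFFER_OFFSET(kNormalOff));
//   for (int unit = 0; unit < 3; ++unit) {
//       glClientActiveTextureARB(GL_TEXTURE0_ARB + unit);
//       glEnableClientState(GL_TEXTURE_COORD_ARRAY);
//       glTexCoordPointer(2, GL_FLOAT, kStride,
//                         BUFFER_OFFSET(kTexCoordOff[unit]));
//   }
```

One buffer therefore serves all texture units; only the offset passed to glTexCoordPointer changes per unit.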

Originally posted by Komat:
That is strange. What happens if you use only the position array, without normals or texture coordinates?

I’ve tried it and it shows the model but with flat color since normals are missing but the speed is the same.

No. I did not mean to use the InterleavedArrays API, which is old and limited; you can interleave manually. The VBO will contain an array of structures similar to the following:

struct VertexStruct {
float x, y, z ;
float nx, ny, nz ;
float u0, v0 ;
float u1, v1 ;
float u2, v2 ;
};

Then you set the array pointers so that the stride is sizeof( VertexStruct ) and each offset is the offset of the corresponding field within this structure (e.g. the offset of u2 for the third texture coordinate array).
Ahh, ok :slight_smile: , I hadn’t understood that part. I’ll try it as well.

Originally posted by Count Duckula:
I’ve tried it and it shows the model but with flat color since normals are missing but the speed is the same.

By "the same speed", do you mean that even with only the positions it is slow with the VBO and fast without it?

Originally posted by Komat:
By "the same speed", do you mean that even with only the positions it is slow with the VBO and fast without it?
Yep :frowning:

Which graphics card do you have?

It is also possible that something else, unrelated to VBOs, is causing software vertex processing. Because VBOs are likely to be stored in video memory, this could cause an additional performance hit for the software emulation.

I have an ATI Mobility Radeon 7500 and I’ve just tried it on a GeForceFx 5500 and it’s very different. On the 5500 I get almost the same speed with VA (160fps windowed, 190fps fs) and VBOs (180fps windowed, 200fps fs). I had thought it would run faster on one of those cards but maybe it’s fillrate limited or it’s just the pc that’s not a high end one hehe…

However, on the 7500, it seems like the driver doesn’t like glDrawElements. I had a look at other VBO code on NeHe, basically what it does is duplicate the vertices so it doesn’t use indices. I did the same for testing, so I use glDrawArrays instead of glDrawElements and it’s better, it’s about 8fps, but the VA is still about 70fps.

One thing though… the NeHe code runs at about 60fps, and my duplicated vertices code runs at 8fps, so I might still be doing something stupid hehe.
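
For anyone who wants to repeat the glDrawArrays test, the vertex duplication can be done once on the CPU at load time. This is a minimal sketch, not the NeHe code: ExpandIndexed is a hypothetical helper that assumes tightly packed float attributes and unsigned short indices, as in the code from the first post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: turns an indexed attribute array into a flat one by
// copying the attribute of every referenced vertex, in index order.
// `componentsPerVertex` is 3 for positions and normals, 2 for 2D texcoords.
std::vector<float> ExpandIndexed(const std::vector<float>& attribs,
                                 const std::vector<unsigned short>& indices,
                                 std::size_t componentsPerVertex)
{
    std::vector<float> expanded;
    expanded.reserve(indices.size() * componentsPerVertex);

    for (unsigned short index : indices) {
        const std::size_t base = index * componentsPerVertex;
        // Every index must refer to a vertex that exists in the source array.
        assert(base + componentsPerVertex <= attribs.size());
        for (std::size_t c = 0; c < componentsPerVertex; ++c) {
            expanded.push_back(attribs[base + c]);
        }
    }
    return expanded;
}

// The expanded arrays are uploaded with glBufferDataARB as before and drawn
// with glDrawArrays(GL_TRIANGLES, 0, indexCount) instead of glDrawElements.
```

The trade-off is memory: each shared vertex is stored once per triangle that uses it, so the buffers grow with the index count rather than the vertex count.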

Can you post an app that people can test for themselves (including source)?

I believe the Radeon 7500 supports up to OpenGL v1.3, and VBOs only in software mode. As far as I know, you need a Radeon 9600 or higher to run VBOs in hardware mode.

Originally posted by Count Duckula:

However, on the 7500, it seems like the driver doesn’t like glDrawElements. I had a look at other VBO code on NeHe, basically what it does is duplicate the vertices so it doesn’t use indices. I did the same for testing, so I use glDrawArrays instead of glDrawElements and it’s better, it’s about 8fps, but the VA is still about 70fps.

The speed increase when you switch to the non-indexed draw might be explained by the driver reading the video memory sequentially instead of using random access based on the indices.
The Radeon 7500 is an old card with limited vertex processing capabilities, so it is quite possible that you are using some vertex feature that is not hardware accelerated. Are you using texture matrices, texgens, clip planes, polygon offsets, double-sided lighting, separate specular or something similar that is not used by that NeHe tutorial?

Originally posted by songho:
I believe the Radeon 7500 supports up to OpenGL v1.3, and VBOs only in software mode. As far as I know, you need a Radeon 9600 or higher to run VBOs in hardware mode.
The Radeon 8500 can definitely run VBOs in hardware mode too. For the R7500 the limitation might come only from the driver itself, because it can always store the VBO in system memory like ordinary vertex arrays if the hardware does not support reading them from a different place. However, the NeHe VBO code runs fast.

Komat,
Yes, Radeon 8500 is older than 9xxx cards, but it is faster than some newer generation cards, for example, 8500 is faster than 9200. (Here is a naming confusion again.)

What I found was that VBO performance is very poor on Radeon cards if glGetString(GL_VERSION) reports 1.3. (I believe it results from a hardware limitation.) Does your 8500 report v1.3 or v1.5 (or v2.0) on Windows?
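
If you want to check this at startup, here is a minimal sketch of parsing the version string. GLVersion, ParseGLVersion and HasCoreVBO are hypothetical names; the only assumption about the string is that it starts with "major.minor", which the GL spec guarantees, and the fact used is that buffer objects became core in OpenGL 1.5 (before that they are only exposed through ARB_vertex_buffer_object). A low version does not prove a software path; it is just a hint worth logging.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical type holding the leading "major.minor" of GL_VERSION.
struct GLVersion {
    int majorVersion;
    int minorVersion;
};

// Parses the beginning of the string returned by glGetString(GL_VERSION).
// Anything after "major.minor" (release number, vendor text) is ignored.
bool ParseGLVersion(const char* versionString, GLVersion* out)
{
    if (versionString == nullptr || out == nullptr) {
        return false;
    }
    int majorVersion = 0;
    int minorVersion = 0;
    if (std::sscanf(versionString, "%d.%d", &majorVersion, &minorVersion) != 2) {
        return false;
    }
    out->majorVersion = majorVersion;
    out->minorVersion = minorVersion;
    return true;
}

// Buffer objects are part of core OpenGL from version 1.5 onwards.
bool HasCoreVBO(const GLVersion& v)
{
    return v.majorVersion > 1 || (v.majorVersion == 1 && v.minorVersion >= 5);
}

// Intended use (assumes a current GL context):
//   GLVersion v;
//   if (ParseGLVersion((const char*)glGetString(GL_VERSION), &v) && !HasCoreVBO(v)) {
//       // VBOs come from the extension only; consider plain vertex arrays.
//   }
```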

Thanks for all the replies :slight_smile:

can you post an app that people can test for themselves (including source)?
Got swamped at work today but I’ll try to upload it tomorrow =)


I believe the Radeon 7500 supports up to OpenGL v1.3, and VBOs only in software mode. As far as I know, you need a Radeon 9600 or higher to run VBOs in hardware mode.

Ahhh, good point, I didn’t know that… I checked it (with glGetString and with GLee as well) and it is indeed 1.3.

The increase of speed when you change to the nonindexed draw might be explained by driver reading the video memory sequentially instead of using the random access based on indices.

Yep, that makes sense.


Are you using texture matrices, texgens, clip planes, polygon offsets, double-sided lighting, separate specular or something similar that is not used by that NeHe tutorial?

Not really hehe, I was planning to incorporate that code in another app that uses more features.

I think I found the problem… I went from ~8 to ~200fps on the 7500 when commenting out these lines:

	//glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_NICEST);	
	//glHint(GL_LINE_SMOOTH_HINT, GL_NICEST);
	//glHint(GL_POLYGON_SMOOTH_HINT, GL_NICEST);

	//glLightfv(GL_LIGHT0, GL_POSITION, defLight_position);
	//glLightfv(GL_LIGHT0, GL_SPECULAR, defLight_specular);
	//glLightfv(GL_LIGHT0, GL_AMBIENT,  defLight_ambient);
	glLightfv(GL_LIGHT0, GL_DIFFUSE,  defLight_diffuse);

  

Although it would be good to know why…

In theory this should not be the cause, because rendering is faster when you use ‘normal’ vertex arrays with the same state. Lighting calculations do slow things down, as this shows, but they are not directly linked to VBOs.

What do you get if you use the VBO without indices?

Also, please post the full source; I guess that will make it easier.

Originally posted by Count Duckula:
Although it would be good to know why…
It is a combination of two things:

  1. One or more of those lines (probably one of the glHints) forced the driver to do software vertex processing instead of using the hardware vertex processing unit, because the hardware does not support that particular feature.

  2. The VBO content was very likely stored in video memory or other uncached memory so that the graphics card has fast access to it. Reading from such memory with the CPU is very slow, especially if special care is not taken.

Because the driver was emulating vertex processing on the CPU and was reading from memory optimized for GPU access, the result was slow. Without the VBOs, the driver was reading data from cached memory that is optimized for CPU access, and the performance was significantly better. When you commented out those lines, vertex processing went back to the hardware path, which has no problem reading from the memory in which the VBO was stored.

I really don’t understand how a glHint could contribute to such a drop. Isn’t a glHint only a hint for the driver, not an obligation to execute? In this case, can this be considered a driver bug, or should I accept it as normal behavior?