glDrawElements performance?

ihmin · April 14, 2022, 4:25pm

I am currently rendering with glDrawElements, and I am drawing a very large number of triangles.
Rendering is too slow, so I tried uploading the index data to the GPU to see whether that helps.

The following uses the index data in the CPU’s memory.

glDrawElements(GL_TRIANGLES, m_mesh->GetTetraDrawCount(comp_id),
               GL_UNSIGNED_INT, m_mesh->GetTetraDrawArrayPointer(comp_id));

The following uses the index data in GPU memory.

// Upload index data
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, m_index_buffer[comp_id]);
glBufferData(GL_ELEMENT_ARRAY_BUFFER,
             m_mesh->GetTetraDrawCount(comp_id) * sizeof(unsigned int),
             m_mesh->GetTetraDrawArrayPointer(comp_id), GL_STATIC_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);

....

// rendering
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, m_index_buffer[comp_id]);
glDrawElements(GL_TRIANGLES, m_mesh->GetTetraDrawCount(comp_id),
               GL_UNSIGNED_INT, (void*)0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);

Why are the speeds similar?

mhagain · April 14, 2022, 5:18pm

You’re bottlenecked somewhere else. Maybe in your fragment shader, blending or even elsewhere on the CPU.

ihmin · April 15, 2022, 7:43am

Thank you for your answer.
Below is the fragment shader that I am using.
Is there anything in it that might be slow?


#version 330
precision lowp float;
precision lowp int;

uniform isamplerBuffer selectBuffer;
uniform samplerBuffer elementValueBuffer;

uniform int useVertexNormal;

uniform vec4 color;

in VertexAttr {
	vec3 vertexNormal;
	vec3 faceNormal;
} attrIn;

out vec4 fragColor;

void main()
{
	vec4 outColor = color;

	vec3 normal = gl_FrontFacing ? attrIn.faceNormal : -attrIn.faceNormal;
	
	if (bool(useVertexNormal)) {
		normal = gl_FrontFacing ? attrIn.vertexNormal : -attrIn.vertexNormal;
	}
		
	outColor = vec4((color * 0.7).xyz, color[3]);

	float alpha = outColor[3];

	vec3 lightDir = normalize(vec3(0.27, 0.27, 0.92));

	// --------------- diffuse --------------------
	float df = clamp(dot(lightDir, normal), 0.0, 1.0);
	vec4 diffuse_color = outColor * 0.5;
	vec4 diffuse = df * diffuse_color;

	// --------------- specular --------------------
	float shininess = 5.0f;
	vec3 reflectDir = reflect(-lightDir, normal);  
	float sf = pow(max(dot(normal, reflectDir), 0.0), shininess);
	vec4 specular = sf * outColor;

	// --------------- ambient --------------------
	vec4 ambient = outColor * 0.7;

	// --------------- face color --------------------
	outColor.xyz = ambient.xyz + diffuse.xyz + specular.xyz;
	vec4 finalColor = vec4(outColor.xyz, alpha);

	fragColor = finalColor;

}

Dark_Photon · April 15, 2022, 12:17pm

Questions:

How are you measuring your “rendering speed”?
What is your current rendering speed?
What is your target rendering speed?
How many triangles are you rendering?
Across how many draw calls?
With what pipeline state?
On what CPU, GPU, and GL driver?
If you reduce your target frame resolution to 128x128, what happens to your speed measurement?
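For the first question, one possible way to get a number when the application only redraws on input is to time a single frame on the CPU. The helper below is a minimal sketch, not anything from this thread: `measureFrameMs` and `drawFrame` are hypothetical names, and it assumes the callable you pass in issues all the draw calls and then calls glFinish(), so that the GPU work has completed before the clock stops.

```cpp
#include <chrono>
#include <utility>

// Runs drawFrame() once and returns the elapsed wall-clock time in
// milliseconds. Assumption: for OpenGL, drawFrame ends with glFinish(),
// otherwise only the time to queue the commands is measured.
template <typename DrawFn>
double measureFrameMs(DrawFn&& drawFrame) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<DrawFn>(drawFrame)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Running the same measurement at full resolution and at 128x128 would show whether the time depends on the number of pixels filled. A profiler such as the ones mentioned later in the thread gives a more detailed breakdown than this kind of stopwatch timing.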

arekkusu · April 15, 2022, 3:56pm

Don’t guess. Use tools to find your bottleneck.

Dark_Photon · April 15, 2022, 6:58pm

I’d second that.

But if you really have no idea where your bottleneck(s) are, then I’d start with:

Nsight Systems

over:

Nsight Graphics

…if you’re running on an NVIDIA GPU. That’ll tell you pretty quick … especially if you add NVTX markup to your frames.

(KHR_debug markup works too [wiki link], but tends to lag vs. NVTX due to the normal deferred command queue flushing associated with GL API command dispatch.)

ihmin · April 18, 2022, 6:23am

My program does not render continuously like a game; it renders only in response to mouse or keyboard input.
I have not measured the FPS accurately yet, but rendering has become visibly slower.
Meshes that consist only of shells, usually made of triangles, are displayed quickly.
However, solid meshes, in which most triangles are not visible on the screen, are displayed slowly.

There are four solid meshes. The numbers below are the counts of tetra or hexa solid elements in each mesh.

292193
512493
113076
381282

Each tetra is sent as 12 vertices forming 4 triangles, one for each of its four faces.
Each hexa is sent as 36 vertices forming 12 triangles, two for each of its six faces.
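As a rough size estimate from the figures above: the four meshes hold 1,299,044 elements in total. The post does not say which meshes are tetra and which are hexa, so the sketch below only computes the two bounds (all tetra and all hexa), assuming 4-byte GL_UNSIGNED_INT indices as in the draw calls earlier. The names `DrawCost`, `costFor`, `kElementCounts` and `totalElements` are made up for this illustration.

```cpp
#include <cstdint>

// Triangle, index and index-buffer size totals for a set of solid elements.
struct DrawCost {
    std::uint64_t triangles;
    std::uint64_t indices;
    std::uint64_t indexBytes;  // assumes 4-byte GL_UNSIGNED_INT indices
};

// Element counts of the four solid meshes, copied from the post.
const std::uint64_t kElementCounts[] = {292193, 512493, 113076, 381282};

inline std::uint64_t totalElements() {
    std::uint64_t sum = 0;
    for (std::uint64_t n : kElementCounts) sum += n;
    return sum;
}

// trianglesPerElement is 4 for a tetra and 12 for a hexa (from the post).
inline DrawCost costFor(std::uint64_t elementCount,
                        std::uint64_t trianglesPerElement) {
    const std::uint64_t triangles = elementCount * trianglesPerElement;
    const std::uint64_t indices = triangles * 3;  // 3 indices per triangle
    return {triangles, indices, indices * sizeof(std::uint32_t)};
}
```

With these inputs, `costFor(totalElements(), 4)` gives 5,196,176 triangles and about 62 MB of indices, while `costFor(totalElements(), 12)` gives 15,588,528 triangles and about 187 MB of indices. The real figure lies somewhere between the two, depending on the tetra and hexa mix.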

Is it normal for rendering to slow down with this much data?

Should I use a rendering method that does not render invisible elements on the screen?
Or will it be faster if I use the batch rendering method?
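One common way to avoid sending triangles that can never be seen is to draw only the boundary faces of the solid: a face shared by two elements is interior, while a face that belongs to exactly one element lies on the surface. The sketch below shows the idea for tetra elements only. It is an illustration under assumptions, not the method used in this program: the `Tetra` and `Triangle` types, the function name and the local face table are all hypothetical, and the winding of each output face simply follows that table.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <map>
#include <vector>

using Tetra = std::array<std::uint32_t, 4>;     // four vertex indices
using Triangle = std::array<std::uint32_t, 3>;  // three vertex indices

// Returns the faces that belong to exactly one tetra. Faces shared by two
// tetras are interior and cannot be seen from outside the solid.
std::vector<Triangle> extractBoundaryFaces(const std::vector<Tetra>& tetras) {
    // Local vertex numbers of the four faces of a tetra (assumed ordering).
    static const int kFaces[4][3] = {{0, 1, 2}, {0, 3, 1}, {1, 3, 2}, {0, 2, 3}};

    struct Entry {
        Triangle face;  // face as first encountered, original vertex order
        int count;      // how many tetras use this face
    };
    std::map<Triangle, Entry> seen;  // key: vertex indices sorted ascending

    for (const Tetra& t : tetras) {
        for (const auto& f : kFaces) {
            const Triangle face = {t[f[0]], t[f[1]], t[f[2]]};
            Triangle key = face;
            std::sort(key.begin(), key.end());
            auto it = seen.find(key);
            if (it == seen.end()) {
                seen.emplace(key, Entry{face, 1});
            } else {
                ++it->second.count;
            }
        }
    }

    std::vector<Triangle> boundary;
    for (const auto& kv : seen) {
        if (kv.second.count == 1) boundary.push_back(kv.second.face);
    }
    return boundary;
}
```

The result can be uploaded as the index buffer in place of the full per-element face list. This only helps when the interior never needs to be shown; if elements can be hidden, clipped or drawn semi-transparently, the boundary would have to be recomputed or the full mesh kept available.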

This is my machine.
Graphics card: Quadro P1000 (GPU Memory: 4GB)
CPU: 11th Gen Intel(R) Core™ i9-11900KF @ 3.50GHz 3.50 GHz
RAM: 64.0GB

ihmin · April 18, 2022, 6:24am

I will try the tools you suggested. Thank you.