Very poor rendering performance

I am starting out with Vulkan, and I have noticed that the drawing method I use slows down in proportion to the number of objects to render. For example, it can take more than a second to refresh the view. Exactly the same test with OpenGL is instantaneous.
I use LWJGL (therefore Java) and GLFW on macOS (Radeon Pro 580, 8 GB).

It seems obvious that I am doing something wrong, but I do not see how to do it better.
The test consists of displaying 1728 cubes (12^3) and adjusting the camera parameters to refresh the view.

For rendering, between the vkCmdBeginRenderPass and vkCmdEndRenderPass calls, I use a loop that goes through each cube. The bindings for the DescriptorSet, VertexBuffer and IndexBuffer are invoked, then vkCmdDrawIndexed() is called; we start again for the next cube, and so on.

There is a single dynamic DescriptorSet which contains the model-view-projection matrix of each cube (addressed by offset), so that the transformation is done in the vertex shader.
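For reference, each per-cube offset in such a dynamic UBO has to be a multiple of the device's minUniformBufferOffsetAlignment limit. A minimal, self-contained sketch of that offset arithmetic (plain Java, no LWJGL; the 256-byte alignment is an assumed device limit, and the names are illustrative):

```java
public class UboOffsets {
    // Round a size up to the next multiple of the device alignment
    // (alignment is always a power of two in Vulkan).
    static long alignedStride(long size, long minAlignment) {
        return (size + minAlignment - 1) & ~(minAlignment - 1);
    }

    public static void main(String[] args) {
        long matrixSize = 16 * Float.BYTES; // one mat4 per cube = 64 bytes
        long minAlignment = 256;            // assumed minUniformBufferOffsetAlignment
        long stride = alignedStride(matrixSize, minAlignment);
        // Offset of cube i inside the single dynamic UBO is i * stride:
        long offsetOfCube5 = 5 * stride;
        System.out.println(stride);        // 256
        System.out.println(offsetOfCube5); // 1280
    }
}
```

Using a stride smaller than the device limit makes vkCmdBindDescriptorSets reject the dynamic offset, so the matrices cannot simply be packed back to back at 64 bytes each.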

The vertices and colors of all cubes are in a single VertexBuffer (addressed by offset), and the indices are likewise in a single IndexBuffer (addressed by offset). The indices are relative to the cube's offset in the VertexBuffer, not to the start of the buffer: for a cube (8 vertices), the indices of the first cube go from 0 to 7, the same for the second cube, and so on.

The loop algorithm is roughly:

for each cube {
   offsetUbo = cube.getOffsetUbo ()
   offsetVertex = cube.getOffsetVertex ()
   offsetIndex = cube.getOffsetIndex ()
   cubeIndexSize = cube.getIndexSize ()

   vkCmdBindDescriptorSets (renderCommandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout, descriptorSets, offsetUbo)
   vkCmdBindPipeline (renderCommandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline)
   vkCmdBindVertexBuffers (renderCommandBuffer, 0, vertexBuffer, offsetVertex)
   vkCmdBindIndexBuffer (renderCommandBuffer, indexBuffer, offsetIndex, VK_INDEX_TYPE_UINT16)
   vkCmdDrawIndexed (renderCommandBuffer, cubeIndexSize, 1, 0, 0, 0)
}

It turns out that invoking these functions in a loop has a very high cost, in decreasing order of magnitude:
vkCmdBindDescriptorSets, vkCmdDrawIndexed, vkCmdBindPipeline, vkCmdBindVertexBuffers and vkCmdBindIndexBuffer.

Faced with this observation, it seems necessary to do away with the loop, and certain functions appear designed for that, but I do not see how to use them.
I thought of filling the IndexBuffer with indices that point from the beginning of the VertexBuffer and passing the total index count of all the cubes to vkCmdDrawIndexed. That only works if everything shares the same transformation, because I don't see how to link each cube to its entry in the DescriptorSet.
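For what it's worth, vkCmdDrawIndexed already exposes firstIndex and vertexOffset parameters (its fourth and fifth arguments), so a single bound VertexBuffer/IndexBuffer pair can serve every cube without rebinding; only the dynamic descriptor offset still changes per draw. A sketch in the same pseudocode style as above, where getFirstIndex () and getVertexOffset () are hypothetical accessors:

```
vkCmdBindPipeline (renderCommandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline)
vkCmdBindVertexBuffers (renderCommandBuffer, 0, vertexBuffer, 0)
vkCmdBindIndexBuffer (renderCommandBuffer, indexBuffer, 0, VK_INDEX_TYPE_UINT16)
for each cube {
    vkCmdBindDescriptorSets (renderCommandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout, descriptorSets, cube.getOffsetUbo ())
    // firstIndex / vertexOffset select the cube's slice of the shared buffers,
    // so the 0..7 per-cube indices can stay as they are
    vkCmdDrawIndexed (renderCommandBuffer, cube.getIndexSize (), 1, cube.getFirstIndex (), cube.getVertexOffset (), 0)
}
```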

If anyone has an idea, it is welcome (and they will have won an image :slightly_smiling_face:).

Hi @Graou74, you don't need to call

vkCmdBindPipeline (renderCommandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline)

inside the rendering loop; try calling it only once, before the loop.

Could you show the OpenGL version?

Hi AndreyOGL_D3D,

Yes, you are right. I even managed to optimize it a bit more by moving vkCmdBindVertexBuffers out of the loop and redefining the indices in the IndexBuffer so that they point from the start of the VertexBuffer.

Which now gives:

vkCmdBindPipeline (renderCommandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline)
vkCmdBindVertexBuffers (renderCommandBuffer, 0, vertexBuffer, 0)
for each cube {
    offsetUbo = cube.getOffsetUbo ()
    offsetIndex = cube.getOffsetIndex ()
    cubeIndexSize = cube.getIndexSize ()

    vkCmdBindDescriptorSets (renderCommandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout, descriptorSets, offsetUbo)
    vkCmdBindIndexBuffer (renderCommandBuffer, indexBuffer, offsetIndex, VK_INDEX_TYPE_UINT16)
    vkCmdDrawIndexed (renderCommandBuffer, cubeIndexSize, 1, 0, 0, 0)
}

That is a speed gain of less than 20%, which is far from satisfactory.

The OpenGL version is 3.3.

Well, I found an alternative that does without the dynamic DescriptorSet, by applying the model matrix of each cube when building the VertexBuffer.
The vertex shader then only has to apply the camera's view-projection matrix, and the rendering loop disappears completely.
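That baking step can be sketched in plain Java (no LWJGL; column-major mat4 layout as in GLSL/JOML; the names are illustrative):

```java
public class BakeTransform {
    // Multiply a column-major 4x4 matrix by (x, y, z, 1) and write the
    // transformed position back into the vertex array in place.
    static void transformPoint(float[] m, float[] v, int off) {
        float x = v[off], y = v[off + 1], z = v[off + 2];
        float tx = m[0] * x + m[4] * y + m[8]  * z + m[12];
        float ty = m[1] * x + m[5] * y + m[9]  * z + m[13];
        float tz = m[2] * x + m[6] * y + m[10] * z + m[14];
        v[off] = tx; v[off + 1] = ty; v[off + 2] = tz;
    }

    public static void main(String[] args) {
        // Identity with a translation of (2, 0, 0), column-major.
        float[] model = {
            1, 0, 0, 0,
            0, 1, 0, 0,
            0, 0, 1, 0,
            2, 0, 0, 1
        };
        float[] vertices = { 1, 1, 1 }; // one cube corner position
        transformPoint(model, vertices, 0);
        System.out.println(vertices[0]); // 3.0
    }
}
```

Each cube's corners would be run through its own model matrix like this before being written into the shared VertexBuffer, which is why the per-cube state in the command buffer can disappear.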

This time Vulkan gives better results than OpenGL, but it is no longer the vertex shader that applies the transformations, as it is with OpenGL.

Consequently, if someone knows a method for putting the dynamic descriptor back into play (delegating the transformations to the vertex shader) while keeping reasonable performance, I remain very interested.

You shouldn’t. You should never be rendering cubes individually with different state. You should be rendering the entire field of cubes with the exact same state.

Unless you have millions of cubes that are independently moving around, there is no reason to have state changes between them. And even in such a case, it would be better to use a push constant provided before the draw call to give them the transform data (or an index in an SSBO for them to use to select a transform).
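A push-constant version of the loop might look like this, in the thread's pseudocode style; vkCmdPushConstants is the real command, but the mat4 payload, the VK_SHADER_STAGE_VERTEX_BIT stage flag and the getModelMatrixBytes () accessor are illustrative assumptions:

```
vkCmdBindPipeline (renderCommandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline)
vkCmdBindVertexBuffers (renderCommandBuffer, 0, vertexBuffer, 0)
vkCmdBindIndexBuffer (renderCommandBuffer, indexBuffer, 0, VK_INDEX_TYPE_UINT16)
for each cube {
    // 64 bytes = one mat4; must fit within the device's maxPushConstantsSize
    vkCmdPushConstants (renderCommandBuffer, pipelineLayout, VK_SHADER_STAGE_VERTEX_BIT, 0, cube.getModelMatrixBytes ())
    vkCmdDrawIndexed (renderCommandBuffer, cube.getIndexSize (), 1, cube.getFirstIndex (), 0, 0)
}
```

No descriptor rebinding happens per draw, which is the point of the suggestion: push constants live in the command buffer itself.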

:smiley: Could you show the OpenGL code, not the OpenGL version?

Thank you, Mr. Reinheart,

This information will help me know which directions to focus on as I learn Vulkan.

Yes of course if you are interested in this code.

It’s Java, but it should be understandable if you know C.
The first render() method goes through all the objects in the 3D scene and, for each one, invokes the RenderObject.render() method.

private void render(final Camera camera, final int width, final int height) {
	final Matrix4f mvp = camera.getModelViewProjection(width, height);
	final Matrix4f mvpIdentity = camera.getModelViewProjectionIdentity(width, height);
	Matrix4f mvpClone = new Matrix4f(mvp);
	IObjet3D obj3D;
	Point position;
	IOrientation orientation;
	RenderObject renderObject;
	for (IGlObjet glObject : camera.getScene().getGlObjetsLst()) {
		renderObject = Scene3DOpenGl.getRenderObject(glObject);
		obj3D = glObject.getObjet3D();
		if (obj3D == null) {
			renderObject.render(mvp, mvpIdentity, modelViewProjectionShaderId);
			continue;
		}
		mvpClone = new Matrix4f(mvp);
		position = obj3D.getPositionGlobal();
		orientation = obj3D.getOrientationGlobal();

		mvpClone.translate((float) position.getX(), (float) position.getY(), (float) position.getZ());
		mvpClone.rotateAffine((float) orientation.getLongitudeRad(), 0f, 1f, 0f);
		mvpClone.rotateAffine((float) orientation.getLatitudeRad(), 1f, 0f, 0f);
		mvpClone.rotateAffine((float) orientation.getRoulisRad(), 0f, 0f, 1f);

		renderObject.render(mvpClone, mvpIdentity, modelViewProjectionShaderId);
	}
}

The RenderObject class :

private IGlObjet geo;
private int vaoId;
private int vboId;
private int vboiId;

/** Buffers initialisation. */
public void init() {
	final FloatBuffer fbuffer;
	final IGlGeometrie glGeo = geo.getGlGeometrie();
	vaoId = GL33.glGenVertexArrays();
	GL33.glBindVertexArray(vaoId); // bind the VAO so the attribute pointers below are recorded in it
		vboId = GL15.glGenBuffers();
		glBindBuffer(GL_ARRAY_BUFFER, vboId);
			fbuffer = BufferUtils.createFloatBuffer(glGeo.getVerticesCount() * 7 * OsUtilities.FLOAT_SIZE);
			glBufferData(GL_ARRAY_BUFFER, fbuffer, GL_STATIC_DRAW);
			glVertexAttribPointer(0, 3, GL_FLOAT, false, 7 * OsUtilities.FLOAT_SIZE, 0);
			glVertexAttribPointer(1, 4, GL_FLOAT, false, 7 * OsUtilities.FLOAT_SIZE, 3 * OsUtilities.FLOAT_SIZE);
		glBindBuffer(GL_ARRAY_BUFFER, 0);
		vboiId = GL15.glGenBuffers();
		glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, vboiId);
			glrBufferDataIndices(glGeo, GL_STATIC_DRAW);
	int e = GL20.glGetError();
	if (e != GL20.GL_NO_ERROR) {
		log.fatal("OPENGL ERROR n° : " + e);
	}
}

private static void glrBufferDataIndices(final IGlVertexBufferable geom, final int usage) {
	final int target = GL_ELEMENT_ARRAY_BUFFER;
	final EnumBufferElementType type = geom.getTypeIndices();
	final ByteBuffer buffer;
	final ShortBuffer shortBuffer;
	final IntBuffer intBuffer;
	switch (type) {
	case BYTE			:
	case SHORT			:
		buffer = BufferUtils.createByteBuffer(geom.getIndicesCount() * OsUtilities.SHORT_SIZE);
		shortBuffer = buffer.asShortBuffer();
		// The 3D object knows how to fill the buffer.
		geom.putIndices(shortBuffer, null);
		glBufferData(target, shortBuffer, usage);
		break;
	case INT			:
		buffer = BufferUtils.createByteBuffer(geom.getIndicesCount() * OsUtilities.INT_SIZE);
		intBuffer = buffer.asIntBuffer();
		geom.putIndices(null, intBuffer);
		glBufferData(target, intBuffer, usage);
		break;
	case LONG			:
	case FLOAT	 		:
	case DOUBLE 		:
	default     		: throw new RuntimeException("Invalid type : " + type);
	}
}

public void render(final Matrix4f mvp, final Matrix4f mvpIdentity, final int modelViewProjectionShaderId) {
	final FloatBuffer matrixBuffer = BufferUtils.createFloatBuffer(16);
	final Matrix4f mvpClone;
	if (geo.getGlGeometrie().isCameraIdentity()) {
		glUniformMatrix4fv(modelViewProjectionShaderId, false, mvpIdentity.get(matrixBuffer));
	} else {
		mvpClone = new Matrix4f(mvp);
		glUniformMatrix4fv(modelViewProjectionShaderId, false, mvpClone.get(matrixBuffer));
	}

	glrDrawElements(geo.getGlGeometrie(), vboiId);
	int e = GL20.glGetError();
	if (e != GL20.GL_NO_ERROR) {
		log.fatal("OPENGL ERROR n° : " + e);
	}
}

private static void glrDrawElements(final IGlGeometrie geom, final int vboiId) {
	final int mode;
	final int type = getTypeOpenGL(geom.getTypeIndices());
	switch (geom.getGlDrawMode()) {
	case GL_LINE		:	mode = GL_LINES;		break;
	case GL_TRIANGLES	:	mode = GL_TRIANGLES;	break;
	default     		:	throw new RuntimeException("Not implemented : " + geom.getGlDrawMode());
	}
	glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, vboiId);
	glDrawElements(mode, geom.getIndicesCount(), type, 0);
}

And the vertex shader :

#version 330

layout(location = 0) in vec3 vertex;
layout(location = 1) in vec4 color;

out vec4 vertexColor;
uniform mat4 mvp;

void main() {
	gl_Position = mvp * vec4(vertex, 1.0);
	vertexColor = color;
}