Generating data for OpenGL 2D rendering is slow, but MonoGame.GL's data generation/rendering methods are speedy. How? (CPU performance?)

I am trying to make a custom tile map editor using OpenGL and C++ with the GLFW library (version 3.3.8), but I am having major issues with generation speed, even though I can run games that have large tile maps such as Terraria and large worlds such as Minecraft. I have an NVIDIA GT 610 GPU and an Intel i3-3220 CPU (2 cores). I do not think the issue is the GPU at all, since once the map is loaded the frame rate is not choppy (in the game engines).

When I create a tile generator that covers a certain width and height in a game engine such as Unity or Godot, it takes about 8 minutes to generate a 400x400 tile map using basic nested for loops with a draw (“set tile”) call inside. This was with Rule Tiles and Terrain Tiles respectively.

I decided to put the game engines to the side and learned the basics of OpenGL to make a tile map editor. The results were almost strikingly the same as in the game engines though, as it takes me around 8 minutes to produce a 400x400 tile map in my custom renderer as well. I found that strange, because I thought there would be far less work involved on my part since I’m not even managing game object data yet.

The much stranger thing came up when I researched MonoGame and how it renders 2D, which is by drawing with OpenGL, or more specifically MonoGame.GL. I set up a sprite batch and drew a 1000x1000 map of 16x16 tiles (the same tile size I used previously) almost instantly (less than 2 seconds). I looked into their code base and did not see anything out of the norm or vastly different from what I was doing in OpenGL. Looking further at their code base, it seems they are even using more function calls per draw than I am, I think. So far MonoGame has rendered the most tiles at the fastest speed out of my custom renderer and the 2 game engines.

If you follow the draw call’s code definitions you can see the execution flow of the entire draw call, and it is pretty loaded imo. There are while loops used for iteration and arrays used for collections. Then it ends in an OpenGL call.

SpriteBatch.Draw(…) → FlushIfNeeded() → SpriteBatcher.DrawBatch() → FlushVertexArray() → GraphicsDevice.DrawUserIndexedPrimitives() → GraphicsDevice.PlatformDrawUserIndexedPrimitives() → GL.DrawElements(…)

These are the primary function calls made for every sprite, and this is the path that gives me the fastest speed.
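For readers following along, the general pattern behind that call chain can be sketched roughly as below. This is a CPU-side illustration only, not MonoGame's actual code: QuadBatch, Vertex and the flush callback are hypothetical names, and the callback stands in for the point where a real renderer would upload the vertices and issue a single glDrawElements call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// One vertex: position (x, y) and texture coordinate (u, v).
struct Vertex { float x, y, u, v; };

// Hypothetical batcher: collects quads in a preallocated array and hands
// the filled part of the array to a callback when it is full or flushed.
class QuadBatch {
public:
    using FlushFn = std::function<void(const Vertex*, std::size_t)>;

    QuadBatch(std::size_t maxQuads, FlushFn onFlush)
        : vertices_(maxQuads * 4), count_(0), onFlush_(std::move(onFlush)) {
        assert(maxQuads > 0);
    }

    // Appends one quad; (u0, v0)-(u1, v1) are normalized texture coordinates.
    void Draw(float x, float y, float w, float h,
              float u0, float v0, float u1, float v1) {
        if (count_ == vertices_.size()) Flush();  // no room left: flush first
        vertices_[count_++] = {x,     y,     u0, v0};
        vertices_[count_++] = {x + w, y,     u1, v0};
        vertices_[count_++] = {x,     y + h, u0, v1};
        vertices_[count_++] = {x + w, y + h, u1, v1};
    }

    // Hands all pending vertices to the callback and resets the write index.
    void Flush() {
        if (count_ == 0) return;
        onFlush_(vertices_.data(), count_);
        count_ = 0;
    }

private:
    std::vector<Vertex> vertices_;  // allocated once, reused for every batch
    std::size_t count_;             // number of vertices currently pending
    FlushFn onFlush_;
};
```

The point of the sketch is that each Draw call is only a handful of array writes, and the expensive part (the callback) runs once per batch rather than once per sprite.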

So in OpenGL I set up my shader, VBO, VAO and EBO classes and was able to render a triangle, so I moved on to tiles.

Here is the main function, where I declare the tile map and generate it.

int main()
{

    glfwInit();

    TileMap tiles;
    TileMapRenderer tmRenderer;
    Camera camera(Vector2f(0.0f, 0.0f), 1.0f);
    int windowHeight;
    int windowWidth;
    Batch2D batch;

    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow* window = glfwCreateWindow(800, 600, "Game Engine", NULL, NULL);
    if (window == NULL)
    {
        std::cout << "Failed to create window" << std::endl;
        glfwTerminate();
        return -1;
    }
    glfwMakeContextCurrent(window);

    gladLoadGL();



    glfwGetWindowSize(window, &windowWidth, &windowHeight);
    glViewport(0, 0, 800, 800);

    tiles.height = 400;
    tiles.width = 400;
    tiles.tileSetTexWidth = GetTileSetTextureWidth("C:/Users/Michael/source/repos/GameEngine/Graphics/grassTilesTest.png");
    tiles.tileSetTexHeight = GetTileSetTextureHeight("C:/Users/Michael/source/repos/GameEngine/Graphics/grassTilesTest.png");
    tiles.tileSize = 16;

        //Start Generation

    // steady_clock is monotonic, which makes it the usual choice for timing.
    auto start = std::chrono::steady_clock::now();

    tiles.GenerateMap();

    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "Time : " << elapsed.count() << " seconds" << std::endl;


    tiles.tileSetHandle = LoadTileSetTexture();
    tmRenderer.Initialize(tiles);   
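    // glGetUniformLocation returns a GLint (-1 if the uniform is not found).
    GLint tex0Uni = glGetUniformLocation(tmRenderer.rendererShader.Id, "texture0");
    assert(tex0Uni != -1);
    // Sampler uniforms have to be set with glUniform1i, not glUniform1f.
    glUniform1i(tex0Uni, 0);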



    while (!glfwWindowShouldClose(window))
    {
        glClearColor(0.4f, .04f, .35f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);

        if (glfwGetKey(window, GLFW_KEY_UP))
        {
            camera.position.y += .025f;
        }
        else if (glfwGetKey(window, GLFW_KEY_DOWN))
        {
            camera.position.y -= .025f;
        }
        if (glfwGetKey(window, GLFW_KEY_RIGHT))
        {
            camera.position.x += .025f;
        }
        else if (glfwGetKey(window, GLFW_KEY_LEFT))
        {
            camera.position.x -= .025f;
        }

                //Render map

        tmRenderer.Render(camera, windowWidth, windowHeight);

        glfwSwapBuffers(window);

        glfwPollEvents();

    }

    tmRenderer.Dispose();
    glfwDestroyWindow(window);
    glfwTerminate();

    return 0;
}

I have a breakpoint after the timing message because the tiles render just fine and I do not need help with that part, but the GenerateMap() function takes around 5 seconds for a 40x40 map and 408.106 seconds (8.1 minutes) for 400x400. (That is the same as Unity and Godot, even though my code should be calling fewer functions, since theirs involve autotiles?)

Here is the GenerateMap() function:

void TileMap::GenerateMap()
{
    for (float x = 0; x < width; x++)
    {
        for (float y = 0; y < height; y++)
        {
            int tileId = GenerateTileId();
            int texX = tileId * tileSize;
            int texY = 0;

            while (true)
            {
                if (texX > tileSetTexWidth)
                {
                    texX -= tileSetTexWidth;
                    texY += tileSize;
                }
                else
                {
                    break;
                }
            }

                        //Arithmetic Computations
            float tX = (float)texX / tileSetTexWidth;
            float tY = (float)texY / tileSetTexHeight;
            float tXSpan = (float)tileSize / tileSetTexWidth;
            float tYSpan = (float)tileSize / tileSetTexHeight;
            float xPadding = (float)1 / tileSetTexWidth;
            float yPadding = (float)1 / tileSetTexHeight;

            std::cout << "X: " << x << std::endl;
            std::cout << "Y: " << y << std::endl;
            std::cout << "TexX: " << texX << std::endl;
            std::cout << "TexY: " << (float)texY << std::endl;
            std::cout << "TX: " << tX << std::endl;
            std::cout << "TY: " << (float)tY << std::endl;
            std::cout << "X Span: " << tXSpan << std::endl;
            std::cout << "Y Span: " << (float)tYSpan << std::endl;

            tiles.push_back(x/10);
            tiles.push_back(y/10);
            tiles.push_back(tX);
            tiles.push_back(tY);


            tiles.push_back((x + 1)/10);
            tiles.push_back(y/10);
            tiles.push_back(tX + tXSpan);
            tiles.push_back(tY);
        
            tiles.push_back(x/10);
            tiles.push_back((y + 1)/10);
            tiles.push_back(tX);
            tiles.push_back(tY + tYSpan);

            tiles.push_back((x + 1)/10);
            tiles.push_back((y + 1)/10);
            tiles.push_back(tX + tXSpan);
            tiles.push_back(tY + tYSpan);


            std::cout << "Generating... : " << tiles.size() << std::endl;
        }
    }
    
}

int TileMap::GenerateTileId()
{
       //Insert tileId generation code here eventually
    return 3;
}

I am using for loops to create and store the VBO data in a vector, which is then uploaded for use with the VAO. Some notes:

- Removing the while loop only takes off 1.3 seconds at 40x40.
- Removing the vector push_back calls takes off 40 seconds at 400x400 (7 minutes instead of 8, yay).
- Removing the arithmetic computations along with the vector push_back calls brings it down to about 30 seconds at 400x400 (instead of 8 minutes… still slower than MonoGame's 2 seconds at 1000x1000).
- I use for loops while MonoGame uses while loops (I do not think that accounts for an 8 minute difference though).
- MonoGame uses arrays instead of vectors (I don’t think element assignment is much different: std::vector::push_back() vs array[i] = val).
- MonoGame uses pointers to arrays.
- MonoGame sprite drawing and basic for loop testing show that my CPU has no problem with 1000x1000 iterations.
- I’ve tried setting up my project similar to their code base and even expanded my functions like theirs, using custom batch classes with buffers, but I got the same results (8 minutes).
- I have not tried an array yet, or replacing the for loops with a while loop.
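On the push_back vs array[i] = val point, one middle ground is to size a std::vector once and then assign by index, exactly as with a raw array. The sketch below assumes the 16 floats per tile layout used in this thread; FillByIndex is a hypothetical name, and only the first vertex position of each tile is written to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: allocates the whole vector up front, then writes by
// index, the same way a raw array would be filled.
std::vector<float> FillByIndex(int width, int height) {
    assert(width >= 0 && height >= 0);
    const std::size_t floatsPerTile = 16;  // 4 vertices x (x, y, u, v)
    std::vector<float> tiles(static_cast<std::size_t>(width) * height * floatsPerTile);

    std::size_t i = 0;
    for (int x = 0; x < width; ++x) {
        for (int y = 0; y < height; ++y) {
            // Only the first vertex position is written in this sketch;
            // the remaining 14 floats of the tile stay zero-initialized.
            tiles[i]     = x / 10.0f;
            tiles[i + 1] = y / 10.0f;
            i += floatsPerTile;
        }
    }
    assert(i == tiles.size());  // every slot range was visited exactly once
    return tiles;
}
```

Because the storage is contiguous, tiles.data() can be passed to glBufferData in the same way as a pointer to a plain array.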

I guess I’m going to try using arrays next maybe but I do not think that is it.

I’m not sure what about my generation code is giving 8 minutes at 400x400 compared with their Monogame.GL draw code data generation giving less than 2 seconds with 1000x1000.

I want to be able to eventually have chunk loading and taking more than a few seconds to load chunks sounds laggy and annoying.

Any optimizations, suggestions and questions are more than welcome. Go in on it! Thank you.

What happens if you take out the nine std::cout statements in that loop?
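Some context for that question: std::endl writes a newline and also flushes the stream, so the nine statements in the loop produce nine flushed console writes per tile. How much that costs depends on the console and platform. If progress output is still wanted, a rough sketch is to report only every N tiles and use '\n' instead; GenerateWithProgress and CountLines are hypothetical names, and the per-tile work is left as a placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the generation loop: counts tiles and writes a
// progress line only every `every` tiles, using '\n' (no flush per line).
std::size_t GenerateWithProgress(int width, int height, std::size_t every,
                                 std::ostream& log) {
    assert(every > 0);
    std::size_t generated = 0;
    for (int x = 0; x < width; ++x) {
        for (int y = 0; y < height; ++y) {
            // ... per-tile vertex generation would go here ...
            ++generated;
            if (generated % every == 0) {
                log << "Generated " << generated << " tiles\n";
            }
        }
    }
    log.flush();  // flush once, after the loops
    return generated;
}

// Counts newline characters, e.g. in a log captured with std::ostringstream.
std::size_t CountLines(const std::string& text) {
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), '\n'));
}
```

With every = 10000, a 400x400 map writes 16 progress lines, compared with the 1,440,000 lines that nine statements per tile produce at that size.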

What is tiles? If that’s a vector of some kind, what happens if you reserve appropriate size before doing a bunch of push_back operations on it?

@Alfonse_Reinhart That took off a few seconds. Tiles was a vector of type GLfloat. I tried the reserve method but it came out about the same. Not sure if I’m using that method right but I was calling it before I called the set of push_back methods.

Through the help of multiple suggestions and observations I fixed the issue. I replaced the vector with a pointer to an array and used array element assignment instead of the vector’s .push_back() method (so it really was about the container type I was using). I also replaced the while loop with modulo operations, so the function runs faster. I also moved the vertex info to a struct called TileInfo (might be useful later).
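As an illustration of the modulo idea, the tile's pixel origin in the tile set can be computed with one division and one remainder. This is a sketch under the assumption that tiles are laid out row by row and that the texture width is a multiple of the tile size; TexOrigin and TileOrigin are hypothetical names. Unlike a wrap loop that compares with >, it also wraps when the offset lands exactly on the texture width.

```cpp
#include <cassert>

// Pixel position of a tile's top-left corner inside the tile set texture.
struct TexOrigin { int x; int y; };

// Assumes tiles are stored row by row and texWidth is a multiple of tileSize.
TexOrigin TileOrigin(int tileId, int tileSize, int texWidth) {
    assert(tileId >= 0 && tileSize > 0 && texWidth >= tileSize);
    const int tilesPerRow = texWidth / tileSize;
    return { (tileId % tilesPerRow) * tileSize,    // column within the row
             (tileId / tilesPerRow) * tileSize };  // row index times tile height
}
```

For a 64 pixel wide tile set with 16 pixel tiles, tile 3 maps to (48, 0) and tile 5 to (16, 16).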

Here’s the new GenerateMap() function

void TileMap::GenerateMap()
{
    // tiles is assumed to point to an array of at least width * height * 16 floats.
    int i = 0;
    for (int x = 0; x < width; x++)
    {
        for (int y = 0; y < height; y++)
        {
            TileInfo f;
            f.position.x = x / 10.0f;
            f.position.y = y / 10.0f;
            f.tileId = 3;

            // Pixel offset of the tile in the tile set, wrapped row by row.
            int offset = f.tileId * tileSize;
            int texX = offset % tileSetTexWidth;
            int texY = (offset / tileSetTexWidth) * tileSize;

            // Normalized texture coordinates and tile extent.
            float tX = (float)texX / tileSetTexWidth;
            float tY = (float)texY / tileSetTexHeight;
            float tXSpan = (float)tileSize / tileSetTexWidth;
            float tYSpan = (float)tileSize / tileSetTexHeight;

            // Four vertices per tile, each stored as x, y, u, v.
            tiles[i] = f.position.x;
            tiles[i + 1] = f.position.y;
            tiles[i + 2] = tX;
            tiles[i + 3] = tY;

            tiles[i + 4] = f.position.x + (1.0f / 10.f);
            tiles[i + 5] = f.position.y;
            tiles[i + 6] = tX + tXSpan;
            tiles[i + 7] = tY;

            tiles[i + 8] = f.position.x;
            tiles[i + 9] = f.position.y + (1.0f / 10.f);
            tiles[i + 10] = tX;
            tiles[i + 11] = tY + tYSpan;

            tiles[i + 12] = f.position.x + (1.0f / 10.f);
            tiles[i + 13] = f.position.y + (1.0f / 10.f);
            tiles[i + 14] = tX + tXSpan;
            tiles[i + 15] = tY + tYSpan;

            i = i + 16;
        }
    }
}

So basically, it turns out that in the MonoGame code base the use of arrays in the loops was faster than my use of vectors in the loops.

This function generates 1000x1000x16 tile data in under 1 second (~0.9111685s), as opposed to its previous form, which generated 400x400x16 tile data in ~8 minutes.

Update:

I tried using the reserve() method again and it worked. This time I called it outside of the for loops instead of calling it every iteration lol. This resulted in 1000x1000x16 generation in ~2.4667 seconds,

using tiles.reserve(tileMapSize + vertexInfoSize); before the iteration loops, where tiles is a vector of type GLfloat.
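For anyone landing here later, the reserve-once pattern looks roughly like the sketch below. The definitions of tileMapSize and vertexInfoSize are not shown in this thread, so the sketch computes its own capacity as width * height * 16; that figure, the BuildTileVertices name, and the placeholder texture coordinates (the whole texture, 0 to 1) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills a vector with 16 floats per tile (4 vertices x [x, y, u, v]).
// reserve() is called exactly once, before the loops, so the push_back
// calls below never have to reallocate.
std::vector<float> BuildTileVertices(int width, int height) {
    assert(width >= 0 && height >= 0);
    std::vector<float> tiles;
    tiles.reserve(static_cast<std::size_t>(width) * height * 16);

    for (int x = 0; x < width; ++x) {
        for (int y = 0; y < height; ++y) {
            const float x0 = x / 10.0f;
            const float y0 = y / 10.0f;
            const float x1 = (x + 1) / 10.0f;
            const float y1 = (y + 1) / 10.0f;
            // Placeholder texture coordinates covering the whole texture.
            const float quad[16] = {
                x0, y0, 0.0f, 0.0f,
                x1, y0, 1.0f, 0.0f,
                x0, y1, 0.0f, 1.0f,
                x1, y1, 1.0f, 1.0f,
            };
            for (float value : quad) tiles.push_back(value);
        }
    }
    return tiles;
}
```

Calling reserve inside the loop body does not help in the same way, since the request is then repeated for every tile instead of being made once up front.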

Arrays still provided me faster results. :)

Did you include the allocation of the array in the timing?

After doing that, there was only about a .01 difference when including the allocation of the array.
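A minimal sketch of what including the allocation in the timing can look like is below. TimedFill and TimeArrayFill are hypothetical names, the fill values are placeholders, and the numbers it reports will vary by machine and build settings.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>

// Result of one timed run: elapsed seconds, floats written, last value.
struct TimedFill {
    double seconds;
    std::size_t count;
    float last;
};

// Times allocation and fill together for tileCount tiles of 16 floats each.
TimedFill TimeArrayFill(std::size_t tileCount) {
    assert(tileCount > 0);
    const std::size_t n = tileCount * 16;

    const auto start = std::chrono::steady_clock::now();
    std::unique_ptr<float[]> tiles(new float[n]);  // allocation is inside the timed region
    for (std::size_t i = 0; i < n; ++i) {
        tiles[i] = static_cast<float>(i % 16);     // placeholder data
    }
    const auto end = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = end - start;
    // Reading an element back keeps the fill from being trivially discarded.
    return { elapsed.count(), n, tiles[n - 1] };
}
```

Moving the start timestamp below the allocation line gives the fill-only figure, so the two can be compared directly.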