What’s your frame time (in msec)? And is that frame time consistent, or does it really depend on what you’re doing with the mouse?
(That is: disable VSync and report the time (in msec) between successive frames, measured just after the glfwSwapBuffers() call. So that the cost of printing to the console doesn’t disrupt your frame timings, consider reporting it only every 60th frame (or every Nth frame, with N >= 2).)
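Something along these lines would do for the frame timing. This is only a sketch: FrameTimeReporter and onFrameEnd() are names I made up, and it reports the mean frame time over each window of N frames rather than a single frame's delta, which is my own choice (it smooths noise but can hide individual spikes).

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical helper (not part of GLFW or OpenGL): averages the frame time
// over `interval` frames so console output happens only once per window.
class FrameTimeReporter {
public:
    using Clock = std::chrono::steady_clock;

    explicit FrameTimeReporter(int interval) : interval_(interval) {
        assert(interval >= 2);  // report every Nth frame, with N >= 2
    }

    // Call once per frame, right after glfwSwapBuffers(), with Clock::now().
    // Returns the mean frame time in msec once every `interval` frames,
    // and std::nullopt on all other frames.
    std::optional<double> onFrameEnd(Clock::time_point now) {
        if (!started_) {            // first call only marks the window start
            windowStart_ = now;
            started_ = true;
            return std::nullopt;
        }
        if (++frames_ < interval_) return std::nullopt;

        const double meanMs =
            std::chrono::duration<double, std::milli>(now - windowStart_).count() / frames_;
        windowStart_ = now;         // start the next window
        frames_ = 0;
        return meanMs;
    }

private:
    int interval_;
    int frames_ = 0;
    bool started_ = false;
    Clock::time_point windowStart_{};
};
```

In the render loop you'd call glfwSwapInterval(0) once after making the context current (the driver control panel can still force VSync on, so check that too), then after each glfwSwapBuffers() do something like `if (auto ms = reporter.onFrameEnd(FrameTimeReporter::Clock::now())) std::printf("%.3f ms\n", *ms);`.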
Ok, that’s a good start. I’d next suggest you set up a sub-frame timer or two to make sure you know where the time is being spent (e.g. make sure the time spent in getMousePosNDC() and processInput() is negligible, even when using the mouse).
A few thoughts for you to consider:
When you render larger rectangles, you’re consuming more fill (i.e. spending more time in the rasterizer running fragment shaders and doing framebuffer reads and writes). 120*120*250K touches 18X more pixels than 20*20*500K, so it’s possible you are fill limited. Up around 120x120 with 250K instances, if you see an approximately linear change in frame time for a linear increase/decrease in the number of pixels touched, then it’s likely that you are.
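To spell out where the 18X comes from, here's the arithmetic as a compile-time check. pixelsTouched() is just an illustrative name, and the count is an upper bound that assumes every rectangle is fully on-screen and no fragments are rejected early.

```cpp
#include <cstdint>

// Upper bound on fragments generated per frame: width * height * instances.
// Assumes each rectangle is entirely inside the viewport and nothing is
// culled or rejected by depth/stencil tests.
constexpr std::uint64_t pixelsTouched(std::uint64_t widthPx,
                                      std::uint64_t heightPx,
                                      std::uint64_t instances) {
    return widthPx * heightPx * instances;
}

constexpr std::uint64_t kLargeCase = pixelsTouched(120, 120, 250'000);  // 3,600,000,000
constexpr std::uint64_t kSmallCase = pixelsTouched(20, 20, 500'000);    //   200,000,000

static_assert(kLargeCase / kSmallCase == 18,
              "120x120 x 250K touches 18x the pixels of 20x20 x 500K");
```

So even with half as many instances, the large-rectangle case asks the GPU to shade and write 18 times as many pixels per frame.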
If so, what can you do? Touch fewer pixels, or make touching pixels less expensive. An example of the former would be reducing the size of your rectangles (in pixels). Examples of the latter would be disabling blending and alpha test, optimizing your fragment shaders (if they are expensive), adjusting your primitive rendering order to take better advantage of early Z and stencil testing, possibly disabling MSAA (if enabled), etc.
Now if you’re not fill limited, then you’ll need to look for your bottleneck(s) elsewhere. One example (as Alfonse hinted): While geometry instanced rendering is very convenient (and often fast enough), you may find that you can push triangles down the pipeline faster by using other methods, such as pseudo-instancing. That is, instead of providing OpenGL with just the geometry for one rectangle and then re-transforming the instances in the shaders based on per-instance transforms fetched by the shader, go ahead and pre-transform the individual rectangle instances by those per-instance transforms up-front, generate the list of triangles, and then at draw time just render those pre-placed triangles directly using a normal non-instanced draw call (e.g. glDrawElements()).
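A rough CPU-side sketch of that pre-transform step, to make it concrete. I'm assuming your per-instance data boils down to a center and a size for an axis-aligned rectangle; Vec2, RectInstance, Mesh and buildPretransformedMesh() are hypothetical names, so adapt them to whatever your instance data actually holds.

```cpp
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

// Assumed per-instance data: center and size of an axis-aligned rectangle.
struct RectInstance { Vec2 center; Vec2 size; };

struct Mesh {
    std::vector<Vec2> vertices;          // 4 per rectangle, already placed
    std::vector<std::uint32_t> indices;  // 6 per rectangle (two triangles)
};

// Applies each instance's transform to a unit quad on the CPU and appends
// the resulting vertices and triangle indices to one big mesh.
Mesh buildPretransformedMesh(const std::vector<RectInstance>& instances) {
    // Unit quad centered on the origin, counter-clockwise winding.
    static const Vec2 kCorners[4] = {
        {-0.5f, -0.5f}, {0.5f, -0.5f}, {0.5f, 0.5f}, {-0.5f, 0.5f}};
    static const std::uint32_t kQuadIndices[6] = {0, 1, 2, 0, 2, 3};

    Mesh mesh;
    mesh.vertices.reserve(instances.size() * 4);
    mesh.indices.reserve(instances.size() * 6);

    for (const RectInstance& inst : instances) {
        const std::uint32_t base = static_cast<std::uint32_t>(mesh.vertices.size());
        for (const Vec2& c : kCorners) {
            mesh.vertices.push_back({inst.center.x + c.x * inst.size.x,
                                     inst.center.y + c.y * inst.size.y});
        }
        for (std::uint32_t i : kQuadIndices) {
            mesh.indices.push_back(base + i);
        }
    }
    return mesh;
}
```

You'd then upload mesh.vertices and mesh.indices to a vertex buffer and an element buffer (e.g. with glBufferData()) and draw everything with a single glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_INT, nullptr). The trade-off is 4 vertices per rectangle in memory instead of one small per-instance record, so whether it's a win is something to measure, not assume.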
Now, as to that slowdown you say you see when hovering over (or moving?) things with the mouse: you’re going to need to tell us what’s special about your processing (GL and otherwise) when this occurs before we can provide any useful input. Could be an inefficient GPU update method, but who knows?
I don’t understand your question. If you can draw a single quad and a bunch of instances, then that means you must have some way of getting per-instance data to the GPU, right? So every frame, you’re uploading new per-instance data, some values per-quad. Right?
So instead of writing per-instance data, you just write triangles to a buffer object: the triangles you would get if you applied the per-instance data to the quads. Then you render all of those triangles at once.