Batch rendering slower than naive rendering

BANEBYTE · November 2, 2020, 2:38pm

I’m having a problem with batch rendering performance.
I have grouped all geometry (200 models, ~2000 polys) into one VBO and render it with a single glDrawArrays call. Each vertex is 56 bytes. This gives me 158 fps.
But when I naively draw every model with its own glDrawArrays call, I get 165 fps!
What is the problem with my GPU? Is it slower because of the VBO size? Is 56 bytes per vertex too large?
Thanks.

Alfonse_Reinheart · November 2, 2020, 2:40pm

Batching reduces the CPU overhead of rendering. 200 models and 200 draw calls are a rounding error for most CPUs these days, so merging them into one call won’t buy you much.

BANEBYTE · November 2, 2020, 2:44pm

Thanks for the reply. Do you know how I can speed up my rendering? If I don’t render anything I get 400 fps, but when I do, I get 158. Why does rendering 2000 polygons cost ~240 fps??

noizex · January 10, 2021, 8:29pm

I suggest changing how you think about performance from FPS to ms/frame - that’s the change of mindset you need to actually be able to profile how your rendering works. There are many articles that describe this problem, one of the most popular probably being: FPS Versus Frame Time

Rendering nothing costs you 2.5 ms/frame (which is pretty high for doing nothing; it should be way lower, so something is definitely happening there even if you think there isn’t).

When you start rendering it goes up to ~6.3 ms/frame. So rendering your 200 low-poly models (I assume 2000 polys is the total? Or is it per model?) takes an additional 3.8 ms/frame. That time can be CPU, GPU or both - you need to profile this to understand where your app spends most of its time. I’m a bit behind on current profiling tools that work with modern OpenGL, but you can try something like Remotery (GitHub - Celtoys/Remotery: Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer), which is pretty easy to set up and lets you profile both the CPU and GPU side of things. The downside is that you have to annotate your code with profiling marks, but that lets you put them where you think they give you the most information. Then you open a web page that connects to the websocket opened by your app and it shows real-time statistics for your frame; you can pause it, drill down, etc.
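The frame-time arithmetic above can be reproduced with a few lines of C++. This is only a sketch: `frame_ms`, `added_cost_ms` and `print_thread_numbers` are hypothetical helper names, not part of any library, and the fps values are the ones quoted in this thread.

```cpp
#include <cassert>
#include <cstdio>

// Convert a frame rate (frames per second) to a frame time (ms per frame).
inline double frame_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time per frame when the frame rate drops from fps_before to fps_after.
inline double added_cost_ms(double fps_before, double fps_after) {
    return frame_ms(fps_after) - frame_ms(fps_before);
}

// Prints the figures from this thread: 400 fps idle, 158 fps batched,
// 165 fps with one draw call per model.
inline void print_thread_numbers() {
    std::printf("idle:            %.2f ms/frame\n", frame_ms(400.0));
    std::printf("batched:         %.2f ms/frame\n", frame_ms(158.0));
    std::printf("per-model draws: %.2f ms/frame\n", frame_ms(165.0));
    std::printf("rendering cost:  %.2f ms/frame\n", added_cost_ms(400.0, 158.0));
    std::printf("batched - naive: %.2f ms/frame\n", added_cost_ms(165.0, 158.0));
}
```

Seen this way, the gap between 158 and 165 fps is only about a quarter of a millisecond per frame, which is small enough that it may well be measurement noise rather than a real difference between the two approaches.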

We can’t really answer why it takes that long, because it can be many things. Are you using static buffers, or sending the data to the GPU every frame, and if so, how is that upload synchronized? What does your usual frame look like in terms of GPU calls? What does your CPU side look like? Maybe something there is slowing down your loop, since IMO it should be much faster when doing nothing… more like 0.3 ms per frame, in my experience with empty loops that just swap buffers.
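Before reaching for a full profiler, a rough first check of the CPU side can be done with `std::chrono`. The sketch below is an assumption about how one might do it, not what any particular engine does; `FrameTimer` is a hypothetical helper. Note that OpenGL calls are asynchronous, so this only measures how long the CPU spends per frame (including any time it blocks in the buffer swap); GPU time needs timer queries such as `GL_TIME_ELAPSED` or a profiler like Remotery.

```cpp
#include <chrono>

// Accumulates CPU-side frame times and reports the average in ms/frame.
class FrameTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Call once at the start of each frame.
    void begin_frame() { start_ = Clock::now(); }

    // Call once at the end of each frame, e.g. right after the buffer swap.
    void end_frame() {
        std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        total_ms_ += elapsed.count();
        ++frames_;
    }

    // Average frame time so far; 0.0 if no frame has been recorded yet.
    double average_ms() const { return frames_ ? total_ms_ / frames_ : 0.0; }

    int frames() const { return frames_; }

private:
    Clock::time_point start_{};
    double total_ms_ = 0.0;
    int frames_ = 0;
};
```

Wrapping the main loop body in `begin_frame()` / `end_frame()` and printing `average_ms()` every few hundred frames gives the ms/frame figure directly. Timing the loop once with the draw calls and once with only the swap shows how much of the idle 2.5 ms is spent outside rendering.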

BANEBYTE · February 15, 2021, 8:02pm

Thanks, man. How do I set up Remotery?
I have added the headers and cpp’s to my project, linked ‘ws2_32.lib’, and checked for errors when creating the Remotery instance (got no errors).
But the viewer only shows ‘Disconnected’.
Thanks man.