1k triangles at 60fps - acceptable?

Hello everyone!

Yesterday I played around a little bit with OpenGL, and one thing astounded me: I can only draw 1k triangles before I drop below 60 fps. That’s only 60k triangles per second, which isn’t much in my book.

Is this acceptable/expected performance, and should I strive to reduce the number of vertices that are processed? Or am I doing something terribly wrong with the API?


  • Vertices in buffer: 3000
  • Vertex format: 3 x GL_FLOAT
  • Draw method: glDrawArrays (no indexing)
  • Buffer usage type: GL_STATIC_DRAW (not updated every frame, obviously)
  • Shaders: simplistic (pass through + constant color)
  • Textures: no
  • Multisampling: no
  • Blending: no
  • Depth: no
  • Explicit synchronization: no
  • GPU Perf Monitor CPU time: 0.2%
  • GPU Perf Monitor GPU time: 99%
  • GPU Perf Monitor GPU utilization: low!
  • Windowed: yes (640x480)
  • Machine: Radeon HD 4850 (512 MB), AMD Phenom X2, 6GB
  • OpenGL version: Core OpenGL 3.3

It depends on what else you’re doing. There is absolutely no problem dropping the performance of your program way below 60 fps with only a screen-aligned quad which consists of only two triangles.

However, given your hardware setup, your obviously trivial shader code, and the very low resolution, 60k seems ridiculously low. Are you sure you are above 60 fps at any time and not capped by vsync the entire time?

The problem is that I am not doing anything at all: no application logic, empty (trivial) shaders, no multisampling, etc.

It’s not capped at 60 fps (I believe), as when I decrease the number of triangles, the fps goes higher.

I am using statically linked glfw v.3.0 and glew 1.1.0, together - of course - with opengl32.lib.

The only hint I could get from GPU Perf is that almost all of the GPU time is taken by the so-called “Interpolator”. Sure, the triangles are huge, as big as the whole window, but that doesn’t explain such low performance.

Actually, the visible area covered by primitives does matter a great deal. The more fragments are generated, the more interpolation has to take place. With a growing number of exports (i.e. the values you pass from one stage into the next, e.g. vertex shader -> fragment shader), this overhead increases - not to mention, there are values that are always interpolated, such as the depth value. Plus, the more fragments, the more fragment shader invocations. (Probably not a problem in your case though.)

Just as a comparison: if I’m not mistaken, a few years back I was able to yank around 3M vertices through a GeForce 8600M GS at approx. 30 fps - which, even at the time, was pretty crappy hardware. Also with very simple shaders. Can you post some code? Shaders, state inits, rendering loop? Do you actually have the depth test disabled?

Have you tried some OpenGL based game to see if you get bad performance there?

I’ve tried with the depth test disabled/enabled, but I don’t see any real difference in terms of performance impact; I also didn’t seem to have too many problems with OpenGL titles.

As the lovely anti-link prevention system blocks me from sending URLs, you will have to manually create pastebin links from these:
pastebin.com/rUHYYmBd main.cpp
pastebin.com/yzZJ19xx Context.hpp
pastebin.com/XA37QF8v Context.cpp
pastebin.com/FqG6etR9 Shaders

Oh, it does. 640x480 x 500 quads x 60 fps = 9.2e9 pixels/second.

That’s 28 Gbyte/sec at 3 bytes/pixel or 37 Gbyte/sec at 4 bytes/pixel; depending upon the width of the memory bus and the clock rate that could realistically be saturating the memory bandwidth. That could explain why the “GPU utilization” says “low”. Early-depth optimisation won’t help in this case, as you’d just be replacing writes to the colour buffer with reads from the depth buffer.

If you’re comparing the triangle counts against a game, games aren’t drawing a thousand 640x480 triangles per frame.

I see, so in other words, a more realistic scenario would be to draw these triangles in some distance, or at least smaller?

Well, the bigger question is why you’re drawing so many triangles, all of which are overlapping, that close to the screen? Or more to the point: what are you drawing?

Yes. A more realistic test would keep the overdraw (the average number of times any given pixel is drawn) in single figures.

Real programs make some effort to ignore parts which can’t be seen, so the total fill rate is limited to some low multiple of the number of screen pixels.

I am just trying to prepare a simple framework (i.e. base-point) for further performance analysis of various shader code, GL settings, draw methods, etc.

Yes. Small triangles will be vertex-limited. But large triangles will be fragment-limited. One triangle has 3 vertices, but up to 1 million fragments, each of which takes separate attention from the GPU. Try making tiny triangles, and see if the speed shoots up.

Your application will be limited by vertex processing if the overhead of vertex processing exceeds any other form of processing done by the GL or the application. Saying “small triangles will be vertex-limited” is kind of nonsense. And even if your triangles are small, the likelihood of becoming limited by vertex processing for a few hundred triangles is still very low unless your application is very, very trivial and your fragment shaders do absolutely nothing other than export a constant color. You cannot be sure of anything unless you get hard numbers, especially when stuff seems to be trivial.

Why up to 1 million? What if the triangle is large enough that it simply covers the whole screen after clipping? Does a full-HD fragment buffer only consist of 1 million pixels? No.

Oh. Newbie question: what is the maximum number of pixels a triangle could put into a “full-HD fragment buffer”? Thanks!

A full-HD buffer usually has a resolution of 1920 x 1080 pixels. That’s 2,073,600 pixels, a little over two million.

Are you using 1000 draw calls?

Nope. 1000 triangles (3000 vertices) via one glDrawArrays call.