Slow rendering upon first call

Dear all,
I am working on a visualization program for molecules. It is quite useful to index all the atoms in a molecule so they can be referenced. Therefore, I started playing with a second shader program that draws the numbers using a geometry shader. When I start my program and load a molecule, everything works fine. But upon drawing the numbers with a call to glDrawArrays, the code takes 30 seconds to render the first frame. After that, turning the numbers off and on again causes no noticeable delay. Even after restarting my computer, using the code and displaying numbers remains very fast. The code is written in Python with PySide6 and PyOpenGL.
The questions:
Why does this happen? I can only speculate that the shader code is somehow compiled just in time behind the scenes, although I compile and link it at startup.
Is there any way of speeding up the first rendering, either by precompiling or by some other means?
I have looked through old forum posts, but I did not find any satisfying answers.

Best regards

30 seconds to draw a single frame with rasterization is insane.

You’re going to have to profile your Python code and see what it’s doing.
You’re the only one with the ability to collect the raw data needed to identify the actual bottleneck(s).
And the ability to inspect your code and see exactly what you’re doing.
Start by finding out which part of your code is taking the 30 seconds, and narrow it down from there.
If you’re using Python inefficiently, it could very well just be the Python overhead.
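A minimal way to do that narrowing from Python's standard library is to wrap one frame in cProfile. This is just a sketch; `render_fn` stands in for whatever actually draws a frame in your app (e.g. your paintGL method):

```python
import cProfile
import io
import pstats

def profile_once(render_fn) -> str:
    """Run one frame under cProfile and return the hottest entries by
    cumulative time. 'render_fn' is a placeholder for whatever call
    renders a frame in your application."""
    profiler = cProfile.Profile()
    profiler.enable()
    render_fn()
    profiler.disable()
    buf = io.StringIO()
    # Sort by cumulative time so the 30-second culprit floats to the top.
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return buf.getvalue()
```

If the report pins nearly all the time on a single PyOpenGL call such as glDrawArrays, that's your starting point for the driver-side theories below.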

Sorry for not answering for a week. I “profiled” the code using printouts and found that the glDrawArrays call is the one that takes all the time. The shader program uses a vertex, a geometry, and a fragment shader, and contains loads of if … else statements. When I fix the variable that is checked in these statements to a constant, instead of assigning it the value it gets from layout (location=&lt;some number&gt;), the glDrawArrays call is very fast (some tens of microseconds). IMO the compiler optimizes the if … else statements out of the code, which it cannot do if it does not know the value. But this still does not explain, at least to me, why it only takes so long the first time it is called. I have two shader programs, and as soon as I run the second (problematic) one, it takes these 30-ish seconds to draw the next frame and then runs at normal speed again.

I assume you’re rendering on a GPU and here you’re talking about queuing time on the CPU.

If so, keep in mind that OpenGL drivers tend to defer GPU-related operations until they absolutely have to perform them. Often, that moment is when a draw call is queued on the CPU. If the driver hasn’t already created, optimized, and uploaded the shaders, textures, buffer objects, etc. used by the draw call, that’s the point at which it will do all that work.

So even though it looks on the CPU side like the draw call is taking all of the time, it could very well be that behind the scenes the driver is performing the other processing needed to get ready to execute that draw call on the GPU. By contrast, the draw call’s actual execution on the GPU is most likely very fast.
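One way to test this deferred-work theory from Python is to bracket the suspect draw call with glFinish(), which blocks until the GPU (and any lazy driver setup) has actually completed. This is a sketch, assuming a current GL context with a program and VAO already bound, and PyOpenGL installed; `n_vertices` is a placeholder:

```python
import time

def timed_draw(n_vertices: int) -> float:
    """Time one glDrawArrays including any deferred driver work.
    Assumes a current GL context, a bound program/VAO, and PyOpenGL
    installed; n_vertices is a placeholder for your vertex count."""
    from OpenGL.GL import glDrawArrays, glFinish, GL_POINTS  # optional dep

    glFinish()                      # drain any previously queued work
    t0 = time.perf_counter()
    glDrawArrays(GL_POINTS, 0, n_vertices)
    glFinish()                      # wait until the GPU has truly finished
    return time.perf_counter() - t0
```

A huge first reading followed by tiny subsequent ones points at lazy shader specialization/upload inside the driver, not at the draw itself.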

AFAIK, you’re correct here. GLSL shader compilers do aggressive constant folding and dead code elimination. Anything that can be evaluated at compile time is, and branches that become impossible to execute are discarded before the shader is further optimized, assembled down into GPU shader ISA, and uploaded to the GPU for execution.
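You can exploit that deliberately: instead of reading the branch selector from a vertex input, bake it into the source as a compile-time constant and build one program per variant. A sketch of the idea; the shader text and the `MODE` macro here are illustrative, not the OP's actual code:

```python
# Sketch: turn a runtime branch selector into a compile-time constant by
# injecting a #define, so the GLSL compiler can fold the if/else chain away.
# The fragment shader and the MODE macro are hypothetical examples.

FRAG_TEMPLATE = """#version 330 core
out vec4 color;
void main() {
#if MODE == 0
    color = vec4(1.0, 0.0, 0.0, 1.0);  // kept when MODE == 0
#else
    color = vec4(0.0, 1.0, 0.0, 1.0);  // dead code, eliminated at compile time
#endif
}
"""

def specialize(source: str, mode: int) -> str:
    """Insert '#define MODE <n>' right after the #version line, since
    #version must be the first directive in a GLSL source string."""
    version_line, _, rest = source.partition("\n")
    return f"{version_line}\n#define MODE {mode}\n{rest}"
```

The cost is more program objects to compile, but each one is small and branch-free, which is usually much cheaper for the driver than one giant shader full of runtime if/else chains.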

Agreed. 30 seconds is absolutely ridiculous.

That’s a long enough time window you should be able to get a fairly good idea what the heck the driver is doing by running a CPU profiler tool while this “app CPU hang” is in progress. If on Windows, try:

  • Very Sleepy - This is insanely easy to run. You profile a whole process with profiling start/stop controls or capture for N seconds. Or you can even select a specific thread within a process to profile. It’ll quickly tell you where the heck your app is spending all of its time.

BTW, what OS, GPU, and GL driver are you seeing these results on?

Also, is this a slow laptop PC or a fast desktop PC?

Yeah, insane. We gotta get to the bottom of that.

GLSL shader stage compile, link, and optimization might take 0.5 sec or so on a fast desktop for absolutely huge shaders with lots of compile+optimize work. But 30 sec is 60X that! I’ve got no clue what your driver’s up to at this point (… unless you’re running on some ancient laptop with a super-slow CPU). But I’m very interested in finding out.

Hi, thanks a lot for all the information. I am used to profiling Python code with the cProfile profiler, but I have no experience profiling when the GPU is involved, i.e. seeing what is happening on there. I am currently working on a MacBook Pro with an M2 chip… Sooo, I’ll do this on Linux instead, with an AMD graphics card, and look for a profiler that gives me the info I need. As soon as I have all that, I’ll come back here! Thank you for your assistance!

Hi again,
I cannot reproduce this behaviour on the Linux machine. There, the program takes longer to start than it does on macOS, but contrary to macOS, loading and rendering with the second shader program works instantly. I suspect macOS compiles the code at some other time, so I would say this is specific to my MacBook. I do not have an older MacBook with an Intel chip, so I cannot check whether it is a result of the M2 chip. Anyway, Apple does not want to support OpenGL anymore, so I do not know if it is sensible to look into this further. However, I’ll look for a profiler now and check if I can get some more info.

cProfile gives me the following insight:
The code spends 39 seconds in site-packages/OpenGL/platforms. I cannot get more precise information, because the new profiling tools for macOS do not support OpenGL anymore.
On the Linux machine, the same use of the program yields different results: the code spends 11.8 seconds in the compile_shaders routine, which is IMO the expected behaviour.

Oh… Apple… Was the OpenGL implementation you are using written by Apple?

It sounds like you have two big handicaps here. First, you’re using an Apple OS and system. Apple decided 6 years ago to deprecate OpenGL and push their in-house Metal API instead. So if this ends up being an issue with the OpenGL implementation on the MacBook Pro, which Apple implemented and maintains, you may find them less than interested in doing anything about their problem.

Second, you’re not using the OpenGL API directly but are doing so indirectly via scripting language wrappers. This makes it more difficult to pin down the precise cause of bottlenecks. Not impossible. Just requires extra effort.

Ok, that makes sense. So something is “less than optimal” (causing a “30 second” hang) in the OpenGL driver support on your MacBook Pro box which doesn’t appear on a desktop Linux box with an AMD GPU.

Well, depends on your needs. If you can figure out how to workaround this apparent OpenGL driver bug in your app, that might be an option. But if not…

Another option (if you can’t just ditch the MacBook Pro altogether) is to run some other library or libraries that provide the OpenGL API through a different route.

Some people have become so fed up with the whole Apple-deprecating-OpenGL thing that they run 3rd party OpenGL support on their Apple Mac system. For instance, Zink on top of MoltenVK on top of Metal. Zink implements the OpenGL API and translates it to Vulkan. MoltenVK takes that Vulkan and translates it to Metal. And Metal is what Apple provides natively and wants everyone to use instead of OpenGL. :slight_smile: For some apps, this works pretty well and does a total end-run around Apple’s old, buggy, deprecated OpenGL driver support. And you’re actually going to get support for this path, whereas you’re unlikely to get any support for the deprecated OpenGL support from Apple that’s left lying around on a Mac box.

11.8 seconds in compile_shaders on the Linux box? And that’s expected?

  • How many GLSL shader stages is this compiling?
  • How many programs?
  • How many lines in your shader programs?
  • How many statements?

Unless you’re compiling a bunch of shaders, even 11.8 seconds seems egregious. It’d be worth taking a close look at your shader code to see if you could rework it to bring the compile times down. Possibly significantly!

The fact that your Linux compile times, on a different GPU with a different GL driver, are still way up in the same order of magnitude as on the Apple macOS MacBook Pro box suggests that your shaders themselves may be partly to blame for this poor performance.

This could be a case where the MacBook Pro’s OpenGL is simply deferring the shader compile+link to draw-call time. In that case you can shift that cost to startup, like on the Linux box, by pre-rendering with the shader at startup immediately after compiling it. That way at runtime (after startup), your shaders are fully compiled, optimized, and in a ready-to-draw state, regardless of when/how the underlying OpenGL driver would otherwise decide to perform this prep work.
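A hedged sketch of that warm-up idea in PyOpenGL; the `program` handle is a placeholder for your linked number-drawing program, and it assumes a current GL context with a VAO bound that matches the state you'll draw with later (drivers may re-specialize if the pipeline state differs):

```python
def warm_up(program: int) -> None:
    """Force any deferred shader specialization/upload to happen now,
    at startup, instead of on the first real frame. Assumes a current
    GL context, PyOpenGL installed, and a VAO bound that matches your
    real draw state; 'program' is your linked program object."""
    from OpenGL.GL import (glUseProgram, glDrawArrays, glFinish,
                           GL_POINTS)  # optional dependency

    glUseProgram(program)
    # A single throwaway point is enough to make the driver fully
    # prepare the pipeline for this program.
    glDrawArrays(GL_POINTS, 0, 1)
    glFinish()          # block until the driver has really done the work
    glUseProgram(0)
```

Call this once per shader program right after linking, before showing the first frame; any lazy compile then lands in your startup time instead of your first interactive draw.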