Which GL commands allocate/use memory that needs to be freed

I was looking for documentation or a summary that tells me which GL commands allocate memory or resources that need to be freed later, in order to avoid memory issues.

I’m using display lists, for which I call glDeleteLists properly; I also call wglCreateContext once (on create) and wglDeleteContext (on destroy).

Background: I have a CAD-like application (VC++/MFC) which seems to work just fine. While the app runs, the user often reloads the project, which cleans up everything and then starts over with a new 3D scene (typically some 100,000 triangles/vertices - no textures - pretty basic lighting - no shadows). After a while (it can be hours/days), on such a reload the application says “there is no more memory” - the message comes from the framework and seems to happen when the app tries to add GL objects - I have not been able to pin down the exact code line so far. As far as I can tell, there are no memory leaks. After restarting the whole app, the message now comes immediately on loading the project.
It helps when the user logs off (Win10) - with a new session the behaviour is OK, until it fails again after some hours/days.

thanks

Which GPU driver and GPU? Or have you noticed a correlation between GPU vendor and/or GPU and this problem?

A few suggestions:

  1. First, add overall GPU memory allocation reporting to your app, to tell you 1) how much GPU memory is available, and 2) how much of it is currently allocated to “some” application (not necessarily yours). For instance, see NVX_gpu_memory_info (NVidia) and ATI_meminfo (AMD); a minimal query sketch follows this list. This is a very useful double-check on any other allocated-memory instrumentation you might add. It can also help reveal whether your per-GL-object statistics (see #2) are missing anything, as well as whether (on user machines) some “other” application besides yours is gobbling up a lot of GPU memory, leaving your app with too little to function sometimes.

  2. Add per-GL object memory allocation statistics (or estimates). In the absence of driver help, you can form a pretty good rough guess for things like textures and buffer objects. But with things like the ol’ display lists that you’re using, only the GL driver writers know, because there’s a lot of driver voodoo going on under the hood with those (repacking the often immediate-mode mess the app is feeding to the driver into blocks, probably allocating GPU memory to back vertex attrib/index buffer object blocks to hold the repacked vertices, etc.). GPU memory usage is completely abstracted from the user with ol’ GL display lists, which is a potential problem with using them.

  3. If the GL driver supports it, query more detailed GPU memory usage info, such as the memory blocks allocated on the GPU and the type of objects stored in each memory block. For instance, NV_query_resource / NV_query_resource_tag (I haven’t actually used these).
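
For reference, here is a minimal sketch of what the reporting in #1 could look like. This is a sketch only: it assumes a current legacy/compatibility GL context, the function names are placeholders, and the enum values are copied from the NVX_gpu_memory_info and ATI_meminfo extension specs.

#include <windows.h>
#include <GL/gl.h>
#include <stdio.h>
#include <string.h>

/* Enum values taken from the NVX_gpu_memory_info and ATI_meminfo extension specs */
#define GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX   0x9048
#define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
#define GL_VBO_FREE_MEMORY_ATI                          0x87FB

static bool hasExtension(const char* name)
{
    /* Works in legacy/compatibility contexts, which is what this app uses */
    const char* ext = (const char*) glGetString(GL_EXTENSIONS);
    return ext != NULL && strstr(ext, name) != NULL;
}

/* Call with a current GL context; prints whatever memory info the driver exposes */
void reportGpuMemory()
{
    if (hasExtension("GL_NVX_gpu_memory_info")) {
        GLint totalKb = 0, availKb = 0;
        glGetIntegerv(GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX, &totalKb);
        glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &availKb);
        printf("GPU memory: %d KB total, %d KB currently available\n", totalKb, availKb);
    } else if (hasExtension("GL_ATI_meminfo")) {
        GLint vboFree[4] = { 0 };  /* [0] = total free KB, [1] = largest free block KB */
        glGetIntegerv(GL_VBO_FREE_MEMORY_ATI, vboFree);
        printf("GPU memory free for buffer objects: %d KB (largest block %d KB)\n",
               vboFree[0], vboFree[1]);
    } else {
        printf("No vendor GPU-memory-info extension advertised.\n");
    }
}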

As far as what objects allocate GPU memory, you can generally assume textures will. Buffer objects can, but depending on usage, the driver may allocate storage on the GPU or may allocate storage in pinned CPU memory for them (or dynamically convert one of these to the other based on your usage). Display lists very likely will, but if/how is completely abstracted from the dev via the legacy GL API. Shaders/programs will, but the space should be relatively small compared to textures and buffer objects. In general, look at anything that you think the GPU will need to access to render your draw calls, and (in the absence of better info) just assume that it will be consuming GPU memory.

Going forward, I’d suggest phasing out your use of legacy GL features like display lists where it’s very unclear what exactly is going on under the hood (making GPU memory estimation problematic). Also consider moving toward a model where you avoid dynamically allocating any GPU memory for GL objects after startup, unless you’re explicitly placing objects in storage preallocated at startup. Then you’re less at the mercy of what other applications are doing, GPU memory fragmentation, etc., and are less likely to hit this kind of error at runtime.

Hi Dark_Photon, thanks for these thoughts.

We tested the software on some systems (Win7, Win10), both with Intel 530 Graphics and OpenGL 4.4 and 4.6. The error is reproducible on Win10 with OpenGL 4.6 - I haven’t seen it on my Win7 with OpenGL 4.4, and also not on a Win10 machine with Intel 4000 graphics and OpenGL 4.0.

Would you mind explaining why you think that display lists are problematic? I mean, what’s the difference between drawing things right away compared to wrapping them in glNewList/glEndList? I thought this is one of the most basic features and is supported by all OpenGL implementations.

I implemented a HeapWalker to check the memory allocation of my app, but this shows no memory issues; after cleaning up, the heap is just fine. Frequent loading/cleanup shows no increasing memory - but Windows’ Task Manager shows approx 4-5 MB more on each load/cleanup cycle. I suspect this happens inside OpenGL…

I’m not using any textures, just GL_TRIANGLES and GL_LINES and similar things, nothing fancy at all. Our ‘reference project’ we use for testing has a total of approx 4 million triangles; the screen update rate is not smooth, but still acceptable.

Since this software is based on ‘projects’ I can’t live without dynamic objects.

Do you know how to retrieve information (from C++) about what’s going on inside the GPU?

thanks

Ok. Which GPUs and GPU drivers can you reproduce this on?

And no virtual machines are involved here, right? Your native app is running directly on the Win10 system.

The abstraction that display lists present isn’t how modern GPUs work. Between the GL commands you inject between glNewList() and glEndList() and the work that gets executed when you call glCallList() is a lot of GL driver CPU-side voodoo. You have no control over how long this voodoo (display list compilation) will take, what form it will have on the GPU, or the resulting performance of the compiled display list. Moreover, display list functionality is very old (now ancient) and not necessary for modern OpenGL applications. Consequently, this driver functionality may be more likely to develop regressions (that remain regressions for a while) due to lack of use by most apps. They’re just not necessary anymore to get excellent performance from GPUs.

Not so much basic, but ancient, and deprecated. Some GL driver implementations don’t support the compatibility profile, and so won’t support display lists. In other drivers written by vendors that don’t do as much testing, I wouldn’t expect the implementation of display lists to be flawless.

Sorry, I think I misunderstood the implication of what you’re saying. Are you saying that you have reproduced this (only) on:

  • Intel 530 Graphics on Win10 with OpenGL 4.6

but not with:

  • Intel 530 Graphics on Win10 with OpenGL 4.4
  • Intel 530 Graphics on Win7 with OpenGL 4.4 or 4.6

So you’ve only reproduced this on one GPU+OS+OpenGL version combination? Which graphics driver version? Have you tried other graphics driver versions? Have you tried other GPUs (e.g. Nvidia and AMD)?

Also…

What specific error is being detected by the framework when it reports this? Could it be for a failed CPU malloc() call? Is it for a GL call? etc.

If you’re absolutely sure that your app has been killed before restarting it (I’d re-verify this in Task Manager), then the resources your app was using (e.g. CPU and GPU memory) should have been released. This tends to suggest some software that your app is making use of isn’t cleaning up properly (possibly the graphics driver, but there’s insufficient evidence in your posts above to be even roughly sure).

Look at an aggregate GPU memory available statistic before the first run of your app and after the problem starts occurring (and after you’ve killed your app). If it’s a graphics driver GPU mem leak bug, then you should see very different results.

Could be due to resetting the graphics driver, or could be due to killing and restarting some other software your app is making use of.

Hello Dark_Photon,

thanks for all the explanations, I think we found our problem: many of our graphical objects are simply based on triangles/facets (STL files) and are stored in a std::vector - one of the data-processing methods failed to allocate memory when assigning a whole vector (~1.6 million facets) using the assignment operator. Due to a clumsy try/catch scenario, this situation appeared to be an OpenGL memory issue. So yes, it happens on multiple PCs, not only on one particular machine.

The problem usually happens when “reloading” the project, which involves a complete cleanup and then rebuilding the whole scene from files. It’s interesting that this problem hardly occurs when I do things step by step - but it usually happens when I try to reload multiple times in a row.

So far I have found no way to figure out the memory usage of the GPU (do you know how to get a GPU memory report from a C++ environment?) - but my internal heap management showed no errors.

When the project is completely loaded, Windows reports a memory usage of 780 MB, while my heap usage is 275 MB - I guess the gap is GPU memory? When cleaning up, it seems as if Windows does not free all memory at once; maybe there is a heap defragmentation or something like this, and I guess that’s why, when I’m quickly reloading the project, the system fails to allocate this one large block of memory (my best guess).

So I optimized my data handling and removed one temporary vector/container, instead working on the data in situ. It’s faster now and uses less memory.

I also appreciate your input regarding display lists: so you recommend simply drawing everything in the OnPaint/OnDraw context when necessary, instead of “preparing” the objects using display lists?

Display lists were designed for a specific purpose: to avoid transferring the same data from the client to the X server each frame (the two were often on separate systems, communicating via Ethernet). glNewList/glEndList essentially instruct the X server to record the GLX protocol data, while glCallList replays the stored data.

If that isn’t a likely usage scenario for your application, there isn’t really much point in using display lists.

No. Not in the sense that I think you mean, at least.

Display lists come from a day when there wasn’t support for:

  • pre-uploading geometry to the GPU that you planned to render with in the future (in order to save re-uploading it again and again),
  • very efficient methods to stream data (including geometry data) to the GPU at draw time, and
  • methods for issuing draw calls for a ton of geometry (meshes) using very few GL commands at draw time.

Now we have all of this. So there’s that much less reason to depend on them.

For static geometry, consider preuploading your geometry data to VBOs. Then at draw time, just tell the GPU to pull the vertex/index data for your draw calls from that pre-populated GPU-accessible memory.
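
A rough sketch of that static path is below. The variable names are placeholders, and it assumes the GL 1.5+ buffer-object entry points are available (via whatever extension-loading mechanism you use); the fixed-function vertex-array calls are shown so it also applies to a legacy/compatibility codebase.

struct Vertex { float x, y, z; };    /* placeholder vertex layout */

/* Once, at load time: copy the mesh into GPU-accessible buffer objects */
GLuint vbo = 0, ibo = 0;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * sizeof(Vertex), vertexData, GL_STATIC_DRAW);
glGenBuffers(1, &ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexCount * sizeof(GLuint), indexData, GL_STATIC_DRAW);

/* Every frame: source vertex/index data from the buffers instead of CPU arrays */
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), (void*)0);   /* offset into the VBO, not a CPU pointer */
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, (void*)0);
glDisableClientState(GL_VERTEX_ARRAY);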

For dynamic geometry, stream the data efficiently to the GPU at draw time (see Buffer Object Streaming in the wiki) and tell the GPU to pull the vertex/index data for your draw calls from that dynamically-populated, GPU-accessible VBO memory.
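
For the dynamic case, the “orphaning” (buffer re-specification) pattern described on that wiki page looks roughly like this; streamVbo, bytes and cpuVerts are placeholder names:

/* Each update: re-specify the buffer so the driver can hand you a fresh block */
glBindBuffer(GL_ARRAY_BUFFER, streamVbo);
glBufferData(GL_ARRAY_BUFFER, bytes, NULL, GL_STREAM_DRAW);   /* orphan the old storage */
glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, cpuVerts);         /* fill the new block */
/* ...then issue the draw call(s) sourcing from streamVbo as usual */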

And finally, for batching lots of small mesh draw calls together, read up on the various types of draw calls (see Vertex Rendering), including Draw Indirect and MultiDrawIndirect draw calls.
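
As a taste of the indirect path (GL 4.3+ or ARB_multi_draw_indirect; a sketch only, with placeholder buffer names), one call can launch the draws for many meshes that share a vertex/index buffer:

/* One record per mesh; layout defined by the GL spec for glMultiDrawElementsIndirect */
struct DrawElementsIndirectCommand {
    GLuint count;          /* index count for this mesh            */
    GLuint instanceCount;  /* usually 1                            */
    GLuint firstIndex;     /* offset into the shared index buffer  */
    GLuint baseVertex;     /* offset into the shared vertex buffer */
    GLuint baseInstance;   /* can carry a per-draw ID              */
};

/* Upload the command array, then render all meshes with a single call */
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuf);
glBufferData(GL_DRAW_INDIRECT_BUFFER, drawCount * sizeof(DrawElementsIndirectCommand),
             commands, GL_STATIC_DRAW);
glBindVertexArray(vao);
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, (void*)0, drawCount, 0);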

Hello again,

I need to get back to this discussion…
I recently had some time to make some tests regarding the performance of my OpenGL drawing routines.

My application A is based on a prepare/glCallList mechanism; application B simply draws the same stuff whenever an OnPaint/Draw happens.
We loaded some 3D objects into the scene with some 150,000 facets and simply used the mouse to rotate and pan the scene.

Version A (with display lists = prepare/glCallLists) is remarkably smoother and faster and uses, according to Windows Task Manager, approx 10% less CPU time. Version B is somewhat rough and needs 10% more CPU. A needs approx 18 MB heap while B uses only 9 MB heap.
Since performance is more important than memory, I guess I’ll stick with the old prepare/glCallLists approach.

I wonder whether this is something anyone can confirm or comment on?
thanks

It depends on exactly what you are drawing and how you draw it.
example:

glBegin(GL_LINE_LOOP);
for(double phi = 0.0; phi < 2.0*M_PI; phi += M_PI/100.0) {
    glVertex2f(sin(phi), cos(phi));
}
glEnd();

draws a circle. If you draw it in immediate mode, the sine/cosine terms are calculated each time the circle is redrawn. If you use a display list, those terms are calculated only once, at the time when the list is compiled.
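
For comparison, the same circle compiled into a display list might look like this (a sketch; how you store and reuse the list id is up to you):

GLuint circleList = glGenLists(1);

/* compile once: the sin/cos evaluation happens here, not at draw time */
glNewList(circleList, GL_COMPILE);
glBegin(GL_LINE_LOOP);
for(double phi = 0.0; phi < 2.0*M_PI; phi += M_PI/100.0) {
    glVertex2f(sin(phi), cos(phi));
}
glEnd();
glEndList();

/* each frame: just replay the recorded commands */
glCallList(circleList);

/* when no longer needed */
glDeleteLists(circleList, 1);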

@RigidBody: … yes, I agree and I’m aware of this.
In my case the 3D objects are always built when loaded from files, so either the display list or the draw routine just loops across a std::vector of ‘faces’ (= triangles). Most of my 3D objects are part of a kinematic simulation, so the only thing that is dynamic is the position/orientation in space, and this relies on glTranslate/glRotate. The objects themselves are more or less static.

Not without telling us what the non-display-list version is actually doing. The difference between the two should be a matter of replacing glCallList with glBindVertexArray and glDrawElements. If you’re doing more than that, whatever else you’re doing would probably explain (most of) the performance difference.

But are you storing that data in CPU or GPU memory? IOW, are you calling glDrawElements with a buffer bound to GL_ELEMENT_ARRAY_BUFFER or are you sourcing the data from client memory? The display list version is likely storing everything in GPU memory, so the glDrawElements version needs to do likewise to get similar performance.
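
To make that distinction concrete, here is a sketch of the same glDrawElements call with the two different data sources (placeholder variable names):

/* (a) client-memory arrays: the driver copies vertex/index data on every draw call */
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
glVertexPointer(3, GL_FLOAT, 0, cpuVertices);                           /* CPU pointer     */
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, cpuIndices);  /* CPU pointer     */

/* (b) buffer objects: the data already lives in GPU-accessible memory */
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glVertexPointer(3, GL_FLOAT, 0, (void*)0);                              /* offset into VBO */
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, (void*)0);    /* offset into IBO */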

All my data is in CPU memory; almost all drawing relies on glBegin(GL_TRIANGLES).

Well, at this point I have to come clean: this software needs to run in an industrial environment, which means I need to support even WinXP and Win7. Most of it was written with VC6, and so I still use the OpenGL 1.1 API - although most PCs provide an OpenGL 4.x environment. We’re currently not able to migrate the whole monster to a more modern VC environment. Please understand that industrial machinery has life cycles of 20-25 years…

Yes, I understand and accept that more ‘modern’ approaches might be better/faster or whatever - but that ‘old stuff’ is not bad at all; the software is very stable and reliable, and I just like to understand what I can do to get the best performance with the given preconditions.