Xorg VRAM leak caused by a Qt/OpenGL application

Hello board,

I am working on a complex Qt/OpenGL application.
Xorg starts leaking VRAM while I’m using the application and never releases the memory until I restart X, of course.

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8     4W /  N/A |     50MiB /  4040MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     29628      G   /usr/lib/xorg-server/Xorg                     47MiB |
+-----------------------------------------------------------------------------+
$ ./myOpenGLQtBasedApp ... doing graphic stuff then exiting
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8     4W /  N/A |    110MiB /  4040MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     29628      G   /usr/lib/xorg-server/Xorg                    107MiB |
+-----------------------------------------------------------------------------+

The version of Xorg does not matter; I tested a few.
The version of the driver does not matter, as long as it’s NVIDIA; I tested 340, 384, and 390.
The Linux distribution does not matter; I tested Ubuntu 16.04, 18.04, and Fedora.
The DE does not matter; I tested Unity, GNOME Shell, Xfce, LXDE + Compton, and Openbox + Compton.
The compositor used does not matter, but the leak disappears without a compositor.
I did not test Wayland.

Do you know what could cause this behavior?
Could this be due to OpenGL context sharing?
If so, where and how could that happen, in our application or in Qt? Could we force OpenGL not to share anything between processes?
If not, what in our code could create this behavior?

[QUOTE=mwestphal;1291910]I am working on a complex Qt/OpenGL application.
Xorg starts leaking VRAM while I’m using the application and never releases the memory until I restart X, of course.

The compositor used does not matter, but the leak disappears without a compositor.[/QUOTE]

That’s not too surprising. I was just about to suggest that you disable the compositor. Without a compositor, X itself shouldn’t consume much VRAM.

Do you know what could cause this behavior?

Compositors use the GPU for composite rendering, so you should expect more GPU memory consumption.

Does consumption grow every time you run your app, or is there an upper bound?
What’s the max Xorg consumption you’ve seen when running with the compositor (and without)?
Have you ever run out of VRAM because of this?

It could be a leak, but it could also just be unused GPU memory pooled by the compositor or X via the NVidia driver (e.g. scratch texture memory).

For complete control of your GPU (performance, memory consumption, VSync, latency, etc.), just disable the compositor, and be glad that you have that option. On Windows, DWM (its compositor) has been the cause of needless GPU usage limitations since Vista and now can’t be disabled.

If you can’t or don’t want to just disable the compositor (and IFF this VRAM consumption has become a problem), I’d figure out how to determine 1) exactly what memory is being allocated on the GPU for what purpose and by whom, and 2) what configuration controls in the compositor and X allow you to reduce or at least bound that memory usage.

Does consumption grow every time you run your app, or is there an upper bound?

There is no upper bound; Xorg VRAM usage keeps growing as long as I keep running, using, and closing my application.

What’s the max Xorg consumption you’ve seen when running with the compositor (and without)?

With the compositor, 96% of the VRAM.
Without it, it stays around 30% when running graphics-intensive stuff in my application.

Have you ever run out of VRAM because of this?

Actually, once the VRAM reaches 96%, Xorg starts to leak into RAM; once the RAM is full, OpenGL starts failing completely.

It could be a leak, but it could also just be unused GPU memory pooled by the compositor or X via the NVidia driver (e.g. scratch texture memory).

That was our first guess, but it appears that the memory is never released, even when running out of it.

For complete control of your GPU (performance, memory consumption, VSync, latency, etc.), just disable the compositor, and be glad that you have that option. On Windows, DWM (its compositor) has been the cause of needless GPU usage limitations since Vista and now can’t be disabled.

This is indeed a temporary solution.
Funny enough, on Windows we do not see this leak at all.
Of course we can’t ship the application this way and expect our users to disable their compositor.

If you can’t or don’t want to just disable the compositor (and IFF this VRAM consumption has become a problem), I’d figure out how to determine

  1. exactly what memory is being allocated on the GPU for what purpose and by whom, and

How can we do that? Apitrace was not able to find any leak.
I would be happy to use any GPU profiling tool, but could not find any that could track memory allocations.

  2. what configuration controls in the compositor and X allow you to reduce or at least bound that memory usage.

I tried that with the xfwm compositor, disabling everything except the compositor itself, and it still leaked.
If you have a compositor to suggest that I could test with, that would be great.

[QUOTE=mwestphal;1291916]> Does consumption grow every time you run your app, or is there an upper bound?

There is no upper bound; Xorg VRAM usage keeps growing as long as I keep running, using, and closing my application.

> What’s the max Xorg consumption you’ve seen when running with the compositor (and without)?

With the compositor, 96% of the VRAM.
Without it, it stays around 30% when running graphics-intensive stuff in my application.
Actually, once the VRAM reaches 96%, Xorg starts to leak into RAM; once the RAM is full, OpenGL starts failing completely.[/QUOTE]

And you have 4GB of GPU memory (GTX1050 Ti, presumably)? That’s some leak!

> For complete control of your GPU (performance, memory consumption, VSync, latency, etc.), just disable the compositor,
This is indeed a temporary solution.
Of course we can’t ship the application this way and expect our users to disable their compositor

Makes sense. Not many apps can assume they have complete control of the system configuration.

Funny enough, on Windows we do not see this leak at all.

That’s an interesting data point.

Question: You mentioned shared context and multiple processes. Does your application create multiple threads or processes? And just to make sure, you’re only talking about residual Xorg VRAM consumption after all of your threads/processes have been killed and are not running, right?

The fact that there aren’t a lot of others reporting this problem does tend to suggest that your app’s behavior might somehow be instigating this, or this is a compositor bug for a less commonly used window manager.

I would be happy to use any GPU profiling tool, but could not find any that could track memory allocations.

I haven’t gone looking for any GUI GPU Profiling tools. There is “nvidia-settings”. Click on the GPU 0 tab at the left and it’ll display your total and allocated GPU memory. It doesn’t show you how much GPU storage has been evicted from GPU memory back to CPU memory though (which is what happens when you run out of GPU memory).

If you want to see that, you can write a very short, simple GL program using NVX_gpu_memory_info. With this, you can query and log to the console how much GPU memory is still available (and how much GPU storage has been evicted back from GPU memory to CPU memory), emitting new consumption and evicted numbers every time they change.
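
For example, here’s a minimal sketch of such a monitor (assuming GLFW purely to get a GL context; any context-creation method works, and the GL_NVX_gpu_memory_info values are reported in KB):

// Minimal VRAM monitor sketch using GL_NVX_gpu_memory_info (NVIDIA only).
// Assumes GLFW is available just to create a GL context; values are in KB.
#include <cstdio>
#include <thread>
#include <chrono>
#include <GLFW/glfw3.h>

// Enums from GL_NVX_gpu_memory_info, defined here in case the headers lack them.
#ifndef GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX
#define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
#define GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX           0x904B
#endif

int main()
{
    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);              // off-screen helper window
    GLFWwindow* win = glfwCreateWindow(64, 64, "vram-monitor", nullptr, nullptr);
    if (!win) return 1;
    glfwMakeContextCurrent(win);

    GLint lastAvail = -1, lastEvicted = -1;
    while (true) {
        GLint avail = 0, evicted = 0;
        glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &avail);
        glGetIntegerv(GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX, &evicted);
        if (avail != lastAvail || evicted != lastEvicted) {   // log only on change
            std::printf("available: %d KB, evicted: %d KB\n", avail, evicted);
            lastAvail = avail;
            lastEvicted = evicted;
        }
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

Run it in a second terminal while you exercise your app and the window manager; if “available” keeps marching down after your app has exited, that’s the growth you’re chasing.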

Either way gives you a simple tool to play around with your application and the window manager (move/resize/push/pop windows, etc.) to see which actions seem to be triggering the leakage.

I tried that with the xfwm compositor, disabling everything except the compositor itself, and it still leaked.
If you have a compositor to suggest that I could test with, that would be great.

I would suggest using KDE, as it gets a lot of testing and use. GNOME is another common one.

Does this problem only happen with your app? If so, besides looking for X and window manager settings to configure their memory usage, another option to consider is to whittle down your app (disabling code) until the problem goes away. Then you’ll have a pretty good idea as to what your app is doing to instigate this problem.

Question: You mentioned shared context and multiple processes. Does your application create multiple threads or processes? And just to make sure, you’re only talking about residual Xorg VRAM consumption after all of your threads/processes have been killed and are not running, right?

Our application runs on a single thread. My question was more of a suggestion along the lines of: is my OpenGL application sharing stuff with Xorg which does not get released afterwards?

The fact that there aren’t a lot of others reporting this problem does tend to suggest that your app’s behavior might somehow be instigating this, or this is a compositor bug for a less commonly used window manager.

We reproduce it with many desktop environments and many compositors, including KWin, so probably not.
However, we do have a specific design on the Qt side, using QOpenGLWindow in a specific way that may not be used universally,
so a bug (in Qt?) is not impossible.

I haven’t gone looking for any GUI GPU Profiling tools. There is “nvidia-settings”. Click on the GPU 0 tab at the left and it’ll display your total and allocated GPU memory. It doesn’t show you how much GPU storage has been evicted from GPU memory back to CPU memory though (which is what happens when you run out of GPU memory).

Thanks to you, I now know why it starts leaking into RAM after exhausting the VRAM; we still need to figure out the initial issue.

If you want to see that, you can write a very short, simple GL program using NVX_gpu_memory_info. With this, you can query and log to the console how much GPU memory is still available (and how much GPU storage has been evicted back from GPU memory to CPU memory), emitting new consumption and evicted numbers every time they change.

So I’ve used one I found here: [kde-freebsd] kwin and GL_OUT_OF_MEMORY
It is great; it is way more precise than nvidia-smi.
However, the evicted memory only shows up once my VRAM is exhausted, so it is not ultra useful.

Also I stumbled upon this: NVIDIA Working On A New OpenGL Memory Usage Extension - Phoronix
And this: https://developer.nvidia.com/designworks/nvidia-query-resource-for-opengl-usage
And this: GitHub - NVIDIA/nvidia-query-resource-opengl: A tool for querying OpenGL resource usage of applications using the NVIDIA OpenGL driver

So this could help, but it gives me, even for a simple example like glxgears:

Error: failed to query resource usage information for pid 30714

Could you test this on your side?

Either way gives you a simple tool to play around with your application and the window manager (move/resize/push/pop windows, etc.) to see which actions seem to be triggering the leakage.

Indeed, it allowed me to identify that the leak appears only when I close one of my QOpenGLWindows.

I would suggest using KDE, as it gets a lot of testing and use. GNOME is another common one.

As said before, it appears in every single DE with a compositor.

Does this problem only happen with your app? If so, besides looking for X and window manager settings to configure their memory usage, another option to consider is to whittle down your app (disabling code) until the problem goes away. Then you’ll have a pretty good idea as to what your app is doing to instigate this problem.

Indeed, we have already done a pass on this and were unsuccessful; we will try again!

[QUOTE=mwestphal;1291920]Also I stumbled upon this: NVIDIA Working On A New OpenGL Memory Usage Extension - Phoronix
And this: https://developer.nvidia.com/designworks/nvidia-query-resource-for-opengl-usage
And this: GitHub - NVIDIA/nvidia-query-resource-opengl: A tool for querying OpenGL resource usage of applications using the NVIDIA OpenGL driver

So this could help, but it gives me, even for a simple example like glxgears:

Error: failed to query resource usage information for pid 30714

Could you test this on your side?[/QUOTE]

I tested this on Linux and got the same error when I pointed nvidia-query-resource-opengl at a process running OpenGL and force-LD_PRELOADed their shared library into the OpenGL process’s image.

On Windows, I got:


Resource query not supported for 'nv_asm_ex02.exe' (pid 10020) 

which is what I got on Linux before I LD_PRELOADed the shared lib. So it’s possible I wasn’t running it properly on Windows.

I took a few minutes to dig deeper and see what wasn’t working properly on Linux.

The “nvidia-query-resource-opengl” process successfully executes the following, connecting to the client OpenGL process, sending/receiving NVQR_QUERY_CONNECT, and then sending NVQR_QUERY_MEMORY_INFO:


   nvqr_connect()                    
     create_client()
     open_server_connection()
     connect_to_server()
        write_server_command()       # NVQR_QUERY_CONNECT ->
        open_client_connection()
        read_server_response()
   nvqr_request_meminfo()
     write_server_command()          # NVQR_QUERY_MEMORY_INFO ->

However, it then fails to receive a valid response to the NVQR_QUERY_MEMORY_INFO in:


   nvqr_request_meminfo()
     read_server_response()   

resulting in the “failed to query resource usage information” error above.

On the OpenGL app side (server side), in process_client_commands() it properly responds to the NVQR_QUERY_CONNECT. But then when it receives the NVQR_QUERY_MEMORY_INFO request from the client, it calls:


  do_query()
    glXMakeCurrent   ( ctx )
    glQueryResourceNV( GL_QUERY_RESOURCE_TYPE_VIDMEM_ALLOC_NV, -1,
                       4096, data )
    glXMakeCurrent   ( NULL )

The glQueryResourceNV() call returns 0, which indicates failure, so it ends up sending back an empty reply buffer.

So what’s ultimately causing the failure is glQueryResourceNV() failing on the OpenGL app side.

Worth trying would be calling glQueryResourceNV() in a stand-alone OpenGL app w/o the client/server socket comms and w/o the 2nd GL context. I haven’t done that yet, but plan to.
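
Something along these lines is what I have in mind; it’s an untested sketch (assuming GLFW for context creation and a glext.h new enough to ship the GL_NV_query_resource tokens and the PFNGLQUERYRESOURCENVPROC typedef), mirroring the arguments do_query() passes:

// Untested sketch: call glQueryResourceNV() directly in a stand-alone app,
// mirroring do_query() above but without the socket comms or a second context.
// Assumes GLFW for context creation and a glext.h that ships GL_NV_query_resource.
#include <cstdio>
#include <GLFW/glfw3.h>
#include <GL/glext.h>

int main()
{
    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    GLFWwindow* win = glfwCreateWindow(64, 64, "query-resource", nullptr, nullptr);
    if (!win) return 1;
    glfwMakeContextCurrent(win);

    auto glQueryResourceNV = reinterpret_cast<PFNGLQUERYRESOURCENVPROC>(
        glfwGetProcAddress("glQueryResourceNV"));
    if (!glQueryResourceNV) {
        std::fprintf(stderr, "glQueryResourceNV not exposed by this driver\n");
        return 1;
    }

    // Same arguments the tool uses: query all vidmem allocations into a 4096-int buffer.
    GLint data[4096] = {0};
    GLint written = glQueryResourceNV(GL_QUERY_RESOURCE_TYPE_VIDMEM_ALLOC_NV,
                                      -1, 4096, data);
    std::printf("glQueryResourceNV returned %d (0 indicated failure in the tool)\n",
                static_cast<int>(written));
    return 0;
}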

The issue was resolved thanks to an intense debugging session.

This is a Qt issue, caused by our usage of QVTKOpenGLWindow and windowContainer.
The leak was caused by NULL-parenting the parent of the windowContainer containing the QVTKOpenGLWindow just before deletion.

This code was already there when we used a QOpenGLWidget, and it caused no issue. In any case, NULL-parenting a widget before deletion is useless, so removing the line resolved the issue.
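
For reference, the offending pattern boiled down to something like the following illustrative reduction (the names are made up, and a plain QOpenGLWindow stands in for QVTKOpenGLWindow):

// Illustrative reduction of the pattern (names are made up; QOpenGLWindow
// stands in for QVTKOpenGLWindow). The setParent(nullptr) call right before
// deletion is the line whose removal made the Xorg VRAM growth disappear.
#include <QApplication>
#include <QOpenGLWindow>
#include <QVBoxLayout>
#include <QWidget>

int main(int argc, char** argv)
{
    QApplication app(argc, argv);

    QWidget topLevel;
    auto* holder = new QWidget(&topLevel);                 // parent of the window container
    auto* glWindow = new QOpenGLWindow;                    // our GL rendering window
    auto* container = QWidget::createWindowContainer(glWindow, holder);

    auto* layout = new QVBoxLayout(holder);
    layout->addWidget(container);
    topLevel.show();

    // ... application runs and renders, then later tears the view down:
    holder->setParent(nullptr);   // the useless NULL-parenting that triggered the leak
    delete holder;                // also deletes the container and the QOpenGLWindow

    return 0;
}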

This leak shouldn’t happen though, even in this situation, so I have opened a Qt issue to report it.

If you manage to fix the NVIDIA tool, let me know!

Edit: How do I tag this as solved?

Glad to hear you found a solution.

No need. We don’t close threads or update the thread subject here.
