Problem with performance: very fast under Nsight

Hello to the group,

I am having a very strange problem with an OpenGL application that I am developing. I have a simple scene with five 3D objects, a terrain of 16,000 vertices, water (using two draws for reflection and refraction), an instanced 3D object with 1,000 instances, and a few post-processing effects. Whether I run the application under the debugger or run the executable outside the debugger, I get an average of 30 fps. So I decided to use Nsight to find out where my bottlenecks are. I installed Nsight, ran the application, and under Nsight I get an average frame rate of 200 fps… I have absolutely no idea why this happens. Can anyone help me?

Nsight VSE or Nsight Systems?

And are you intending to run with VSync ON or OFF in both cases?

It sounds like you’re running VSync ON normally but VSync OFF under Nsight. Try forcing VSync OFF in the NVIDIA settings and recheck your performance outside of Nsight.

One other thought. At least under Nsight Systems, your app is run as an elevated user. It’s not likely, but perhaps your app is doing something that runs significantly faster for an elevated user?

Thank you for the reply. I used the Nsight Compute interactive profile. In both cases it runs with VSync off. Under the debugger I am getting 20-50 fps, and under Nsight I get 180-220 fps. Do you mean to run it as admin? I tried that, but the result is the same. I also noticed that the terrain shader fails to compile under Nsight, so I switched off the terrain, but I get the same results. Could this be some setting with GLFW?

Anything’s possible. You’ll just have to nail it down.

Also, why are you testing performance in a Debug build? And are you running the same build for both? Not knowing what inefficiencies are switched on in your debug build, I’d suggest only profiling with a Release build.

As far as figuring out where all of your time is going with those 30 fps runs, I’d suggest doing a:

  1. quick profiling run in VerySleepy and (if no obvious culprits there) then
  2. a run in Nsight Systems

both with VSync OFF. One of those should reveal pretty quickly where your 30 fps app is spending most of its time, and whether you’ve primarily got a CPU-side or a GPU-side bottleneck.

If you’re spending most of your time in Present, Clear, or Blit, then you probably do not have VSync OFF like you think you do.

I just ran the Visual Studio performance profiler and it shows that all the time is consumed in the drawing commands; the CPU is almost idle most of the time, with an average of 30 fps. Then I ran the Nsight Systems profiler with both the Debug and Release builds, with an average of 200 fps… And I am sure VSync is off. Is there something Nsight sets in OpenGL that makes such a huge difference?

That’s pretty strange. You’re just going to have to poke around and figure out what your app is doing to trigger this.

  • What OS is this?
  • You’re running your executable natively on the local machine?
  • With NVIDIA graphics drivers installed? What version?
  • Are you sure you’re connecting to the NVIDIA OpenGL driver in both cases (not Nouveau, Mesa3D, Microsoft software GL, etc.)?
  • No virtual machines are involved?
  • With no VNC, RDP, SSH, or any other network in the display path between the app and your display?
  • Are you checking for GL errors?

Since you’re running on the NVIDIA GL drivers (I assume), plug in a GL debug message callback:
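For example, something along these lines (a minimal sketch, assuming you’ve created a 4.3+ debug-capable context and your loader exposes the KHR_debug entry points; the callback name is just illustrative):

static void APIENTRY glDebugOutput(GLenum source, GLenum type, GLuint id,
                                   GLenum severity, GLsizei length,
                                   const GLchar* message, const void* userParam)
{
    // Dump everything the driver reports to stderr
    fprintf(stderr, "GL DEBUG [source=0x%x, type=0x%x, severity=0x%x, id=%u]: %s\n",
            source, type, severity, id, message);
}

// After context creation:
glEnable(GL_DEBUG_OUTPUT);
glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS);   // deliver messages inside the offending call
glDebugMessageCallback(glDebugOutput, nullptr);
glDebugMessageControl(GL_DONT_CARE, GL_DONT_CARE, GL_DONT_CARE, 0, nullptr, GL_TRUE);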

The NVIDIA driver is very helpful in providing detailed info on errors, perf warnings, or just general info as to what it is doing inside the driver that you may care about. There may be clues in this that will help you nail this problem down.

Also, query and print out the GL_VERSION string after you have a context.
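For instance, a quick sketch; the vendor and renderer strings will also tell you which GPU and driver you actually ended up on:

printf("GL_VENDOR   : %s\n", (const char*)glGetString(GL_VENDOR));
printf("GL_RENDERER : %s\n", (const char*)glGetString(GL_RENDERER));
printf("GL_VERSION  : %s\n", (const char*)glGetString(GL_VERSION));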

Finally, when you disable VSync with GLFW, are you doing this?
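That is, something like this right after making the context current (a sketch; window here stands for your GLFWwindow*):

glfwMakeContextCurrent(window);
glfwSwapInterval(0);   // 0 = swap immediately (VSync off), 1 = wait for vblank (VSync on)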

Thanks for the long answer.

  1. OS is Windows 10
  2. It is running on the local machine
  3. Drivers: 460.89 Date 15/12/2020
  4. I don’t know how to check this.
  5. No virtual machines.
  6. No networks
  7. I just added a debug callback and I am getting this error:

[ERROR]: API_ID_REDUNDANT_RBO performance warning has been generated. Redundant state change in glBindRenderbuffer API call, RBO 1, "", already bound.
Source: API
Type: Performance
Severity: low

This is how I typically create a framebuffer, in case something there is wrong:

// Create a framebuffer for drawing all scene to a texture
glGenFramebuffers(1, &renderTextureFrameBuffer);
glBindFramebuffer(GL_FRAMEBUFFER, renderTextureFrameBuffer);

// Off-screen texture rendering
// generate texture
glGenTextures(3, texColorBuffers);
for (int i = 0; i < 3; ++i)
{
    glBindTexture(GL_TEXTURE_2D, texColorBuffers[i]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, renderWidth, renderHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    // attach it to currently bound framebuffer object
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, texColorBuffers[i], 0);
}

// generate the render buffer as a depth and stencil buffer
unsigned int rbo;
glGenRenderbuffers(1, &rbo);
glBindRenderbuffer(GL_RENDERBUFFER, rbo);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH24_STENCIL8, renderWidth, renderHeight);
glBindRenderbuffer(GL_RENDERBUFFER, 0);
// attach the render buffer to the frame buffer
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT, GL_RENDERBUFFER, rbo);

GLuint ScreeTextAttachments[3] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2 };
glDrawBuffers(3, ScreeTextAttachments);

if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
    Log::error("ERROR::FRAMEBUFFER:: Off screen render Framebuffer is not complete!");

The OpenGL version printed is:

[INFO]: GL Vendor : Intel
[INFO]: GL Renderer : Intel(R) UHD Graphics 630
[INFO]: GL Version (string) : 4.6.0 - Build 26.20.100.7870
[INFO]: GL Version (integer) : 4.6
[INFO]: GLSL Version : 4.60 - Build 26.20.100.7870

Oops, I didn’t notice that it is using the Intel graphics card instead of the NVIDIA 1650 (I work on a laptop), so I need to fix the errors with the framebuffer and choose the NVIDIA card.
Thanks, you helped me nail down the problem. Any thoughts?

I fixed the problem using the following code, for anyone interested:

extern "C"
{
    __declspec(dllexport) unsigned long NvOptimusEnablement = 0x00000001;
    __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
}

This instructs the driver to use the dedicated video card.
One error that I am getting from the debug callback now is the following:

[ERROR]: Buffer info:
Total VBO memory usage in the system:
memType: SYSHEAP, 0 bytes Allocated, numAllocations: 0.
memType: VID, 1.01 Mb Allocated, numAllocations: 22.
memType: DMA_CACHED, 0 bytes Allocated, numAllocations: 0.
memType: MALLOC, 144.21 Kb Allocated, numAllocations: 20.
memType: PAGED_AND_MAPPED, 0 bytes Allocated, numAllocations: 0.
memType: PAGED, 890.02 Kb Allocated, numAllocations: 2.

Source: API
Type: Other
Severity: low

Does anyone have any idea what this is?

Great! Yes, you can tell Win10 in its settings to always select the “High Performance GPU”, which will cause it to select the NVIDIA GPU.

This isn’t an error. Notice that the Type was not ERROR and that the Severity was LOW.

This is just some general info from the driver telling you in what memory space it’s placed various VBO memory allocations, and the total allocated bytes in each space. You can just ignore it.
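If you’d rather not see these low-severity messages at all, one option (a sketch) is to filter them out by severity when you register the callback; note this would also hide other low-severity messages, such as the redundant-RBO performance warning:

// Disable LOW-severity messages from all sources and types
glDebugMessageControl(GL_DONT_CARE, GL_DONT_CARE, GL_DEBUG_SEVERITY_LOW, 0, nullptr, GL_FALSE);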

Thanks man, your guidance nailed this problem. Now I have more tools at my disposal for future problems :slight_smile: