Nvidia gtx 1080 memory leak

jaketehsnake · April 29, 2019, 11:03am

I have an unfortunate problem.

I’ve recently started distributing my game, built in Java + OpenGL, and it works fine on the handful of graphics vendors I’ve tested it on. Only users on an NVIDIA GTX 1080 are experiencing problems. The problem is an out-of-memory error that shows up when polling glGetError(). I don’t know where the memory leak is. I know where it’s caught, but that doesn’t help much, since I don’t think it’s close to where the leak is located. I think it’s in the rendering loop, building up until I do some initializing and error checking.

The code base is very big, so I can’t show it all and don’t know which parts to show you. The game uses about 150 MB of video RAM on my computer, and this usage stays constant throughout execution. It’s the same on all the cards I’ve tried it on. There’s nothing fancy in the code. I use deferred rendering, two 4096*4096 textures, an FBO and a couple of < 1 MB VBOs that do streamed drawing.

I’m a bit desperate as I don’t have access to a 1080 to debug on, and I would prefer not to buy one since they cost something like $1000.

So, does this sound familiar to anyone, or would anyone with a card be willing to help me out?

The game is downloadable here: https://www.indiedb.com/games/songs-of-syx/downloads/songs-of-syx-demo-v1

I’ll share any code on demand.

Dark_Photon · April 29, 2019, 12:35pm

There’s not much to go on here. And you didn’t really clarify whether this was a CPU or GPU/GL driver out-of-memory condition. Since you mention “from polling glerrors”, and cite your observed GPU memory usage (150MB), I’m guessing you mean that you are calling glGetError() and at some point receiving a GL_OUT_OF_MEMORY error back from the driver. Is this correct? In any case, post the error message you’re getting with a sketch of what your app was doing in the frame before that.

What GPU and driver do you have? Have you tried it on any NVidia GPUs? Any NVidia GeForce GTX10xx GPUs (e.g. GTX1050)? You probably don’t need a GTX1080 to repro this.

Have you suggested that the user update their NVidia graphics driver to the latest available (see NVidia Driver Download)? Have you gotten the user to tell you which GPU and driver version they’re running (GL_RENDERER, GL_VERSION)?

They’re more like ~$700-750 right now. But still, not cheap. My guess is you don’t need “that” GPU to repro the issue. If you haven’t tested on an NVidia GPU, I think it’s an interaction between your GL usage and the NVidia GL driver in general.

My guess is that you are either doing something with Buffer Object Streaming and/or render target usage that is reacting badly with the NVidia driver (see below). Alternatively, you’ve got some GL object creation/destruction going on every frame that you don’t know about.

If you have access to a game user with this problem, you could have them turn on/off app subfeatures and look for correlation with the out-of-memory condition.

A few things you can do to help get a line on what’s causing this problem:

Add NVX_gpu_memory_info support and an option to enable logging GPU memory usage reports from it. This will tell you at any given time, how much GPU memory is available, out of how much, and whether or not the GL driver has had to evict memory “off of” the GPU back into CPU memory because you’re overrunning GPU memory. If evicted ever grows while your app is running, you’re overrunning GPU memory and need to fix it.
Add support for logging the GL debug output messages passed back to your app from the NVidia graphics driver. These are “extremely” helpful in determining what’s going on w.r.t. GPU/driver memory allocations and movement of objects between GPU and CPU memory. For the short, short version, see OpenGL Error#Catching errors (the easy way). For reference, the source extensions this came from are KHR_debug and, even earlier, ARB_debug_output. Initially, log every message (even informational), not just errors and warnings. Using this technique also allows you to catch GL errors without having to obfuscate your code by calling glGetError() everywhere (like you used to long ago in early OpenGL days). In the callback, you can dump a stack trace and see exactly where this was triggered the first time (or trigger an assert to invoke a debugger). Combine with the next tip to help localize which part of your CPU app code is triggering a GPU GL error:
Consider adding an option to call glFinish() at least once every frame after *SwapBuffers(), and a debugging option to call it once per render pass in your frame. With that, you can help localize the cause of a GPU-thrown error to the CPU code which instigated it.
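
A minimal sketch of the first tip (the NVX_gpu_memory_info logging), under a few assumptions. The token values are the ones listed in the NVX_gpu_memory_info extension spec, and all sizes are reported in KB. The class name GpuMemoryProbe and its methods are hypothetical. The GL query is injected as a function so the logic runs without a GL context; in an LWJGL 3 app the argument would presumably be GL11::glGetInteger, called on the thread that owns the context.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical helper: samples NVX_gpu_memory_info and flags growing eviction. */
final class GpuMemoryProbe {
    // Tokens from the NVX_gpu_memory_info extension spec (values in KB).
    static final int GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX = 0x9047;
    static final int GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX = 0x9048;
    static final int GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX = 0x9049;
    static final int GPU_MEMORY_INFO_EVICTION_COUNT_NVX = 0x904A;
    static final int GPU_MEMORY_INFO_EVICTED_MEMORY_NVX = 0x904B;

    private final IntUnaryOperator glGetInteger; // stand-in for the real GL query
    private int lastEvictedKb = -1;              // -1 means "no baseline yet"

    GpuMemoryProbe(IntUnaryOperator glGetInteger) {
        this.glGetInteger = glGetInteger;
    }

    /** Logs one report; returns true if evicted memory grew since the previous sample. */
    boolean sample() {
        int totalKb = glGetInteger.applyAsInt(GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX);
        int availKb = glGetInteger.applyAsInt(GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX);
        int evictedKb = glGetInteger.applyAsInt(GPU_MEMORY_INFO_EVICTED_MEMORY_NVX);

        boolean grew = lastEvictedKb >= 0 && evictedKb > lastEvictedKb;
        lastEvictedKb = evictedKb;

        System.out.printf("GPU mem: %d KB free of %d KB, evicted %d KB%s%n",
                availKb, totalKb, evictedKb, grew ? "  <-- eviction grew" : "");
        return grew;
    }

    public static void main(String[] args) {
        // Fake driver for demonstration only: evicted memory jumps on the third sample.
        int[] evicted = {1024, 1024, 4096};
        int[] next = {0};
        GpuMemoryProbe probe = new GpuMemoryProbe(pname ->
                pname == GPU_MEMORY_INFO_EVICTED_MEMORY_NVX ? evicted[next[0]++] : 8_000_000);
        for (int i = 0; i < 3; i++) {
            probe.sample();
        }
    }
}
```

Sampling once per frame (or once per level change) and logging only when sample() returns true keeps the log small while still catching the "evicted keeps growing" case described above.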

Other random things you can check:

Ensure you’re not calling glTexImage2D (or glTexStorage2D) in your draw loop.
Ensure that you are not creating or destroying ANY GL objects in your draw loop. That means your code as well as anything buried in the Java GL wrapper you’re using. You don’t want any garbage collection of GL objects going on if their deletion triggers deletion on the GL side.
If you’re respecifying (orphaning) buffer objects, make sure that the size you’re using matches the original allocated size. Or just use GL_MAP_INVALIDATE_BUFFER_BIT with glMapBufferRange() and don’t worry about making this mistake. You could also consider supporting PERSISTENT | COHERENT buffer object streaming (see the Buffer Object Streaming wiki page for details). Which one are you currently using, BTW?
Using bindless textures? Make sure that you’re not making more textures resident than will fit comfortably in GPU memory (8GB in this case). It doesn’t sound like you are.
If using bindless buffers, do the same check with resident buffer objects. If both, consider the sum. Also factor in space for all render targets (which is nontrivial in your case, at 16 Mpixels per FBO per render target/attachment).

From your GPU usage description, you should easily fit on an 8GB GPU unless you’re leaking GPU memory creating objects repeatedly in your draw loop. Best guess is you have a GPU buffer object and/or texture memory leak that you don’t realize that you have. Driver debug output and NVX_gpu_memory_info will give you more clues.
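
As a rough cross-check of those numbers, here is a back-of-the-envelope sketch. It assumes 4 bytes per pixel (RGBA8) and ignores mipmaps, padding and driver overhead; the class name TextureFootprint is hypothetical.

```java
/** Hypothetical helper for rough texture memory estimates. */
final class TextureFootprint {
    /** Raw size of one mip level in bytes (no padding, no mip chain). */
    static long bytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    static double mib(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long atlas = bytes(4096, 4096, 4);      // one 4096*4096 RGBA8 texture
        long pixels = 4096L * 4096L;            // the "16 Mpixels" per attachment
        System.out.printf("pixels per texture: %d%n", pixels);
        System.out.printf("one texture: %.1f MiB, two textures: %.1f MiB%n",
                mib(atlas), mib(2 * atlas));
    }
}
```

Two such textures come to 128 MiB, which is in line with the roughly 150 MB reported earlier in the thread and far below 8 GB. Steady-state usage alone therefore does not explain the error.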

jaketehsnake · April 29, 2019, 1:12pm

Thanks a lot. Got much to consider now. I’ll definitely monitor memory consumption on my own machine. It might give me some clues. And I’m sorry I didn’t provide enough information. Just to clarify:

I’ve personally tested this on a GTX 960, a GTX 1050 and Intel(R) UHD Graphics 630. This was with debug output on, and there were no warnings/errors. I’ve monitored GPU memory usage with GPU-Z, and memory usage is consistent on those cards.

Additionally, roughly 20 random gamers have tested it as well, without problems, except for two, coincidentally both on a 1080. One of them has provided me with a log:

java.lang.RuntimeException: GLerr: out of memory
at snake2d.GlHelper.checkErrors(GlHelper.java:74)

|-------------------|
| ERROR LOG |
|-------------------|
java.lang.RuntimeException: GLerr: out of memory
at snake2d.GlHelper.checkErrors(GlHelper.java:74)
at snake2d.TextureHolder.<init>(TextureHolder.java:66)
at resources.sprite.composer.Initer$1.getTexture(Initer.java:69)
at resources.sprite.SPRITES.<init>(SPRITES.java:69)
at resources.RES$Data.<init>(RES.java:34)
at resources.RES$Data.<init>(RES.java:32)
at resources.RES$1.doJob(RES.java:62)
at snake2d.CORE.start(CORE.java:183)
at menu.Menu.start(Menu.java:52)
at main.Syx.main(Syx.java:24)

|-------------------|
| STD OUT |
|-------------------|
Game version : Embryo 1.0

Input Arguments :
Starting the Launcher....

SYSTEM INFO
---Running on a: Windows 10, x86 Platform.
---jre: 1.8.0_201
---Processors avalible: 8
---Reserved memory: 1037Mb
---JRE Input Arguments : -Xms512m, -Xmx1024m,

Firing up an engine, v: 0.9

DISPLAY
---current resolution: 1920x1080x60
---supported resolutions:
-----640x480x50,
-----640x480x59,
-----640x480x60,
-----640x480x75,
-----720x480x50,
-----720x480x59,
-----720x480x60,
-----720x480x75,
-----720x576x50,
-----720x576x59,
-----720x576x60,
-----720x576x75,
-----800x600x25,
-----800x600x29,
-----800x600x30,
-----800x600x50,
-----800x600x59,
-----800x600x60,
-----800x600x75,
-----1176x664x50,
-----1176x664x59,
-----1176x664x60,
-----1024x768x25,
-----1024x768x29,
-----1024x768x30,
-----1024x768x50,
-----1024x768x59,
-----1024x768x60,
-----1024x768x75,
-----1280x720x25,
-----1280x720x29,
-----1280x720x30,
-----1280x720x50,
-----1280x720x59,
-----1280x720x60,
-----1280x720x75,
-----1280x768x25,
-----1280x768x29,
-----1280x768x30,
-----1280x768x50,
-----1280x768x59,
-----1280x768x60,
-----1280x768x75,
-----1152x864x25,
-----1152x864x29,
-----1152x864x30,
-----1152x864x50,
-----1152x864x59,
-----1152x864x60,
-----1152x864x75,
-----1280x800x25,
-----1280x800x29,
-----1280x800x30,
-----1280x800x50,
-----1280x800x59,
-----1280x800x60,
-----1280x800x75,
-----1360x768x25,
-----1360x768x29,
-----1360x768x30,
-----1360x768x50,
-----1360x768x59,
-----1360x768x60,
-----1360x768x75,
-----1366x768x25,
-----1366x768x29,
-----1366x768x30,
-----1366x768x50,
-----1366x768x59,
-----1366x768x60,
-----1366x768x75,
-----1280x960x25,
-----1280x960x29,
-----1280x960x30,
-----1280x960x50,
-----1280x960x59,
-----1280x960x60,
-----1280x960x75,
-----1440x900x25,
-----1440x900x29,
-----1440x900x30,
-----1440x900x50,
-----1440x900x59,
-----1440x900x60,
-----1440x900x75,
-----1280x1024x25,
-----1280x1024x29,
-----1280x1024x30,
-----1280x1024x50,
-----1280x1024x59,
-----1280x1024x60,
-----1280x1024x75,
-----1600x900x25,
-----1600x900x29,
-----1600x900x30,
-----1600x900x50,
-----1600x900x59,
-----1600x900x60,
-----1600x900x75,
-----1600x1024x25,
-----1600x1024x29,
-----1600x1024x30,
-----1600x1024x50,
-----1600x1024x59,
-----1600x1024x60,
-----1600x1024x75,
-----1768x992x25,
-----1768x992x29,
-----1768x992x30,
-----1680x1050x25,
-----1680x1050x29,
-----1680x1050x30,
-----1680x1050x50,
-----1680x1050x59,
-----1680x1050x60,
-----1680x1050x75,
-----1600x1200x60,
-----1920x1080x25,
-----1920x1080x29,
-----1920x1080x30,
-----1920x1080x50,
-----1920x1080x59,
-----1920x1080x60,
-----1920x1080x75,
---blit size: 960 480
---created resolution: 1920x1080, 60Hz, vsync: 1
---LWJGL: 3.1.2 build 29
---GLFW: 3.3.0 Win32 WGL EGL VisualC DLL

OPEN_GL
---FB stencil Bits: 0
---FB depth Bits: 0
---FB Red Bits: 8
---FB Green Bits: 8
---FB Blue Bits: 8
---FB Alpha Bits: 8
---Version: 3.3.0 NVIDIA 419.67
---SL Version: 3.30 NVIDIA via Cg compiler
---glRenderer: NVIDIA Corporation, GeForce GTX 1080 Ti/PCIe/SSE2
---Forward compatible: true

SOUND
---AL version : 1.1 ALSOFT 1.17.2
---AL vendor : OpenAL Community
---AL renderer : OpenAL Soft
---OpenALC10: true
---OpenALC11: true
---ALC_FREQUENCY: 48000Hz
---ALC_REFRESH: 50Hz
---ALC_SYNC: false
---Created Mono Sources : 10
---Created Stereo Sources : 6

[LWJGL] OpenGL debug message
ID: 0x20092
Source: API
Type: PERFORMANCE
Severity: MEDIUM
Message: Program/shader state performance warning: Vertex shader in program 1 is being recompiled based on GL state.
[LWJGL] OpenGL debug message
ID: 0x0
Source: API
Type: ERROR
Severity: HIGH
Message: Unknown internal debug message. The NVIDIA OpenGL driver has encountered
an out of memory error. This application might
behave inconsistently and fail.
(pid=25544 javaw.exe 32bit)
[LWJGL] OpenGL debug message
ID: 0x505
Source: API
Type: ERROR
Severity: HIGH
Message: GL_OUT_OF_MEMORY error generated. Failed to allocate memory for texture.
class snake2d.SoundCore sucessfully destroyed
class snake2d.GraphicContext was sucessfully destroyed
Core was sucessfully disposed

The log tells me that the driver has suffered an out-of-memory error sometime during execution. (I don’t poll glGetError() during regular execution.) The user has then changed a major state in the game (loaded a certain level, etc.), which in turn releases textures, VBOs and such and creates other ones. During this process there are a lot of glGetError() calls, which will crash the game if they return an error.

This is where the error is caught, though, like I said, I don’t believe the immediate code surrounding it is the culprit.

	texture = new _TextureDiffuse(diffuse, true);
	
	if (normal != null)
		normalTexture = new _TextureNormal(normal, true);
	else
		normalTexture = null;
	
	pixelWidth = texture.width; 
	pixelHeight = texture.height;
	
	texture.bind();
	if (normalTexture != null)
		normalTexture.bind();
	
	CORE.addTexure(this);
	ColorImp.setSPRITE(x1, y1, w, h);
	
	pixels = VboParticles.getForTexture(pixelWidth, pixelHeight);
	
	FBO = glGenFramebuffers();
	
	glBindFramebuffer(GL_FRAMEBUFFER, FBO);
	glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_RECTANGLE, texture.id, 0);
	
	glDrawBuffers(GL_COLOR_ATTACHMENT0);
	
	if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_FRAMEBUFFER))
		throw new RuntimeException("Could not create fbo");
	
	glBindFramebuffer(GL_FRAMEBUFFER, 0);
    
	GlHelper.checkErrors();

The last line does glGetError and throws an exception. The user has, however, executed this code once before and that didn’t cause an error.
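
One detail worth keeping in mind here: glGetError() reports one recorded error flag per call, and a flag can have been set by any GL call since the last check, not only the call just before it. A minimal sketch of a check that drains all pending flags follows. The class name GlErrorDrain is hypothetical, the hex values are the standard GL error codes, and the supplier stands in for the real call (in LWJGL this would presumably be GL11::glGetError).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/** Hypothetical helper: drains every pending GL error flag and names it. */
final class GlErrorDrain {
    private static final int GL_NO_ERROR = 0;
    private static final int MAX_READS = 16; // guard against a source that never returns GL_NO_ERROR

    static String name(int code) {
        switch (code) {
            case 0x0500: return "GL_INVALID_ENUM";
            case 0x0501: return "GL_INVALID_VALUE";
            case 0x0502: return "GL_INVALID_OPERATION";
            case 0x0503: return "GL_STACK_OVERFLOW";
            case 0x0504: return "GL_STACK_UNDERFLOW";
            case 0x0505: return "GL_OUT_OF_MEMORY";
            case 0x0506: return "GL_INVALID_FRAMEBUFFER_OPERATION";
            default:     return "0x" + Integer.toHexString(code);
        }
    }

    /** Reads error flags until GL_NO_ERROR comes back; returns their names in order. */
    static List<String> drain(IntSupplier glGetError) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < MAX_READS; i++) {
            int code = glGetError.getAsInt();
            if (code == GL_NO_ERROR) {
                break;
            }
            errors.add(name(code));
        }
        return errors;
    }

    public static void main(String[] args) {
        // Fake error source for demonstration: two pending flags, then GL_NO_ERROR.
        int[] pending = {0x0505, 0x0502, 0};
        int[] next = {0};
        System.out.println(drain(() -> pending[next[0]++]));
    }
}
```

Calling such a check after each logical step (texture creation, VBO creation, FBO setup) narrows down which step set the flag, at the cost of some extra driver round trips.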

I hope this sheds some more light. I just find it odd since it works faultlessly on other NVIDIA cards.
I will try out the suggestions you’ve given me so far though. Thanks!

Update:

I can confirm that NVX_gpu_memory_info returns the same values during execution on my GeForce GTX 660M, no memory leak there. Evicted memory is however not 0, but does not change.

This is basically what I do each frame:

	glBindVertexArray(vertexArrayID);

	glBindBuffer(GL_ARRAY_BUFFER, attributeElementID);

	for (int i = 0; i < NR_OF_ATTRIBUTES; i++) {
		glEnableVertexAttribArray(i);
	}
	
	glBufferSubData(GL_ARRAY_BUFFER, 0, buffer);
	shader.bind();

	glDrawElements(GL11.GL_POINTS, (to - from), GL11.GL_UNSIGNED_INT, from * 4);

where ‘buffer’ is the vertex data. I’ve initialized the VBO at init:

	glBufferData(GL_ARRAY_BUFFER, buffer, GL_STREAM_DRAW);

Dark_Photon · May 1, 2019, 11:57am

jaketehsnake:

I’ve personally tested this on GTX 960, GTX 1050 and Intel(R) UHD Graphics 630. This with debug output on and there are no warnings/errors.

I’ve monitored GPU memory usage with gpu-z and memory usage is consistent on those cards.

Additionally, roughly 20 random gamers have tested it as well, without problems, except for two, coincidentally both on a 1080. One of them has provided me with a log:
    java.lang.RuntimeException: GLerr: out of memory
    at snake2d.GlHelper.checkErrors(GlHelper.java:74)
...
    java.lang.RuntimeException: GLerr: out of memory
    at snake2d.GlHelper.checkErrors(GlHelper.java:74)
    at snake2d.TextureHolder.<init>(TextureHolder.java:66)
    at resources.sprite.composer.Initer$1.getTexture(Initer.java:69)

From what you’ve mentioned (here and elsewhere), it looks like dynamic GL texture creation during level changes is instigating this.

This user with the GL_OUT_OF_MEMORY crash is apparently running on a GTX 1080 Ti, not a GTX 1080.

    [LWJGL] OpenGL debug message
    ID: 0x0
    Source: API
    Type: ERROR
    Severity: HIGH
    Message: Unknown internal debug message. The NVIDIA OpenGL driver has encountered
    an out of memory error. This application might
    behave inconsistently and fail.

I recognize this form. This is where the NVidia driver (on the user mode / app side at least) first recognized the error and called your app’s debug message callback to tell you. Generally speaking, for errors recognized up-front by the driver on the app side, you want the stack trace for this callback to see what you were doing then that might have caused it.

That first message in your log ends with “(pid=25544 javaw.exe 32bit)”. Is this user running a 32-bit app, not a 64-bit app? Maybe that’s the kicker. IIRC, some users have seen GL_OUT_OF_MEMORY errors when running 32-bit apps because the process runs out of virtual memory (VM) address space (2-3GB), due to the combined usage of the app and the libraries it’s linked to.

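Well, the last debug message in your log (“GL_OUT_OF_MEMORY error generated. Failed to allocate memory for texture.”) pretty much nails it. The NVidia driver has basically told you that your app’s attempt to allocate storage for a texture is triggering this problem.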

If this is being instigated by that user running 32-bit, suggest that they use a 64-bit java.

You might look at reworking your level change logic to stop creating and destroying textures.

Are you testing in a 32-bit Java? If so, when running on an NVidia GPU+driver, look at your virtual memory address space consumption (e.g. in Process Explorer on Windows). Make sure that your VM consumption never gets anywhere near 2-3GB, even after a number of level changes.
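
To make the 32-bit question easy to answer from a user’s log, the game could record the JVM’s own bitness at startup. A minimal sketch, with some assumptions: the os.arch system property describes the architecture the JVM was built for, so a 32-bit JVM on 64-bit Windows reports x86, and sun.arch.data.model is a HotSpot-specific property ("32" or "64") that may be missing on other JVMs. The class name JvmBitness is hypothetical and the os.arch fallback is only a heuristic. Whether the “x86 Platform” line in the log above is built from os.arch is not confirmed.

```java
/** Hypothetical helper: guesses whether the running JVM is 32-bit. */
final class JvmBitness {
    /**
     * @param dataModel value of "sun.arch.data.model" (may be null on non-HotSpot JVMs)
     * @param osArch    value of "os.arch" (for example "x86" or "amd64")
     */
    static boolean looks32Bit(String dataModel, String osArch) {
        if (dataModel != null) {
            return dataModel.trim().equals("32");
        }
        // Heuristic fallback: 64-bit architecture names usually contain "64".
        return osArch != null && !osArch.contains("64");
    }

    public static void main(String[] args) {
        String dataModel = System.getProperty("sun.arch.data.model");
        String osArch = System.getProperty("os.arch");
        System.out.println("os.arch=" + osArch + ", data model=" + dataModel
                + ", looks 32-bit: " + looks32Bit(dataModel, osArch));
    }
}
```

If this reports a 32-bit JVM on a machine with a 64-bit OS, asking the user to install and launch with a 64-bit Java is a cheap experiment before changing any rendering code.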

jaketehsnake · May 1, 2019, 1:06pm

Thanks again Dark Photon! I didn’t realize there were several versions of 1080.

From what I gather, the user is running 32-bit Windows? (For whatever reason.)

---Running on a: Windows 10, x86 Platform.

I’ve also set the maximum heap to 1GB. The game itself never uses more than 200MB.
---JRE Input Arguments : -Xms512m, -Xmx1024m

This hopefully means I’d get a java error prior to an opengl driver error if I ran out of address space? I hope so. But there’s also memory allocated outside of the JVM in this opengl wrapper I’m using (buffers), could be that…

I’ve also only monitored this on windows/java 64-bit, but memory consumption shouldn’t be higher on 32-bit, not by a large margin anyway.

Also, texture creation does not return any errors. If we go through the code again, I’ll clarify.

	texture = new _TextureDiffuse(diffuse, true);
	
	if (normal != null)
		normalTexture = new _TextureNormal(normal, true);
	else
		normalTexture = null;
	
	pixelWidth = texture.width; 
	pixelHeight = texture.height;
	
	texture.bind();
	if (normalTexture != null)
		normalTexture.bind();
	
	CORE.addTexure(this);
	ColorImp.setSPRITE(x1, y1, w, h);

The constructor _TextureNormal creates a texture. Just basic texture creation. It does perform a glGetError() afterwards, which doesn’t return anything. So, at this point everything is fine.

	pixels = VboParticles.getForTexture(pixelWidth, pixelHeight);

This creates a VBO. It doesn’t do anything related to textures. There is no glGetError() afterwards, however, so there’s still uncertainty.

	FBO = glGenFramebuffers();
	
	glBindFramebuffer(GL_FRAMEBUFFER, FBO);
	glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_RECTANGLE, texture.id, 0);
	
	glDrawBuffers(GL_COLOR_ATTACHMENT0);
	
	if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_FRAMEBUFFER))
		throw new RuntimeException("Could not create fbo");
	
	glBindFramebuffer(GL_FRAMEBUFFER, 0);
    
	GlHelper.checkErrors();

That leaves only the code above. And it does include textures, right? To clarify, texture.id at this point points to a 4096*4096 32bit texture. It obviously goes through a successful check of the framebuffer, yet triggers an error.

glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_RECTANGLE, texture.id, 0);

This doesn’t allocate anything, right? The texture already lives in GPU memory.

But I suspect that the GPU processes things asynchronously, meaning that a glGetError() without a flush/finish before it might be misleading?

I’ll see if I can manage to get the proper stack trace with my wrapper as well as monitor off JVM memory.

Dark_Photon · May 1, 2019, 7:22pm

jaketehsnake:

Also, texture creation does not return any errors.
…
The constructor _TextureNormal creates a texture. Just basic texture creation. It does perform a glGetError() afterwards, which doesn’t return anything. So, at this point everything is fine.
…
`glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_RECTANGLE, texture.id, 0);`
This doesn’t allocate anything, right? The texture already lives in GPU memory.

No. GL drivers are lazy. When you tell them to create a texture or upload new content to a texture, typically that will be deferred as long as possible. When you need to render with a texture, if it hasn’t been created on the GPU, it is then that it’ll be created and uploaded. If you never render with it (or otherwise require it to be on the GPU), it may never be created there at all.

This is one reason why it’s common to pre-render with textures prior to hitting your main game loop. It’s sometimes called “texture warming”.

Yes, a glGetError() without a flush/finish before it can be misleading.

jaketehsnake · May 2, 2019, 6:53am

In practice I am doing that, I think. The texture is a sprite atlas, the only texture used throughout the game. Prior to hitting my loop I draw the minimap of the generated level at its bottom right corner. That’s why the texture is also attached to an FBO.

I came across this from the documentation:

Special precautions need to be taken to avoid attaching a texture image to the currently bound framebuffer while the texture object is currently bound and potentially sampled by the current vertex or fragment shader. Doing so could lead to the creation of a "feedback loop" between the writing of pixels by rendering operations and the simultaneous reading of those same pixels when used as texels in the currently bound texture. In this scenario, the framebuffer will be considered framebuffer complete, but the values of fragments rendered while in this state will be undefined. The values of texture samples may be undefined as well.

I am reading and writing to the same pixels. I can live with undefined results, but could this “feedback loop” consume memory somehow?

The docs don’t list GL_TEXTURE_RECTANGLE as a valid texture target either. I’ve seen that it’s valid in other official documentation, however.

But it seems like I’m going to have to buy that card to squash this bug and move on with my life…

Dark_Photon · May 2, 2019, 1:00pm

Well theoretically, undefined includes memory consumption. But practically, I doubt it. See ARB_texture_barrier for more on this.

Related: On mobile, it’s common for changing the contents of a texture outside of the pipeline (e.g. glTex*Image2D()) to trigger texture “ghosting” if there’s a render (or other reference) that will read from the texture already in the pipeline. In this case, the driver “ghosts” the texture (allocates and creates a duplicate texture) to avoid the driver having to stall the pipeline on the subsequent update while references to the previous contents of the texture continue in the pipeline. In this scenario, there are actually multiple physical textures existing on the GPU for one single GL texture handle for a time (knownst to the driver, but unbeknownst to you – unless you look at GPU memory consumption, …or run out of GPU memory in the process). In GL, magic like this is going on under-the-covers all the time, but in Vulkan you have to implement this behavior if you want it.

That said, here you are changing the texture within the pipeline, so I wouldn’t expect the driver to ghost the texture here. (Caveat: I am not/have never been an NVidia graphics driver developer, so I don’t really know what they do.) Still, you are reading from/writing to the same texture, and the driver is likely going to notice this (it tracks render dependencies all the time, it has to!). So in this case, if it were not for the explicit behavior described by ARB_texture_barrier, I would otherwise suspect that the driver might ghost here. However, I seriously doubt this is happening.

You can always remove this code in your app and see if it somehow resolves the problem. It might, but I seriously doubt it.

However, are you sure you can live with undefined results? Undefined includes any/all possible behavior.

Are you sure that the problem 100% correlates with having a GTX1080 Ti? Or is that a statistical fluke? Also, is your system 32-bit, or does it otherwise match the affected users’ systems closely enough to reproduce this? Unless the cost is really not an issue, I’d hate to see you pony up for this GPU and then not be able to reproduce the problem.

That said, having plenty of experience on NVidia GL drivers and a GTX 1080 Ti specifically, I think you’ll be happy from a dev standpoint. I know I am.

jaketehsnake · May 3, 2019, 5:57am

Ok then. I might not understand the terminology sufficiently, but my entire rendering pipeline basically first writes to textures, then reads from those textures and writes to others until the last one is blitted to the screen buffer, so if I ever truly experienced undefined pixels, I’d be screwed.

Dark_Photon, thanks for helping out! The bug remains, but I’ve learnt a lot of new stuff thanks to you.

Do I interpret the above as you actually own a GTX 1080 Ti? In that case, is there any way I can persuade you to try out the app? It’s a 70MB download. No installation needed, just an unzip…

jaketehsnake · May 21, 2019, 9:08am

Ok, I’ve managed to get in touch with the person with the bug and made another build for him riddled with OpenGL error checks. Initialization seems fine; however, it now crashes at the first buffer swap.

[LWJGL] OpenGL debug message
ID: 0x502
Source: API
Type: ERROR
Severity: HIGH
Message: GL_INVALID_OPERATION error generated. Source and destination dimensions must be identical with the current filtering modes.

The stack trace is broken at this point, but I have a sneaking suspicion that it’s generated when I blit my FBO to the system FB at the end of my rendering loop.

It looks like this:

		glBindFramebuffer(GL_READ_FRAMEBUFFER, fbID);
		glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
		glBlitFramebuffer(0, 0, width, height, blitX, blitY, blitX + blitW, blitY + blitH, GL_COLOR_BUFFER_BIT, blitFilter);

Where the FBO dimensions are not the same as the blit dimensions and blitFilter is one of GL_LINEAR or GL_NEAREST.

The docs name a number of ways this might generate an openGL error. https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glBlitFramebuffer.xhtml

This FBO contains 3 textures, 1 stencil and 1 depth. It is initialized like this:

	glBindFramebuffer(GL_FRAMEBUFFER, fbID);
	
	iddiffuse = GlHelper.getFBTexture(width, height);
	idNormal = GlHelper.getFBTexture(width, height);
	idLight = GlHelper.getFBTexture(width, height);
	
	glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, iddiffuse, 0);
	glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, idNormal, 0);
	glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, idLight, 0);
	
	glBindRenderbuffer(GL_RENDERBUFFER, stencilID);
	glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH24_STENCIL8 , width, height);
	glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT, GL_RENDERBUFFER, stencilID);
	
	glDrawBuffers(diffuseNormalBuffer);
	
	if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_FRAMEBUFFER))
		throw new RuntimeException("Could not create fbo");
	
	glBindFramebuffer(GL_FRAMEBUFFER, 0);
	
	glActiveTexture(GL_TEXTURE2);
	glBindTexture(GL_TEXTURE_2D, iddiffuse);
	
	glActiveTexture(GL_TEXTURE3);
	glBindTexture(GL_TEXTURE_2D, idNormal);
	
	GlHelper.checkErrors();

And the textures are generated thus:

	int id = glGenTextures();
	glBindTexture(GL_TEXTURE_2D, id);
	glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
	glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
	glTexParameteri(GL_TEXTURE_2D, GL12.GL_TEXTURE_BASE_LEVEL, 0);
	glTexParameteri(GL_TEXTURE_2D, GL12.GL_TEXTURE_MAX_LEVEL, 0);
	glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL12.GL_BGRA, GL_UNSIGNED_BYTE, (java.nio.ByteBuffer) null);
	return id;

Apparently it works if blitFilter is GL_LINEAR, although not in full screen for some reason (it gets the same error message). Could it be some format mismatch (float/byte)? Could the three textures mess things up? Could this be what led up to the out-of-memory bug if the error had gone unchecked?

Update:
From the docs, I gather that this is the most likely culprit:

GL_INVALID_OPERATION is generated if filter is GL_LINEAR and the read buffer contains integer data.

I think my read buffer’s textures are indeed specified as integers:

glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL12.GL_BGRA, GL_UNSIGNED_BYTE, (java.nio.ByteBuffer) null);

I’ve changed GL_UNSIGNED_BYTE to GL_FLOAT and there is no difference on my machine. (What does it even do? Does it affect shaders?)

Do you think this could be it?
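
A note on the rule quoted above, with a small sketch. In OpenGL, “integer data” in that rule refers to integer internal formats such as GL_RGBA8UI or GL_RGBA8I. GL_RGBA8 is an unsigned normalized fixed-point format, which shaders read as floats in [0, 1], so the rule should not apply to it. The GL_UNSIGNED_BYTE / GL_FLOAT argument of glTexImage2D only describes the layout of the client pixel data being uploaded, and here that data is null, which would explain why changing it made no difference. The class name BlitFilterRule is hypothetical and the format list is deliberately tiny.

```java
/** Hypothetical helper: classifies a few internal formats for the GL_LINEAR blit rule. */
final class BlitFilterRule {
    enum Kind { NORMALIZED, FLOAT, INTEGER }

    /** Classifies the handful of internal formats covered by this sketch. */
    static Kind kindOf(String internalFormat) {
        switch (internalFormat) {
            case "GL_RGBA8":
            case "GL_RGB8":
                return Kind.NORMALIZED;   // stored as bytes, sampled as floats in [0, 1]
            case "GL_RGBA16F":
            case "GL_RGBA32F":
                return Kind.FLOAT;
            case "GL_RGBA8UI":
            case "GL_RGBA8I":
            case "GL_RGBA32UI":
                return Kind.INTEGER;      // sampled as integers, cannot be linearly filtered
            default:
                throw new IllegalArgumentException("not covered by this sketch: " + internalFormat);
        }
    }

    /** A GL_LINEAR blit is rejected only when the read buffer holds integer data. */
    static boolean linearBlitAllowed(String readInternalFormat) {
        return kindOf(readInternalFormat) != Kind.INTEGER;
    }

    public static void main(String[] args) {
        System.out.println("GL_RGBA8   -> " + linearBlitAllowed("GL_RGBA8"));
        System.out.println("GL_RGBA8UI -> " + linearBlitAllowed("GL_RGBA8UI"));
    }
}
```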

Dark_Photon · May 21, 2019, 12:41pm

jaketehsnake:

    Message: GL_INVALID_OPERATION error generated. Source and destination dimensions must be identical with the current filtering modes.
…I have a sneaking suspicion that it’s generated when I blit my FBO to the system FB at the end of my rendering loop.
glBindFramebuffer(GL_READ_FRAMEBUFFER, fbID);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
glBlitFramebuffer(0, 0, width, height, blitX, blitY, blitX + blitW, blitY + blitH, GL_COLOR_BUFFER_BIT, blitFilter);
Where the FBO dimensions are not the same as the blit dimensions and blitFilter is one of GL_LINEAR or GL_NEAREST, …

Apparently it works if blitFilter is GL_LINEAR, although not in full screen for some reason (it gets the same error message). Could it be some format mismatch (float/byte)? Could the three textures mess things up? Could this be what led up to the out-of-memory bug if the error had gone unchecked?

Ok. So you suspect the blit from your FBO to the window is causing this, and it breaks in fullscreen sometimes.

It’s puzzling to explain if we take the driver debug message literally. However, it could be that by “filtering modes” it’s not just referring to the GL_NEAREST/GL_LINEAR param, but to any conversion from source to dest.

One case where glBlitFramebuffer() requires the dimensions to match is if the MSAA sample counts are different between the render targets. The gist is: one glBlitFramebuffer() call can either 1) resize the dimensions, or 2) downsample (or upsample) the image to a different number of MSAA samples, but not both simultaneously. To do both, you need 2 blits.

If we assume in your case that the GL error code is right but the message text might not be the right one, then this might be explained by your user having a graphics driver override setting which forces MSAA on app fullscreen windows (or at least app windows). I’ve hit this myself before. You’ve already said that your source and target dimensions are different, so you’re expecting the blit to do a resize. That precludes a blit between two render targets where the source and dest MSAA sample counts differ, and attempting it results in a GL_INVALID_OPERATION.

If this theory is correct, one thing you could do to try to avoid this issue is to do your own offscreen resize (and downsample if necessary) blit from your render FBO to a temp FBO. This gets you an image that is exactly the same size as the window dimensions you will be blitting to. Then you can let the blit to the window perform any sample-count conversion, if needed.

Another thing you could try just short of that is to change your FBO-to-window blit so that it just “never” resizes the image. That is, even if the window has different dimensions, it would always use the source dimensions. See if this gets rid of the error. If so, perhaps it’s worth jumping through the hoop above to support this fully (i.e. supporting the case where the user has forced an odd pixel format for their window through the graphics driver).

Related: Just FYI, at least with NVidia GL drivers, the user can force both MSAA on and a custom number of samples per pixel for the system framebuffer that is created to render to the application window via the driver control panel (NVidia Settings). This can foul up any call to glBlitFramebuffer to the window where the developer is expecting the blit to do a resize (and/or a downsample). Even if the dimensions match, you can end up with weird situations like blitting from your offscreen FBO with 4x MSAA to a system/window framebuffer which has 8x MSAA, resulting in the blit call failing. You just have to know that that’s a possibility and write your code appropriately, adding code to support that (which isn’t hard), or just checking for that up-front by querying the GL_SAMPLE_BUFFERS and GL_SAMPLES state of the system framebuffer and failing your app on startup. Obviously, the former is more user friendly – particularly for a game played by users that may have no idea how to modify their graphics driver settings.
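
The decision described above can be sketched as a small pure function. This is my reading of the multisample-related entries in the glBlitFramebuffer error list, not a full model of the call: a sample count of 0 stands for “GL_SAMPLE_BUFFERS is 0”, and in a real app the counts would come from querying GL_SAMPLE_BUFFERS and GL_SAMPLES with the relevant framebuffer bound. The class name BlitPlanner and the plan names are hypothetical.

```java
/** Hypothetical helper: decides how a color blit must be split up. */
final class BlitPlanner {
    enum Plan {
        SINGLE_BLIT,                 // one glBlitFramebuffer call is enough
        RESIZE_OFFSCREEN_FIRST,      // resize into a temp FBO, then do a same-size blit
        RESOLVE_TO_SINGLE_SAMPLE_FIRST // sample counts differ: resolve to a temp FBO first
    }

    /** readSamples/drawSamples: 0 means the framebuffer is not multisampled. */
    static Plan plan(int srcW, int srcH, int dstW, int dstH, int readSamples, int drawSamples) {
        boolean readMs = readSamples > 0;
        boolean drawMs = drawSamples > 0;
        boolean sameSize = srcW == dstW && srcH == dstH;

        // Both multisampled with different counts: a direct blit is an error.
        if (readMs && drawMs && readSamples != drawSamples) {
            return Plan.RESOLVE_TO_SINGLE_SAMPLE_FIRST;
        }
        // Either side multisampled: the rectangles must have identical dimensions.
        if ((readMs || drawMs) && !sameSize) {
            return Plan.RESIZE_OFFSCREEN_FIRST;
        }
        return Plan.SINGLE_BLIT;
    }

    public static void main(String[] args) {
        // The case from this thread: single-sampled 960x480 FBO, 1920x1080 window.
        System.out.println("no forced MSAA: " + plan(960, 480, 1920, 1080, 0, 0));
        System.out.println("forced 4x MSAA: " + plan(960, 480, 1920, 1080, 0, 4));
    }
}
```

With the window’s sample count queried once at startup (and again after a fullscreen toggle), the renderer can pick the cheap single blit when possible and fall back to the two-step path only when the driver has forced MSAA.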

I’ll follow up with more info here shortly…

Dark_Photon · May 21, 2019, 12:57pm

Here are a few possibly-related threads to check out:

jaketehsnake · May 21, 2019, 2:35pm

Fantastic! I can now reproduce the bug by setting forced antialiasing options in the nvidia control-panel. This was not the original problem, but I suppose it’s possible that it causes an OUT_OF_MEMORY eventually somehow… ?

This multisampling business is new information to me, but I think I’ve grasped the gist of it: a pixel doesn’t have to be one set of RGB values, but can hold several, for better/faster sampling at the cost of memory usage.

So what’s the proper way to fix this? From reading dark_photon’s reply and the suggested threads, there doesn’t really seem to be a consensus or even a 100% working solution.
There are a few I can think of:

1. A middle-man FBO, just as dark_photon suggested, with the same dimensions as the window FB (is there a proper name for the window FB?) and the same format as the original FBO. Blit to this first, then to the window FB. It seems wasteful though: it’s 1920x1080x4 bytes of data that needs additional processing each render pass. That might be a silly worry, though.
2. Upon creation, force the window’s FB to have the same sample count as my FBO. But I don’t think this works, as the NVIDIA settings probably override everything I try to do here.
3. Create my FBO to have the same format as the window FB. Seems difficult to get it just right + will load my rendering pipeline with x4/x8 operations per pixel that I don’t need.
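To put the extra copy through the middle-man FBO in numbers, here is the arithmetic as a small Java sketch. The 1920x1080x4 figure is from the post above; the 60 frames per second is an assumption for illustration, and the class name is made up.

```java
// Hypothetical back-of-the-envelope calculator for the extra blit.
final class BlitCost {

    static long bytesPerFrame(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    public static void main(String[] args) {
        long perFrame = bytesPerFrame(1920, 1080, 4);
        long perSecond = perFrame * 60; // assumed frame rate
        System.out.println("bytes per frame:  " + perFrame);  // 8294400
        System.out.println("bytes per second: " + perSecond); // 497664000
    }
}
```

That is about 8 MB read and 8 MB written per frame. Whether it matters in practice would need measuring on the target hardware.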

arekkusu · May 21, 2019, 3:46pm

See also EXT_framebuffer_multisample_blit_scaled from 2010.

Dark_Photon · May 21, 2019, 8:33pm

Great!

Seems unlikely. I doubt that a blit failure is going to allocate a bunch of GPU or GL driver memory.

The OpenGL spec calls this the “default framebuffer”. This is the “window-system provided framebuffer”. Typically it’s used to render to a window, but could instead be used to render to a pbuffer. I tend to call this framebuffer the “system framebuffer”.

Minor correction: You want this FBO to be single-sample, not the same format as the window FB. That way the blit from the FBO can upsample from 1x (single-sample) to whatever wild AA the window might have been forced to, without generating an error in the blit.

This may not even be possible. There’s no guarantee that the pixel format used/forced by the window is even available to an FBO AFAIK.

Dark_Photon · May 21, 2019, 9:16pm

Thanks for the tip! Didn’t know about that one. Could be a good solution on supporting drivers:

EXT_framebuffer_multisample_blit_scaled driver reports (gpuinfo.org)

jaketehsnake · May 23, 2019, 7:13am

I can confirm that using a second FBO with the same dimensions as the default FB and a blit fixed the original bug. It’s unclear how/if this caused the OUT_OF_MEMORY. Could be the client was actually low on memory when trying it out the first time, could be a change in drivers, or could be some of the minor refactoring I did in the current version, but it will remain a mystery to me. Thanks for all your help!

Dark_Photon · May 23, 2019, 11:46am

Glad to hear that you solved it!

Also, thanks for following-up with the resolution. This thread will probably help other folks out in the future. (I personally search these forums all the time – so many useful tips in here!)

jaketehsnake · August 2, 2019, 7:34am

I can now admit that I was completely wrong about everything…

It was related to 32-bit JVMs, just as @Dark_Photon suggested. And it’s pretty weird.

When shipping the app I set the max heap for the JVM to 1024MB. Plenty of space, I thought. And I was right. All you need is actually 200MB heap.

What’s strange is that the higher I set the maximum heap, the higher the chance of an out of memory error. This only happens on 32-bit Java.

Now, from what I gather, it’s not the JVM heap running out of space. It’s the “direct”/“native”/“off heap” memory that the JVM also allocates. This is the memory used by the JVM to communicate with the GPU. From the docs, it seems that this memory is the same size as the maximum JVM memory.

So by setting a larger heap, I’m setting a larger “off heap” buffer.

This should be a good thing, I suppose. But somehow it produces an OUT_OF_MEMORY when allocating on the “off heap”.

Why? Well, I hope someone will tell me this… @Dark_Photon ?
My theory is that it has to do with memory boundaries. Perhaps the “off heap” memory is in a peculiar position, e.g. starts too far off in the address space. Perhaps it’s paging; I’m not smart enough for this… Somehow it can’t allocate a contiguous 64MB.
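One way to look at this from inside the JVM is a small probe like the sketch below, run with different -Xmx values on the 32-bit JVM in question. The class name is made up, and ByteBuffer.allocateDirect is only a stand-in for whatever native allocation the game or its libraries actually perform. The sun.arch.data.model property is HotSpot-specific and may be null on other JVMs.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical probe: reports heap limits and tries one large off-heap allocation.
final class OffHeapProbe {

    /** Tries to allocate one direct buffer; returns null instead of throwing on failure. */
    static ByteBuffer tryAllocateDirect(int bytes) {
        try {
            return ByteBuffer.allocateDirect(bytes);
        } catch (OutOfMemoryError e) {
            System.err.println("direct allocation of " + bytes + " bytes failed: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println("data model (bits): " + System.getProperty("sun.arch.data.model"));
        System.out.println("max heap (MiB):    " + Runtime.getRuntime().maxMemory() / mib);

        // The 64 MiB size mirrors the allocation mentioned in the post above.
        ByteBuffer probe = tryAllocateDirect(64 * 1024 * 1024);
        System.out.println("64 MiB direct buffer: " + (probe != null ? "ok" : "FAILED"));

        // Standard JMX view of the NIO buffer pools ("direct" and "mapped").
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + " pool: " + pool.getCount() + " buffers, "
                    + pool.getMemoryUsed() / mib + " MiB used");
        }
    }
}
```

If the 64 MiB allocation starts failing as -Xmx goes up while the heap itself is mostly empty, that points at address space rather than heap exhaustion.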

Update:
I’ve got answers from the folks over at LWJGL. You can check it out here:

I’m sorry, but this turned out to have nothing to do with OpenGL, but with Java mechanics. However, it might help someone out there.

Dark_Photon · August 3, 2019, 12:56am

Given the symptoms you’ve described thus far, it sounds like you’re running out of user virtual memory address space here, not physical CPU RAM.

Depending on the OS and OS config for those 32-bit failure cases, there should be a max of somewhere between 2GB and 4GB of virtual memory address space available within user applications (probably 2 or 3GB). A few web searches suggest that the JVM carves the entire heap out of contiguous VM address space. This leaves that much less VM address space available for other purposes such as backing CPU physical memory page allocations and memory mapping.

Maximum Java heap size of a 32-bit JVM on a 64-bit OS
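As a rough illustration of why a larger max heap makes things worse here, the budget can be written out as below. The 2 GB user address space is one of the cases mentioned above; the 600 MB lump for everything else (JVM code, DLLs, thread stacks, driver mappings) is a made-up figure, and the class name is hypothetical.

```java
// Hypothetical address-space budget for a 32-bit process.
final class AddressSpaceBudget {

    /** Address space left after reserving the heap and a lump sum for everything else. */
    static long remainingMiB(long userSpaceMiB, long maxHeapMiB, long otherReservedMiB) {
        return userSpaceMiB - maxHeapMiB - otherReservedMiB;
    }

    public static void main(String[] args) {
        long userSpace = 2048; // assumed 2 GB of user address space
        long other = 600;      // made-up figure, varies per system
        for (long heap : new long[] {200, 1024}) {
            System.out.println("-Xmx" + heap + "m leaves about "
                    + remainingMiB(userSpace, heap, other) + " MiB");
        }
        // prints 1248 MiB for the 200 MB heap and 424 MiB for the 1024 MB heap
    }
}
```

The remainder is also fragmented, so the largest contiguous free block is smaller than these totals, which would fit a 64MB allocation failing even though memory looks available.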

Ok, good. Glad you got it solved.