NVIDIA releases OpenGL 4.2 drivers

@Piers - I’ve also done some investigation into glTexStorage. The situation is more complicated than I thought. I was able to reproduce this performance problem in my test app. The key thing that hurts performance is binding the texture object between glTexStorage and glTexSubImage.

This works fast:


glTexStorage2D(GL_TEXTURE_2D, numLevels, GL_RGBA8, textureWidth, textureHeight);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, texturePBO);
glTexSubImage2D(target, 0, 0, 0, textureWidth, textureHeight, GL_BGRA_EXT, GL_UNSIGNED_BYTE, NULL);

This is slow:


glTexStorage2D(GL_TEXTURE_2D, numLevels, GL_RGBA8, textureWidth, textureHeight);

glBindTexture(target, textureObj); // This makes it slow

glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, texturePBO);
glTexSubImage2D(target, 0, 0, 0, textureWidth, textureHeight, GL_BGRA_EXT, GL_UNSIGNED_BYTE, NULL);

I’d like to note that there is no such slowdown when creating the texture by calling glTexImage2D(…, NULL) for each mipmap level.

I believe glBindTexture performs some sanity checks, and due to some bug the driver is doing work here that it shouldn’t.

BTW, I measure the time from creating the texture up to drawing a test triangle, so I can be sure all the asynchronous processing has really finished. When I do not rebind the texture it takes 0.5 ms; when I rebind it, it takes 7.5 ms. My test hardware is a GTX 260 on WinXP x64, driver 280.36. I get similar results on other computers with different NV cards.
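As an aside, the numLevels argument passed to glTexStorage2D above has to cover the full mipmap chain for the texture size. A minimal sketch of that computation (a hypothetical helper, not from the post above):

```c
#include <assert.h>

/* Number of mipmap levels in a full chain for a w x h texture:
 * floor(log2(max(w, h))) + 1. This is the numLevels value to pass
 * to glTexStorage2D when allocating the whole chain up front. */
static int full_mip_levels(int w, int h)
{
    int max = w > h ? w : h;
    int levels = 1;
    while (max > 1) {
        max >>= 1;
        levels++;
    }
    return levels;
}
```

For example, full_mip_levels(1024, 512) gives 11 levels (1024 down to 1).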

@Chris
Any chance I could get a more detailed repro for your case?

@mfort
I think I can easily recreate your case. I’ll investigate soon.

@mfort
I was able to reproduce this issue and I have now fixed the bug. I’ll try to get a new driver released soon. Hopefully this same fix will help Chris too.

Hello Piers,
I sent you a PM a few days ago with a repro binary and some source code. Did you get a chance to take a look?

When I place memoryBarrier() in shader code, the driver reports "error C7531: global function memoryBarrier requires ‘#extension GL_EXT_shader_image_load_store : enable’ before use", even though I’m sure I specify the version number like this at the very beginning:

#version 420 compatibility

Is it a bug or am I missing something? Thanks in advance.
I use the 32-bit version of this driver (280.36).

That’s not the first time I’ve seen NVIDIA’s drivers mistake core functionality for something in an extension. I had to use #extension GL_EXT_gpu_shader4 : enable just to get gl_PrimitiveID to work in a geometry shader.

I have posted updated OpenGL 4.2 developer preview drivers to the usual location: http://developer.nvidia.com/opengl-driver.

The new version is 280.47 for Windows and 280.10.01.04 for Linux.

Amongst other things, this new driver addresses the following:

  1. glTexStorage performance issue should be fixed.
  2. Atomic counter performance has been substantially improved.
  3. Issue with gl_PerVertex interface block redeclaration in vertex shader has been fixed.
  4. Fixed an issue with atomic counters and glBufferData(…, NULL), where the buffer object wasn’t created properly.

Enjoy!

@robotech_er
Thanks for the bug report. This problem with memoryBarrier() is an oversight. It will be fixed in a future driver.

@Alfonse Reinheart
Hmm, gl_PrimitiveID has apparently been broken since GLSL 150. Will be fixed in a future driver.

@Chris Lux
Sorry I haven’t got back to you on this yet. I had some trouble downloading from dropbox at work. If you get a chance, please see if 280.47 fixes this issue. In the meantime I’ll try to get your package when I’m outside the work firewall.

@Alfonse Reinheart & @Piers Daniell, thanks.

Another weird thing: I use imageLoad() to do a VTF-like thing (vertex texture fetch). At first everything works right:

#version 420 compatibility

layout(r32ui) readonly uniform uimage2D tex_height;

void main()
{
    ivec2 itex_coord = ivec2(gl_Vertex.xz);
    float height = float(imageLoad(tex_height, itex_coord).x);
    vec4 newVertexPos = gl_Vertex * vec4(100.0, 1.0, 100.0, 1.0) + vec4(0.0, height, 0.0, 0.0);
    gl_Position = gl_ModelViewProjectionMatrix * newVertexPos;
}

Then I need another imageLoad operation after the original one:

#version 420 compatibility

layout(r32ui) readonly uniform uimage2D tex_height;
layout(r32ui) readonly uniform uimage2D tex_offset; // newly added image

void main()
{
    ivec2 itex_coord = ivec2(gl_Vertex.xz);
    float height = float(imageLoad(tex_height, itex_coord).x);

    uint val = imageLoad(tex_offset, itex_coord).x; // the second image access; if commented out, everything works right

    vec4 newVertexPos = gl_Vertex * vec4(100.0, 1.0, 100.0, 1.0) + vec4(0.0, height, 0.0, 0.0);
    gl_Position = gl_ModelViewProjectionMatrix * newVertexPos;
}

And now everything messes up. It seems like the return value of the first imageLoad() is some kind of undefined value, even though the two imageLoad() operations are totally unrelated. Why? Thanks.
I have updated the driver to the latest 280.47.

@Piers Daniell

I found the problem with my example. The main issue with the timing is that by default the NVIDIA driver enables threading optimizations, which defer the execution of some calls. In my example the GenerateMipmap call takes very long for the volume textures, and it seems it is deferred until the next TexStorage call. After disabling the threading optimizations, GenerateMipmap shows the expected longer execution times. I think I can work around this by building the mipmaps myself in a real-world application (this being only a test).
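Building the mipmaps by hand, as mentioned above, would mean downsampling each level on the CPU and uploading it with glTexSubImage2D/3D instead of calling GenerateMipmap. A minimal sketch of the per-level downsample (a 2×2 box filter on a single-channel 8-bit image; a hypothetical helper, not Chris’s actual code — for a volume texture the same idea averages 2×2×2 blocks):

```c
#include <assert.h>
#include <stdint.h>

/* Downsample one mip level to the next with a 2x2 box filter
 * (single 8-bit channel, power-of-two dimensions assumed).
 * src is sw x sh; dst must hold (sw/2) x (sh/2) texels. */
static void downsample_2x2(const uint8_t *src, int sw, int sh, uint8_t *dst)
{
    int dw = sw / 2, dh = sh / 2;
    for (int y = 0; y < dh; y++) {
        for (int x = 0; x < dw; x++) {
            int sum = src[(2 * y)     * sw + 2 * x]
                    + src[(2 * y)     * sw + 2 * x + 1]
                    + src[(2 * y + 1) * sw + 2 * x]
                    + src[(2 * y + 1) * sw + 2 * x + 1];
            dst[y * dw + x] = (uint8_t)(sum / 4);
        }
    }
}
```

Each computed level would then be uploaded to its mip slot with glTexSubImage2D/3D, avoiding the deferred GenerateMipmap path entirely.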

Thanks again, Piers, for the great effort bringing us beta drivers after fixing issues. I think this is something NVIDIA should keep up: always having OpenGL developer drivers available after serious fixes or additions of new features.

Edit: Any news on the planned availability of GL_ARB_cl_event and cl_khr_gl_event extensions?

Thanks
-chris

The first beta from the new r285 family of drivers has just been posted to nvidia.com as version 285.27. Note that this specific driver does not contain the gl_PerVertex fix, the atomic counter performance fix, or the glTexStorage performance fix that are in 280.47. So if you need these fixes, please stay with 280.47. The next r285 driver will contain all of them. Otherwise, the new 285.27 driver has all the OpenGL 4.2 goodness that 280.36 has.

I am sure now that it is not a driver problem. The same code works fine in another app.

Sincere apologies for wasting your time…

Hi,

Just want to know whether the NVIDIA GeForce GT 555M supports the new OpenGL 4.2 drivers?

cheers,
Mahesh Kondraju

Why don’t you try and find out by yourself? :)

The GT 555M is based either on GF108 or on GF106 (only the Lenovo Y570p/Y560p uses GF108, and it’s the slowest model), so it supports GL 4.2 completely. The only problem that might arise is that only desktop drivers exist for GL 4.2 at the moment. Try downloading tweaked drivers from laptopvideo2go, or wait until the regular ones are published.

Hi,

Setting the binding point for a UBO through GLSL doesn’t work for me. Is it my problem or a driver bug? I’m using the core profile and “classic” shader initialization. My OS is Vista x64.

Could you post a minimal shader and GL call sequence that shows the problem?

Ok. Here is my pseudo code for GL 4.1

There are 2 UBOs declared in the shader:


  layout(std140,row_major) uniform Common {
    float fTime;
    mat4  mProjection;
  };
  layout(std140,row_major) uniform Transformations {
    mat4 mWorldView[1024];
  };

After linking the shader program, I manually specify the binding points for my UBOs in GL code:


glUniformBlockBinding(m_hGLShaderProgram,glGetUniformBlockIndex(m_hGLShaderProgram,"Common"),0);
glUniformBlockBinding(m_hGLShaderProgram,glGetUniformBlockIndex(m_hGLShaderProgram,"Transformations"),1);

But in GL 4.2 there is the ability to specify binding points directly in the shader, so my shader for GL 4.2 looks like:


  layout(std140,row_major,binding=0) uniform Common {
    float fTime;
    mat4  mProjection;
  };
  layout(std140,row_major,binding=1) uniform Transformations {
    mat4 mWorldView[1024];
  };

And for me binding through the shader doesn’t work; I still must specify the binding points through GL code.
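For what it’s worth, under std140 the Common block above should lay out with fTime at offset 0 and mProjection at offset 16 (a mat4 aligns to a vec4 boundary), for a total block size of 80 bytes. A small sketch of that alignment rule (my own illustration — the driver’s actual offsets can be verified with glGetActiveUniformsiv):

```c
#include <assert.h>

/* Round `offset` up to the next multiple of `alignment` (std140 rule). */
static int align_to(int offset, int alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

/* std140 layout of the Common block from the shader above:
 *   float fTime;       base alignment 4,  size 4
 *   mat4  mProjection; base alignment 16, size 64 (four vec4 rows)
 * The block size itself rounds up to a multiple of 16. */
static void common_block_layout(int *fTime_off, int *mProjection_off, int *block_size)
{
    int off = 0;
    *fTime_off = align_to(off, 4);          /* 0 */
    off = *fTime_off + 4;
    *mProjection_off = align_to(off, 16);   /* 16 */
    off = *mProjection_off + 64;
    *block_size = align_to(off, 16);        /* 80 */
}
```

If the queried offsets match these values, the buffer contents are fine and only the binding qualifier itself is in question.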

Excuse me, but what about my problem?! :)

OK, I know my English sucks, so I will explain the problem in other words:

I have an OpenGL context with the OpenGL 4.2 core profile, I don’t use GL_ARB_separate_shader_objects, and I can’t set the uniform block binding index through GLSL using the binding layout qualifier.

With #version 420, binding doesn’t work for me.
http://www.opengl.org/wiki/Uniform_Buffer_Object

I am working under Vista x64 and I have a GTX 460.

Strange, that’s something that works for me, I believe… you’ve got me doubting now. Erm.