NVIDIA releases OpenGL 4.2 drivers

It is legal. The “binding” part isn’t covered in the ARB_shader_image_load_store extension, because the binding feature for samplers and images is in ARB_shading_language_420pack.

I’ll be investigating further tomorrow, but it appears that the problem involves a combination of qualifiers. A test shader I wrote based on your example fails with the declaration as above but compiles successfully without the “writeonly” qualifier.

Could you check that you really see trilinear or anisotropic filtering on your tests?

I’ve visually verified that mipmapping and anisotropic filtering work fine with glTexStorage2D. But there is a difference between your code and mine: I load only the base level and generate all the others using glGenerateMipmap. I do not use sampler objects.

Yes. I can confirm that glGenerateMipmap works!

Furthermore, this allows the following workaround for loading custom mipmap levels:


glTexStorage2D(object_target(),
               init_mip_levels,
               util::gl_internal_format(in_desc._format),
               in_desc._size.x, in_desc._size.y);
// no error reported
for (unsigned i = 0; i < init_mip_levels; ++i) {
    math::vec2ui lev_size      = util::mip_level_dimensions(in_desc._size, i);
    const void*  init_lev_data = in_initial_mip_level_data[i];
    glTexSubImage2D(object_target(),
                    i,
                    0, 0,
                    lev_size.x, lev_size.y,
                    gl_base_format,
                    gl_base_type,
                    init_lev_data);
    // WORKAROUND /////////////////////////////////////
    if (i == 0) glGenerateMipmap(object_target());
}
// still no errors reported

Ugly, yes, but it is a workaround for playing with immutable textures until we get a bug fix, which I am sorry to say has been taking very long lately. But let’s hope for the final OpenGL 4.2 driver!

Edit: Until we require mipmapped integer textures… Then this workaround cannot work, as glGenerateMipmap does not work on integer textures. So back to the old way to init our textures for now :p.
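The workaround above relies on util::mip_level_dimensions, whose implementation is not shown in the post. As a rough sketch of what such a helper might compute, the following assumes the usual OpenGL rule that each level halves the previous dimensions, rounding down and clamping to 1. The names Size2D and max_mip_levels are hypothetical and not part of the original code.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the post's math::vec2ui.
struct Size2D {
    std::uint32_t x;
    std::uint32_t y;
};

// Size of mip level `level`: max(1, floor(base / 2^level)) per dimension.
// Assumes level < 32, since larger shift counts are undefined for 32-bit values.
inline Size2D mip_level_dimensions(Size2D base, unsigned level) {
    return { std::max<std::uint32_t>(1u, base.x >> level),
             std::max<std::uint32_t>(1u, base.y >> level) };
}

// Number of levels in a complete chain: floor(log2(max(x, y))) + 1.
inline unsigned max_mip_levels(Size2D base) {
    unsigned levels = 1;
    for (std::uint32_t m = std::max(base.x, base.y); m > 1; m >>= 1) {
        ++levels;
    }
    return levels;
}
```

A value such as max_mip_levels(size) could be passed as the level count to glTexStorage2D, with mip_level_dimensions(size, i) giving the extent for each glTexSubImage2D call.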

@nvidia: which release is targeted for the OpenGL 4.2 final implementation? r285?

-chris

I can also report that usage of glTexStorage2D has a very negative impact on performance.

It looks like glTexStorage2D does not allocate the mipmap levels at all. Later, when glGenerateMipmap is called, the driver realizes that there are no mipmap levels and allocates them by doing the same patchwork as the older drivers: essentially loading the texture back to memory, reallocating it with mipmaps and loading it back again. This round trip takes about 40 ms for some large textures.

@nvidia: could you confirm this bug?

@mfort It may be too early to expect good performance.

@Chris Lux: I’ve dug into our GLSL compiler, root-caused your issue related to the “binding” layout qualifier, and have a fix undergoing testing. As I mentioned above, you may be able to get further with current drivers if you omit the “writeonly” qualifier.

I expect that one of my colleagues will be looking at the TexStorage* issue described here soon.

great to hear…

I expect that one of my colleagues will be looking at the TexStorage* issues described here soon.

fixed that ;). there are two issues that need to be addressed, the mipmap access problem and the performance problem. the texture storage should immediately be allocated at the glTexStorageXD call to be an improvement over the old way to allocate texture.

will there be an updated 4.2 dev driver or will we have to wait for the public release?

We’ll fix these issues asap. There will be a new developer driver released as soon as these issues are fixed.

Updated OpenGL 4.2 drivers can be found at the usual location:
http://developer.nvidia.com/opengl-driver

The new 280.36 driver addresses at least the following issues reported in this thread:

  1. glTexStorage has been fixed to correctly create all mipmap levels.
  2. Storage is allocated upfront when glTexStorage is called to address a performance issue that Chris reported.
  3. The problem with mixing the binding layout qualifier and writeonly with images has been fixed, so the shader text “layout(rgba16ui, binding = 0) writeonly uniform uimage2D _stuff;” now compiles correctly.

Thanks for the update Piers!

In my tests the bugs do not occur anymore ;). Great work with the quick fixes…

Thanks Piers. Driver 280.36 is an improvement.
But there is still a weird performance issue. When I use glTexStorage2D with no mipmaps, I get a decent performance improvement over R275, mostly because the first glTexSubImage after creating the texture does not suffer any slowdown (more here: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Main=58030&Number=301023).

But when I enable the mipmaps, creating textures with glTexStorage2D is about 10 times slower than calling glTexImage2D(…, NULL) for each level. The slowdown in previous drivers was caused by glGenerateMipmap. In this driver, any simple call to OGL stalls a bit (5-7 ms).

What do you mean by “enable the mipmaps”? The way glTexStorage2D should be used is that it is pretty much the first thing you call on a texture object.

I “enabled” it in my app. In the end it means I requested the allocation of a certain number of mipmap levels in OGL and later used mipmapped texture sampling. So nothing special actually.

I still have issues with the binding layout specifier. For example a simple shader:


#version 420 core

in per_vertex {
    vec3 tex_coord;
} v_in;

// attribute layout definitions ///////////////////////////////////////////////////////////////////
layout(location = 0, index = 0) out vec4 out_color;

layout(binding = 0) uniform sampler2D tex_image;
layout(binding = 1) uniform sampler3D tex_volume;

void main()
{
    //vec4 c = texture(tex_image, v_in.tex_coord.xy);
    vec4 c = texture(tex_volume, v_in.tex_coord);

    out_color = c;
}

On the client side I just bind the textures with the corresponding sampler objects to units 0 and 1, without setting the corresponding uniform values to 0 and 1. When using the 2D sampler in the shader I get results, but when using the 3D texture, and thus eliminating the 2D sampler in the link process, I get no results.

BTW, you should know that if a fragment shader has only one user-defined output variable, it will automatically be bound to location=0, index=0. So there’s no need to explicitly state it.

I know that, but i like it explicit.

@mfort
> But when I enable the mipmaps then creating textures with glTexStorage2D is about 10 times slower than calling glTexImage2D(…, NULL) for each level.

With the latest 280.36 driver the glTexStorage2D will allocate the hardware surfaces upfront instead of deferring it to the glTexSubImage2D calls. The older style glTexImage2D(,NULL) calls won’t create the hardware surface.

Overall, calling glTexStorage2D and then glTexSubImage2D for each level should be about the same performance as glTexImage2D(,NULL) then glTexSubImage2D for each level. And they should both be faster than using glGenerateMipmap or glTexParameter(GL_GENERATE_MIPMAP, TRUE). The advantage of glTexStorage2D is that the surface allocation happens immediately and the texture is immutable (except for image data).

If this is not the performance you’re seeing in your app, could you paste some code/pseudo-code to explain where performance is falling short? Thanks.

@Chris Lux
I’ll investigate this bug and respond soon.

Thanks for your efforts, fortunately it was a problem on my end. Everything now seems to work as expected…

-chris

I am also seeing some strange performance hits when using TexStorage. In the following log example i am generating two volume textures and one 1D texture:

  • using TexStorage3D to allocate the texture image including mipmaps
  • then i upload the first level using TexSubImage3D
  • after that i use GenerateMipmap to generate the missing mipmaps

As you can see the first volume is completed very fast. Then the second takes a huge amount of time for TexStorage3D. What is most interesting is that TexStorage+TexSubImage on the 1D texture after the two volume textures also takes extremely long…

This was tested on Windows 7 x64, GeForce GTX 480 1.5GiB, 280.36. The times were taken using QueryPerformanceCounter.


volume_data::volume_data(): loading raw volume...
>     allocating texture storage (dimensions: (501  401  576), format: R_8, mip-level: 10, size : 110.358MiB)...
>     allocating texture storage done. (elapsed time: 0.000s)
>     uploading source mip-level 0...
>     uploading source mip-level 0 done. (elapsed time: 0.059s)
>     generating mip-levels 1 - 10 ...
>     generating mip-levels 1 - 10 done. (elapsed time: 0.000s)
volume_data::volume_data(): loading raw volume done.

volume_data::volume_data(): generating pre-multiplied volume...
>     generating color and alpha lookup tables...
>     generating color and alpha lookup tables done. (elapsed time: 0.000s)
>     starting pre-multiplication...
>     pre-multiplication done. (elapsed time: 0.430s)
>     allocating texture storage (dimensions: (501  401  576), format: RGBA_8, mip-level: 10, size : 441.433MiB)...
>     allocating texture storage done. (elapsed time: 7.278s)
>     uploading source mip-level 0 ...
>     uploading source mip-level 0 done. (elapsed time: 3.884s)
>     generating mip-levels 1 - 10 ...
>     generating mip-levels 1 - 10 done. (elapsed time: 0.002s)
volume_data::volume_data(): generating pre-multiplied volume done.

volume_data::volume_data(): generating color map...
>     generating color map texture data...
>     generating color map texture data done. (elapsed time: 0.001s)
>     allocating texture storage and uploading texture data (dimensions: 256, format: RGBA_8, mip-level: 1, size : 1.000KiB)...
>     allocating texture storage and uploading texture data done. (elapsed time: 10.247s)
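For reference, the sizes printed in the log above appear to cover only mip level 0: 501 × 401 × 576 one-byte texels is 115,718,976 bytes, or about 110.358 MiB. The sketch below estimates the footprint of a mip chain so the full allocation can be compared against available video memory. mip_chain_bytes is a hypothetical helper, not part of the logged application, and it assumes tightly packed texels with no driver-side padding or alignment.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: bytes needed for the first `levels` mip levels of a
// 3D texture. Assumes tightly packed texels; real drivers may pad or align.
inline std::uint64_t mip_chain_bytes(std::uint32_t width,
                                     std::uint32_t height,
                                     std::uint32_t depth,
                                     unsigned levels,
                                     std::uint32_t bytes_per_texel) {
    std::uint64_t total = 0;
    for (unsigned i = 0; i < levels; ++i) {
        // Accumulate in 64 bits so large volumes do not overflow.
        total += std::uint64_t{width} * height * depth * bytes_per_texel;
        // Next level: halve each dimension, rounding down, clamped to 1.
        width  = std::max<std::uint32_t>(1u, width / 2);
        height = std::max<std::uint32_t>(1u, height / 2);
        depth  = std::max<std::uint32_t>(1u, depth / 2);
    }
    return total;
}
```

Calling mip_chain_bytes(501, 401, 576, 1, 4) gives the base-level size of the RGBA_8 volume, while passing 10 as the level count includes the rest of the chain.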