NVIDIA releases OpenGL 4.4 beta drivers

Are you confusing “uniform block” with “struct”?

Opaque types cannot be used in uniform blocks, but I see no such prohibition on structs.
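Here is a minimal GLSL 4.40 sketch of that distinction. All names are hypothetical. A struct that contains a sampler can be declared as an ordinary default-block uniform. The same sampler inside a uniform block is rejected by the core spec, although ARB_bindless_texture later relaxes this. The disallowed form is commented out so the shader still compiles.

```glsl
#version 440 core

// Hypothetical struct bundling an opaque sampler with plain data.
struct Material {
    sampler2D albedo;  // opaque type as a struct member
    float     scale;
};

// Allowed: the struct is instantiated as a default-block uniform.
uniform Material material;

// Not allowed in core GLSL 4.40: opaque types cannot be members of a
// uniform block, so a conforming compiler rejects this declaration.
// layout(std140) uniform MaterialBlock {
//     sampler2D albedo;
// };

in vec2 uv;
out vec4 fragColor;

void main()
{
    fragColor = texture(material.albedo, uv) * material.scale;
}
```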

[QUOTE=Piers Daniell;1253892]Yes, the GLSL spec allows this. I confirmed with the spec editor. This sentence on page 27 of the GLSL 440 spec:
They [opaque values] can only be declared as function parameters or uniform-qualified variables.

doesn’t just mean basic-type variables, it means any variable, including structs.[/QUOTE]
Very cool! Thanks again for the clarification.

[QUOTE=GClements;1253905]Are you confusing “uniform block” with “struct”?

Opaque types cannot be used in uniform blocks, but I see no such prohibition on structs.[/QUOTE]
Nope, I meant structs that are used as uniforms. I use(d) them to pack samplers together and to instantiate multiple such structs with samplers inside.

I just installed 326.58 and still get the error:
error C7554: OpenGL requires sampler variables to be explicitly declared as uniform

The case where I get the error looks like this:


struct A {
    sampler2D a_sampler;
};

struct B {
    A a;
    sampler2D b_sampler;
    float something;
};

uniform B b;

It now complains about the a_sampler declaration.
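Here is a minimal sketch of the pattern described above, under the same GLSL 4.40 rules. One hypothetical struct packs several samplers, and two instances of it are declared as uniforms. A helper function then takes the struct as an `in` parameter, which the quoted spec sentence permits for opaque values. On the host side, each sampler member is addressed by its full name, e.g. `layerA.color` for glGetUniformLocation.

```glsl
#version 440 core

// Hypothetical struct that packs the samplers of one "layer" together.
struct Layer {
    sampler2D color;
    sampler2D mask;
};

// Several instances of the same struct, each a default-block uniform.
uniform Layer layerA;
uniform Layer layerB;

in vec2 uv;
out vec4 fragColor;

// A struct with opaque members may be passed as an (in) function parameter.
vec4 shadeLayer(Layer l, vec2 coord)
{
    return texture(l.color, coord) * texture(l.mask, coord).r;
}

void main()
{
    fragColor = shadeLayer(layerA, uv) + shadeLayer(layerB, uv);
}
```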

We have published updated Windows OpenGL 4.4 beta drivers, version 326.77. Linux updates will follow shortly. You can download them from the usual place:

This update fixes the following bugs reported in this forum thread:
Comment #13: main() terminated prematurely
Comment #22: More problems with samplers in structs

Also fixed:
Unable to allocate a DEPTH_COMPONENT16 sparse texture
Rendering corruption with sparse depth textures
System instability when using multiple sparse textures

[QUOTE=oscarbg;1253896]Thanks for the fast bug fixing!
Now some questions; I hope you can answer some of them:
I found references to NVX_shader_thread_group and NVX_shader_thread_shuffle, and on Twitter Pat Brown said he thought they were already implemented in the driver.
Are specs coming soon for these, and also for the new WGL_NV_delay_before_swap extension?

The release notes of the new 325 drivers mention TXAA support on OpenGL… I obtained the TXAA SDK (v2.1) by contacting Lottes… The question is whether TXAA on OpenGL is already present in the Linux 325 driver or only in the Windows driver. If Linux support is present I would like to test it, but my TXAA SDK libs aren’t compiled for Linux…
Also, the GL_NVX_nvenc_interop extension is now reported, but the current NVENC 2.0 SDK (June 2013 release) on the web doesn’t mention support yet for NVENC encoding from OpenGL buffers/textures (only CUDA pointers and DX are supported presently)… any clue?

The latest Intel drivers seem to ship with new extensions:
GL_INTEL_fragment_shader_span_sharing
Lottes commented: “GL_INTEL_fragment_shader_span_sharing might be sharing across the threads in a 2x2 pixel quad?”
Eric Penner’s “Shader Amortization using Pixel Quad Message Passing” (GPU Pro 2) discusses some use cases.
If Fermi/Kepler hardware supports that, is NV interested in implementing/exposing this functionality?
GL_INTEL_compute_shader_lane_shift
That also looks like Kepler's new warp shuffle instruction (shfl in PTX parlance); might NV also expose that?

Thanks…
[/QUOTE]

Sorry for the delay in replying. The WGL_NV_delay_before_swap extension spec can be found here:
http://www.opengl.org/registry/specs/NV/wgl_delay_before_swap.txt

The NVX_shader_thread_group and NVX_shader_thread_shuffle extensions aren’t quite ready, but we hope to release specs for these soon.

I don’t know the status of any of the other specs you mention. I’ll need to do a little research.

I have posted a new Windows OpenGL 4.4 beta driver 326.84 to the usual place:

This fixes an issue with CUDA/OpenCL and an issue with using lots of sparse textures.

BTW, what happened to the 325.05.04 Linux beta posted here? Was it revoked?

(I have it downloaded, but I see 325.05.03 is the latest listed there now.)

I’ve fixed the webpage back to pointing to 325.05.04. Sorry about that. I hope to get a new revision posted next week.

The OpenGL 4.4 beta drivers for both Windows and Linux have been updated. The new Windows version is 326.98 and the new Linux version is 325.05.13. The major fix in these new drivers is for a regression in atomics functionality. New drivers can be found in the usual place:

There is a known issue with layered rendering to sparse textures, notably cube maps, that we’re currently investigating. A fix for this should be available next week.

Trying ARB_bindless_texture on Linux, I got the error below with some programs (the only change is a different define; the layout and binding are unchanged). Linux driver 325.05.13.

glProgramUniformHandleui64vARB => GL_INVALID_OPERATION. Element is invalid

A more verbose error message would be welcome. The spec only covers invalid layout (or binding).

Here is the full shader. This one is fine. If I replace the initial “#define PS_ATST 1” with “#define PS_ATST 6”, I get the above error!


#version 330 core
#extension GL_ARB_shading_language_420pack: require
#extension GL_ARB_separate_shader_objects: require
#extension GL_ARB_shader_image_load_store: require
#extension GL_ARB_bindless_texture: require
#define ENABLE_BINDLESS_TEX
#define FRAGMENT_SHADER 1
#define ps_main main
#define PS_FST 0
#define PS_WMS 0
#define PS_WMT 0
#define PS_FMT 0
#define PS_AEM 0
#define PS_TFX 0
#define PS_TCC 1
#define PS_ATST 1
#define PS_FOG 1
#define PS_CLR1 0
#define PS_FBA 0
#define PS_AOUT 0
#define PS_LTF 0
#define PS_COLCLIP 0
#define PS_DATE 0
#define PS_SPRITEHACK 0
#define PS_TCOFFSETHACK 0
#define PS_POINT_SAMPLER 0
#define PS_IIP 1
//#version 420 // Keep it for text editor detection

// note lerp => mix

#define FMT_32 0
#define FMT_24 1
#define FMT_16 2
#define FMT_PAL 4 /* flag bit */

// Not sure we have the same issue on OpenGL. Doesn't work anyway on ATI cards
// And I say this as an ATI user.
#define ATI_SUCKS 0

#ifndef VS_BPPZ
#define VS_BPPZ 0
#define VS_TME 1
#define VS_FST 1
#define VS_LOGZ 0
#endif

#ifndef PS_FST
#define PS_FST 0
#define PS_WMS 0
#define PS_WMT 0
#define PS_FMT FMT_32
#define PS_AEM 0
#define PS_TFX 0
#define PS_TCC 1
#define PS_ATST 1
#define PS_FOG 0
#define PS_CLR1 0
#define PS_FBA 0
#define PS_AOUT 0
#define PS_LTF 1
#define PS_COLCLIP 0
#define PS_DATE 0
#define PS_SPRITEHACK 0
#define PS_POINT_SAMPLER 0
#define PS_TCOFFSETHACK 0
#define PS_IIP 1
#endif

struct vertex
{
    vec4 t;
    vec4 c;
    vec4 fc;
};

#ifdef FRAGMENT_SHADER

#if !GL_ES && __VERSION__ > 140

in SHADER
{
    vec4 t;
    vec4 c;
    flat vec4 fc;
} PSin;

#define PSin_t (PSin.t)
#define PSin_c (PSin.c)
#define PSin_fc (PSin.fc)

#else

#ifdef DISABLE_SSO
in vec4 SHADERt;
in vec4 SHADERc;
flat in vec4 SHADERfc;
#else
layout(location = 0) in vec4 SHADERt;
layout(location = 1) in vec4 SHADERc;
flat layout(location = 2) in vec4 SHADERfc;
#endif
#define PSin_t SHADERt
#define PSin_c SHADERc
#define PSin_fc SHADERfc

#endif

// Same buffer but 2 colors for dual source blending
#if GL_ES
layout(location = 0) out vec4 SV_Target0;
#else
layout(location = 0, index = 0) out vec4 SV_Target0;
layout(location = 0, index = 1) out vec4 SV_Target1;
#endif

#ifdef ENABLE_BINDLESS_TEX
layout(bindless_sampler, binding = 0) uniform sampler2D TextureSampler;
layout(bindless_sampler, binding = 1) uniform sampler2D PaletteSampler;
#else
#ifdef DISABLE_GL42
uniform sampler2D TextureSampler;
uniform sampler2D PaletteSampler;
#else
layout(binding = 0) uniform sampler2D TextureSampler;
layout(binding = 1) uniform sampler2D PaletteSampler;
#endif
#endif

#ifndef DISABLE_GL42_image
#if PS_DATE > 0
// FIXME how to declare memory access
layout(r32i, binding = 2) coherent uniform iimage2D img_prim_min;
#endif
#else
// use basic stencil
#endif

#ifndef DISABLE_GL42_image
#if PS_DATE > 0
// origin_upper_left
layout(pixel_center_integer) in vec4 gl_FragCoord;
//in int gl_PrimitiveID;
#endif
#endif

#ifdef DISABLE_GL42
layout(std140) uniform cb21
#else
layout(std140, binding = 21) uniform cb21
#endif
{
    vec3 FogColor;
    float AREF;
    vec4 HalfTexel;
    vec4 WH;
    vec4 MinMax;
    vec2 MinF;
    vec2 TA;
    uvec4 MskFix;
    vec4 TC_OffsetHack;
};

vec4 sample_c(vec2 uv)
{
    // FIXME: check the issue on openGL
    if (ATI_SUCKS == 1 && PS_POINT_SAMPLER == 1)
    {
        // Weird issue with ATI cards (happens on at least HD 4xxx and 5xxx),
        // it looks like they add 127/128 of a texel to sampling coordinates
        // occasionally causing point sampling to erroneously round up.
        // I'm manually adjusting coordinates to the centre of texels here,
        // though the centre is just paranoia, the top left corner works fine.
        uv = (trunc(uv * WH.zw) + vec2(0.5, 0.5)) / WH.zw;
    }

    return texture(TextureSampler, uv);
}

vec4 sample_p(float u)
{
    // FIXME: do we need a 1D sampler? Big impact on OpenGL to find 1 dim
    // So for the moment cheat with 0.0f; not sure if it works
    return texture(PaletteSampler, vec2(u, 0.0f));
}

#if 0
#else
vec4 wrapuv(vec4 uv)
{
    vec4 uv_out = uv;

    if(PS_WMS == PS_WMT)
    {
        if(PS_WMS == 2)
        {
            uv_out = clamp(uv, MinMax.xyxy, MinMax.zwzw);
        }
        else if(PS_WMS == 3)
        {
            uv_out = vec4((ivec4(uv * WH.xyxy) & ivec4(MskFix.xyxy)) | ivec4(MskFix.zwzw)) / WH.xyxy;
        }
    }
    else
    {
        if(PS_WMS == 2)
        {
            uv_out.xz = clamp(uv.xz, MinMax.xx, MinMax.zz);
        }
        else if(PS_WMS == 3)
        {
            uv_out.xz = vec2((ivec2(uv.xz * WH.xx) & ivec2(MskFix.xx)) | ivec2(MskFix.zz)) / WH.xx;
        }
        if(PS_WMT == 2)
        {
            uv_out.yw = clamp(uv.yw, MinMax.yy, MinMax.ww);
        }
        else if(PS_WMT == 3)
        {
            uv_out.yw = vec2((ivec2(uv.yw * WH.yy) & ivec2(MskFix.yy)) | ivec2(MskFix.ww)) / WH.yy;
        }
    }

    return uv_out;
}
#endif

#if 0
#else
vec2 clampuv(vec2 uv)
{
    vec2 uv_out = uv;

    if(PS_WMS == 2 && PS_WMT == 2) 
    {
        uv_out = clamp(uv, MinF, MinMax.zw);
    }
    else if(PS_WMS == 2)
    {
        uv_out.x = clamp(uv.x, MinF.x, MinMax.z);
    }
    else if(PS_WMT == 2)
    {
        uv_out.y = clamp(uv.y, MinF.y, MinMax.w);
    }

    return uv_out;
}
#endif

mat4 sample_4c(vec4 uv)
{
    mat4 c;

    c[0] = sample_c(uv.xy);
    c[1] = sample_c(uv.zy);
    c[2] = sample_c(uv.xw);
    c[3] = sample_c(uv.zw);

    return c;
}

vec4 sample_4a(vec4 uv)
{
    vec4 c;

    // Dx used the alpha channel.
    // Opengl is only 8 bits on red channel.
    c.x = sample_c(uv.xy).r;
    c.y = sample_c(uv.zy).r;
    c.z = sample_c(uv.xw).r;
    c.w = sample_c(uv.zw).r;

    return c * 255.0/256.0 + 0.5/256.0;
}

mat4 sample_4p(vec4 u)
{
    mat4 c;

    c[0] = sample_p(u.x);
    c[1] = sample_p(u.y);
    c[2] = sample_p(u.z);
    c[3] = sample_p(u.w);

    return c;
}

vec4 sample_color(vec2 st, float q)
{
    if(PS_FST == 0) st /= q;

    if(PS_TCOFFSETHACK == 1) st += TC_OffsetHack.xy;

    vec4 t;
    mat4 c;
    vec2 dd;

    if (PS_LTF == 0 && PS_FMT <= FMT_16 && PS_WMS < 3 && PS_WMT < 3)
    {
        c[0] = sample_c(clampuv(st));
    }
    else
    {
        vec4 uv;

        if(PS_LTF != 0)
        {
            uv = st.xyxy + HalfTexel;
            dd = fract(uv.xy * WH.zw);
        }
        else
        {
            uv = st.xyxy;
        }

        uv = wrapuv(uv);

        if((PS_FMT & FMT_PAL) != 0)
        {
            c = sample_4p(sample_4a(uv));
        }
        else
        {
            c = sample_4c(uv);
        }
    }

    // PERF: see the impact of the expansion before/after the interpolation
    for (int i = 0; i < 4; i++)
    {
        if((PS_FMT & ~FMT_PAL) == FMT_24)
        {
            // FIXME: GLSL any() only supports bvec, so combine it with notEqual()
            bvec3 rgb_check = notEqual( c[i].rgb, vec3(0.0f, 0.0f, 0.0f) );
            c[i].a = ( (PS_AEM == 0) || any(rgb_check)  ) ? TA.x : 0.0f;
        }
        else if((PS_FMT & ~FMT_PAL) == FMT_16)
        {
            // FIXME: GLSL any() only supports bvec, so combine it with notEqual()
            bvec3 rgb_check = notEqual( c[i].rgb, vec3(0.0f, 0.0f, 0.0f) );
            c[i].a = c[i].a >= 0.5 ? TA.y : ( (PS_AEM == 0) || any(rgb_check) ) ? TA.x : 0.0f;
        }
    }

    if(PS_LTF != 0)
    {
        t = mix(mix(c[0], c[1], dd.x), mix(c[2], c[3], dd.x), dd.y);
    }
    else
    {
        t = c[0];
    }

    return t;
}

#ifdef SUBROUTINE_GL40
#else
vec4 tfx(vec4 t, vec4 c)
{
    vec4 c_out = c;
    if(PS_TFX == 0)
    {
        if(PS_TCC != 0) 
        {
            c_out = c * t * 255.0f / 128.0f;
        }
        else
        {
            c_out.rgb = c.rgb * t.rgb * 255.0f / 128.0f;
        }
    }
    else if(PS_TFX == 1)
    {
        if(PS_TCC != 0) 
        {
            c_out = t;
        }
        else
        {
            c_out.rgb = t.rgb;
        }
    }
    else if(PS_TFX == 2)
    {
        c_out.rgb = c.rgb * t.rgb * 255.0f / 128.0f + c.a;

        if(PS_TCC != 0) 
        {
            c_out.a += t.a;
        }
    }
    else if(PS_TFX == 3)
    {
        c_out.rgb = c.rgb * t.rgb * 255.0f / 128.0f + c.a;

        if(PS_TCC != 0) 
        {
            c_out.a = t.a;
        }
    }

    return c_out;
}
#endif


#if 0
void datst()
{
#if PS_DATE > 0
    float alpha = sample_rt(PSin_tp.xy).a;
    float alpha0x80 = 128.0 / 255;

    if (PS_DATE == 1 && alpha >= alpha0x80)
        discard;
    else if (PS_DATE == 2 && alpha < alpha0x80)
        discard;
#endif
}
#endif

#ifdef SUBROUTINE_GL40
#else
void atst(vec4 c)
{
    float a = trunc(c.a * 255.0 + 0.01);

    if(PS_ATST == 0) // never
    {
        discard;
    }
    else if(PS_ATST == 1) // always
    {
        // nothing to do
    }
    else if(PS_ATST == 2 ) // l
    {
        if (PS_SPRITEHACK == 0)
            if ((AREF - a - 0.5f) < 0.0f)
                discard;
    }
    else if(PS_ATST == 3 ) // le
    {
        if ((AREF - a + 0.5f) < 0.0f)
            discard;
    }
    else if(PS_ATST == 4) // e
    {
        if ((0.5f - abs(a - AREF)) < 0.0f)
            discard;
    }
    else if(PS_ATST == 5) // ge
    {
        if ((a-AREF + 0.5f) < 0.0f)
            discard;
    }
    else if(PS_ATST == 6) // g
    {
        if ((a-AREF - 0.5f) < 0.0f)
            discard;
    }
    else if(PS_ATST == 7) // ne
    {
        if ((abs(a - AREF) - 0.5f) < 0.0f)
            discard;
    }
}
#endif

// Note layout stuff might require gl4.3
#ifdef SUBROUTINE_GL40
#else
void colclip(inout vec4 c)
{
    if (PS_COLCLIP == 2)
    {
        c.rgb = 256.0f/255.0f - c.rgb;
    }
    if (PS_COLCLIP > 0)
    {
        // HLSL original: c.rgb *= c.rgb < 128./255;
        // Keep each channel below 128/255 and zero out the others.
        bvec3 factor = lessThan(c.rgb, vec3(128.0f/255.0f));
        c.rgb *= vec3(factor);
    }
}
#endif

void fog(inout vec4 c, float f)
{
    if(PS_FOG != 0)
    {
        c.rgb = mix(FogColor, c.rgb, f);
    }
}

vec4 ps_color()
{
    vec4 t = sample_color(PSin_t.xy, PSin_t.w);

    vec4 zero = vec4(0.0f, 0.0f, 0.0f, 0.0f);
    vec4 one = vec4(1.0f, 1.0f, 1.0f, 1.0f);
#if PS_IIP == 1
    vec4 c = clamp(tfx(t, PSin_c), zero, one);
#else
    vec4 c = clamp(tfx(t, PSin_fc), zero, one);
#endif

    atst(c);

    fog(c, PSin_t.z);

    colclip(c);

    if(PS_CLR1 != 0) // needed for Cd * (As/Ad/F + 1) blending modes
    {
        c.rgb = vec3(1.0f, 1.0f, 1.0f); 
    }

    return c;
}

#if GL_ES
void ps_main()
{
    vec4 c = ps_color();
    c.a *= 2.0;
    SV_Target0 = c;
}
#endif

#if !GL_ES
void ps_main()
{
#if PS_DATE == 3 && !defined(DISABLE_GL42_image)
    int stencil_ceil = imageLoad(img_prim_min, ivec2(gl_FragCoord.xy)).r;
    // Note gl_PrimitiveID == stencil_ceil will be the primitive that will update
    // the bad alpha value so we must keep it.

    if (gl_PrimitiveID > stencil_ceil) {
        discard;
    }
#endif

    vec4 c = ps_color();

    float alpha = c.a * 2.0;

    if(PS_AOUT != 0) // 16 bit output
    {
        float a = 128.0f / 255.0; // alpha output will be 0x80

        c.a = (PS_FBA != 0) ? a : step(0.5, c.a) * a;
    }
    else if(PS_FBA != 0)
    {
        if(c.a < 0.5) c.a += 0.5;
    }

    // Get first primitive that will write a failing alpha value
#if PS_DATE == 1 && !defined(DISABLE_GL42_image)
    // DATM == 0
    // Pixel with alpha equal to 1 will fail
    if (c.a > 127.5f / 255.0f) {
        imageAtomicMin(img_prim_min, ivec2(gl_FragCoord.xy), gl_PrimitiveID);
    }
    //memoryBarrier();
#elif PS_DATE == 2 && !defined(DISABLE_GL42_image)
    // DATM == 1
    // Pixel with alpha equal to 0 will fail
    if (c.a < 127.5f / 255.0f) {
        imageAtomicMin(img_prim_min, ivec2(gl_FragCoord.xy), gl_PrimitiveID);
    }
#endif


#if (PS_DATE == 2 || PS_DATE == 1) && !defined(DISABLE_GL42_image)
    // Don't write anything on the framebuffer
    // Note: you can't use discard because it will also drop
    // image operation
#else
    SV_Target0 = c;
    SV_Target1 = vec4(alpha, alpha, alpha, alpha);
#endif

}
#endif // !GL_ES

#endif

You know, you probably should’ve posted only the actual define combination that causes problems and cut out all the irrelevant code. I highly doubt anyone will read through this garbage (I hope you understand why it is garbage; maybe there’s some unavoidable reason for it to be like that, although I don’t think it is possible to justify such a thing).

I will post a shorter shader when I get some free time.

Basically the only difference is an alpha test.
Good shader:


void atst(vec4 c)
{ }

Bad shader (AREF is a uniform):


void atst(vec4 c)
{
    float a = trunc(c.a * 255.0 + 0.01);
    if ((a - AREF - 0.5f) < 0.0f)
        discard;
}

In both cases the texture samplers are defined like this:


layout(bindless_sampler, binding = 0) uniform sampler2D TextureSampler;
layout(bindless_sampler, binding = 1) uniform sampler2D PaletteSampler;

@gregory38, I wasn’t able to reproduce the problem you reported. I was able to compile and link your fragment shader fine and glProgramUniformHandleui64vARB appears to work correctly, at least on “TextureSampler”. I couldn’t try it with “PaletteSampler” because it appears that uniform is never referenced with PS_ATST set to either 1 or 6.

Is there any chance you’re not using the correct location value for “TextureSampler”? For me, if I compile with PS_ATST with 6 then “TextureSampler” gets a location of “1” (and not 0). However, if you try to call glProgramUniformHandleui64vARB with a location of “0” then you’ll get “Element is invalid”.

Basically, if you change your program in any way, including something simple like changing the PS_ATST value, you need to query all the locations again, because they may have changed.

OK, that's my issue: I used the same GL call, which assumes a location of 0. I wrongly expected the location to follow the layout binding semantics. I guess “binding” is linked to the texture unit, not to the “bindless uniform”. Is there any way to specify a default location with layout, or is it a limitation of the extension? It might be worth adding an error/warning saying “bindless sampler and binding layout property are incompatible”.

[QUOTE=gregory38;1254754]Is there any way to specify a default location with layout, or is it a limitation of the extension?[/QUOTE]
I guess I can use the location qualifier.
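A sketch of that idea follows. It assumes a GLSL 4.30+ shader, where explicit uniform locations are core (older versions can use GL_ARB_explicit_uniform_location), together with GL_ARB_bindless_texture. Each bindless sampler gets an explicit `location`, so its value no longer depends on which other uniforms the compiler keeps active. Unlike `binding`, which selects a texture unit, `location` is the value passed to glProgramUniformHandleui64vARB. The location numbers and the `uv` input are illustrative.

```glsl
#version 430 core
#extension GL_ARB_bindless_texture : require

// Explicit locations: the application (hypothetical `prog` and `handle`)
// can call glProgramUniformHandleui64vARB(prog, 0, 1, &handle) for
// TextureSampler, and use location 1 for PaletteSampler.
layout(bindless_sampler, location = 0) uniform sampler2D TextureSampler;
layout(bindless_sampler, location = 1) uniform sampler2D PaletteSampler;

in vec2 uv;
layout(location = 0) out vec4 SV_Target0;

void main()
{
    // Palette lookup in the style of sample_p(): index by the red channel.
    vec4 t = texture(TextureSampler, uv);
    SV_Target0 = texture(PaletteSampler, vec2(t.r, 0.0));
}
```

With fixed locations, a variant such as a different PS_ATST value keeps TextureSampler at 0 and PaletteSampler at 1. Locations can still be queried with glGetUniformLocation, but the application no longer depends on the compiler's assignment.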

The OpenGL 4.4 beta drivers for both Windows and Linux have been updated. The new Windows version is 327.24 and the new Linux version is 325.05.14. The major fixes in these new drivers address a problem with rendering to layered sparse textures and an issue with 3D sparse textures. New drivers can be found in the usual place:

Hi, can you provide some info on what’s new since the last update in newer drivers like 327.44 and 331.40, and which one has the most bug fixes, i.e. which is recommended for development?
Also, I see EGL is now supported on Linux, but still only for GL ES, not full OpenGL. However, it seems a Tegra Linux driver 334 already supports full OpenGL via EGL, so I assume that is coming soon to the Linux world… The question is whether EGL will also come to Windows, so we can use the EGL APIs on both Linux and Windows. Both Intel and AMD GPU drivers already have EGL support on Windows (but limited to GL ES)…
