AMD Releases OpenGL 4.0 Drivers

  • Separate shader program and Transform feedback

Separate shader programs don’t exist yet except as a badly designed extension. Simply adding explicit locations wouldn’t be enough to fix that extension, as it would still have to deal with type conflicts.

As for transform feedback, you have a point.

(As far as my tests went, it already works on NVIDIA for varyings :stuck_out_tongue: (Whether this NVIDIA flexibility, where things always work even when they shouldn’t, is a good thing or not is another question!))

I think separate shader program with explicit varying locations would be a huge step forward for this extension. Does it solve everything? Maybe not.

I think separate shader program with explicit varying locations would be a huge step forward for this extension.

I think you misunderstand.

I want separation of shaders; I think it is the current low-hanging fruit in terms of OpenGL’s deficiencies. Exactly how this gets implemented is up for debate.

If we have to assign numbers to inputs and outputs to allow them to combine together, so be it. But that should be defined in the proper extension, namely the shader separation extension. And it should only be done if it is necessary. If implementations can make shader separation work based on name rather than arbitrary numbers, I would rather have that.

Without the separation of shaders, having to define input/output locations is only useful in the context of transform feedback. Asking for applying indices to inputs and outputs so that you can have shader separation is putting the cart before the horse; we want shader separation, and if implementing it requires numbering inputs and outputs, then so be it.
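To illustrate what "numbering inputs and outputs" would mean, matching by explicit location rather than by name could look something like this (hypothetical layout-qualifier syntax; exactly what mechanism gets standardized is what’s under debate here):

```glsl
// Vertex shader: output slot assigned an explicit number.
layout(location = 0) out vec4 vsColor;

// Fragment shader: input paired with the vertex output
// by location 0, not by matching the identifier name.
layout(location = 0) in vec4 fsColor;
```

With name-based matching, the two identifiers above would have to be spelled identically; with location-based matching, only the numbers and types have to agree.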

Yes, the cart and the horse come together.

But if all you need is a horse, why ask for a cart?

What I need is a cart.

What I need is a cart.

Again, I ask the question, why? If separate shader doesn’t need this, why do you? Besides transform feedback, of course.

When I create a core profile with Catalyst 10.3, the glVertexAttribPointer method fails with GL_INVALID_OPERATION.
If I change the context creation attribs to this:
int attribs[] =
{
    WGL_CONTEXT_MAJOR_VERSION_ARB, major,
    WGL_CONTEXT_MINOR_VERSION_ARB, minor,
    WGL_CONTEXT_FLAGS_ARB, 0,
    WGL_CONTEXT_PROFILE_MASK_ARB, WGL_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB,
    0
};

Everything works. I guess it’s a driver bug in 10.3; I haven’t tried it with the OpenGL 4.0 preview drivers yet.
The code is at http://glbase.codeplex.com if anyone wants to give it a try.

When I create a core profile with Catalyst 10.3, the glVertexAttribPointer method fails with GL_INVALID_OPERATION.

Did you create a VAO? Core contexts (3.1 and above) require a VAO to be bound before the gl*Pointer calls will work.

With an NVIDIA OpenGL 3.3 core context, you don’t get a GL_INVALID_OPERATION when calling glVertexAttribPointer with no VAO bound.
Not sure whether it’s just a mistake in the appendices that the default VAO is marked as removed, or whether you really need to create your own “default” VAO (one you bind at startup and then forget about).

There are several other reasons that glVertexAttribPointer can raise GL_INVALID_OPERATION (from the spec):

  • size is BGRA and type is not UNSIGNED_BYTE, INT_2_10_10_10_REV, or UNSIGNED_INT_2_10_10_10_REV;
  • type is INT_2_10_10_10_REV or UNSIGNED_INT_2_10_10_10_REV, and size is neither 4 nor BGRA;
  • for VertexAttribPointer only, size is BGRA and normalized is FALSE;
  • any of the *Pointer commands specifying the location and organization of vertex array data are called while zero is bound to the ARRAY_BUFFER buffer object binding point (see section 2.9.6), and the pointer argument is not NULL.

With an NVIDIA OpenGL 3.3 core context, you don’t get a GL_INVALID_OPERATION when calling glVertexAttribPointer with no VAO bound.

That’s off-spec behavior, and should probably be fixed in NVIDIA’s drivers. The core specification removes the default VAO.

I think NVIDIA loves off-spec behavior, and it won’t be fixed.

I was drawing with glDrawArrays and I am not using a VAO. Should I create a VAO and use that to fix this? I’m going to give it a try.

Yes, VAOs are required…

Just adding

GLuint vao = 0;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);

fixed the problem. Thanks.

Any hope of having the “precise” qualifier outside of the GL_ARB_gpu_shader5 extension? It’s not a GPU feature, only a compiler feature…
i.e. not only for Fermi and Cypress GPUs; I want it on GT200, for example…
It matters because double precision emulation on D3D10 GPUs using float-float approaches gets optimized away by the NVIDIA compiler!
Example code that needs “precise” to survive optimization:

vec2 dblsgl_add (vec2 x, vec2 y)
{
    precise vec2 z;
    float t1, t2, e;

    t1 = x.y + y.y;
    e = t1 - x.y;
    t2 = ((y.y - e) + (x.y - (t1 - e))) + x.x + y.x;
    z.y = e = t1 + t2;
    z.x = t2 - (e - t1);
    return z;
}

vec2 dblsgl_mul (vec2 x, vec2 y)
{
    precise vec2 z;
    float up, vp, u1, u2, v1, v2, mh, ml;

    up = x.y * 4097.0;
    u1 = (x.y - up) + up;
    u2 = x.y - u1;
    vp = y.y * 4097.0;
    v1 = (y.y - vp) + vp;
    v2 = y.y - v1;
    //mh = __fmul_rn(x.y, y.y);
    mh = x.y * y.y;
    ml = (((u1 * v1 - mh) + u1 * v2) + u2 * v1) + u2 * v2;
    //ml = (__fmul_rn(x.y, y.x) + __fmul_rn(x.x, y.y)) + ml;
    ml = (x.y * y.x + x.x * y.y) + ml;
    z.y = up = mh + ml;
    z.x = (mh - up) + ml;
    return z;
}

I don’t see why “precise” could not be supported by OpenGL 3 hardware; it’s actually one of the features of GLSL 4.0 that could be brought to GLSL 3.4.

However, “precise” has nothing to do with double floats. Doubles are part of GL_ARB_gpu_shader_fp64 and should be supported by the Radeon HD 48**/47** and GeForce GTX 2**.

Sorry, I mean float-float approaches: using two floats to get near double precision…
Search Google for it…

See the code above: it is a float-float implementation for a Mandelbrot renderer.
And the program:
http://dl.dropbox.com/u/1416327/mandeldouble.rar

Sorry for posting so many…
The executable above contains:
  • a GL_ARB_gpu_shader5 path: a float-float implementation using the “precise” keyword to work around the aggressive NVIDIA compiler
  • a GL_ARB_gpu_shader_fp64 path using doubles, falling back to doublepAMD on Catalyst drivers without OpenGL 4.0 support
  • a normal (single precision) Mandelbrot implementation

On an AMD 5850 at 1920x1080 with the ATI GL 4.0 drivers I obtain:
  • 13 fps using the float-float approach
  • 50 fps using doubles with the ATI GL 4.0 drivers
  • 130 fps using single precision
Note that the pre-GL 4.0 drivers attained 36 fps in double precision using doublepAMD; with the GL 4.0 drivers, both doublepAMD and native doubles attain 50 fps.
You can deduce the Gflop/s from the GLSL code… it’s very high…