AMD Releases OpenGL 4.0 Drivers

  • Separate shader program and Transform feedback

Separate shader programs don’t exist yet except as a badly designed extension. Simply adding explicit locations wouldn’t be enough to fix that extension, as it would still have to deal with type conflicts.

As for transform feedback, you have a point.

(As far as my tests went, it already works on NVIDIA for varyings :stuck_out_tongue: (Whether this NVIDIA flexibility, where things always work even when they shouldn’t, is a good thing or not is another question!))

I think separate shader program with explicit varying locations would be a huge step forward for this extension. Does it solve everything? Maybe not.

I think separate shader program with explicit varying locations would be a huge step forward for this extension.

I think you misunderstand.

I want separation of shaders; I think it is the current low-hanging fruit in terms of OpenGL’s deficiencies. Exactly how this gets implemented is up for debate.

If we have to assign numbers to inputs and outputs to allow them to combine together, so be it. But that should be defined in the proper extension, namely the shader separation extension. And it should only be done if it is necessary. If implementations can make shader separation work based on name rather than arbitrary numbers, I would rather have that.

Without the separation of shaders, having to define input/output locations is only useful in the context of transform feedback. Asking for applying indices to inputs and outputs so that you can have shader separation is putting the cart before the horse; we want shader separation, and if implementing it requires numbering inputs and outputs, then so be it.
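To illustrate what "numbering inputs and outputs" would mean, matching by explicit location rather than by name could look something like this (hypothetical layout-qualifier syntax; exactly what mechanism gets standardized is what’s under debate here):

```glsl
// Vertex shader: output slot assigned an explicit number.
layout(location = 0) out vec4 vsColor;

// Fragment shader: input paired with the vertex output
// by location 0, not by matching the identifier name.
layout(location = 0) in vec4 fsColor;
```

With name-based matching, the two identifiers above would have to be spelled identically; with location-based matching, only the numbers and types have to agree.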

Yes, the cart and the horse come together.

But if all you need is a horse, why ask for a cart?

What I need is a cart.

What I need is a cart.

Again, I ask the question, why? If separate shader doesn’t need this, why do you? Besides transform feedback, of course.

When I create a core profile with Catalyst 10.3, the glVertexAttribPointer method fails with GL_INVALID_OPERATION.
If I change the context creation attribs to this:
int attribs[] =
{
    WGL_CONTEXT_MAJOR_VERSION_ARB, major,
    WGL_CONTEXT_MINOR_VERSION_ARB, minor,
    WGL_CONTEXT_FLAGS_ARB, 0,
    WGL_CONTEXT_PROFILE_MASK_ARB, WGL_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB,
    0
};

Everything works. I guess it’s a driver bug in 10.3; I haven’t tried it with the OpenGL 4.0 preview drivers yet.
The code is at http://glbase.codeplex.com if anyone wants to give it a try.

When I create a core profile with Catalyst 10.3, the glVertexAttribPointer method fails with GL_INVALID_OPERATION.

Did you create a VAO? Core contexts (3.1 and above) require a VAO to be bound before the gl*Pointer calls will work.

With an NVIDIA OpenGL 3.3 core context, you don’t get a GL_INVALID_OPERATION when calling glVertexAttribPointer with no VAO bound.
Not sure whether it’s just a mistake in the appendices that the default VAO is marked as removed, or whether you really need to create your own “default” VAO (one you bind at startup and then forget about).

There are several other reasons that glVertexAttribPointer can raise GL_INVALID_OPERATION (from the spec):

  • size is BGRA and type is not UNSIGNED_BYTE, INT_2_10_10_10_REV, or UNSIGNED_INT_2_10_10_10_REV;
  • type is INT_2_10_10_10_REV or UNSIGNED_INT_2_10_10_10_REV, and size is neither 4 nor BGRA;
  • for VertexAttribPointer only, size is BGRA and normalized is FALSE;
  • any of the *Pointer commands specifying the location and organization of vertex array data are called while zero is bound to the ARRAY_BUFFER buffer object binding point (see section 2.9.6), and the pointer argument is not NULL.

With an NVIDIA OpenGL 3.3 core context, you don’t get a GL_INVALID_OPERATION when calling glVertexAttribPointer with no VAO bound.

That’s off-spec behavior, and should probably be fixed in NVIDIA’s drivers. The core specification removes the default VAO.

I think NVIDIA loves off-spec behavior, and it won’t be fixed.

I was drawing with glDrawArrays and I am not using a VAO. Should I create a VAO and use that to fix this? I’m going to give it a try.

Yes, VAOs are required…

Just adding

GLuint vao = 0;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);

fixed the problem. Thanks.

Any hope of having the “precise” qualifier outside of the GL_ARB_gpu_shader5 extension? It’s not a GPU feature, only a compiler feature…
i.e. not only for Fermi and Cypress GPUs; I want it on GT200, for example…
It matters because double precision emulation on D3D10 GPUs using float-float approaches gets optimized away by the NVIDIA compiler!
Example code that needs “precise” to survive optimization:

vec2 dblsgl_add (vec2 x, vec2 y)
{
    precise vec2 z;
    float t1, t2, e;

    t1 = x.y + y.y;
    e = t1 - x.y;
    t2 = ((y.y - e) + (x.y - (t1 - e))) + x.x + y.x;
    z.y = e = t1 + t2;
    z.x = t2 - (e - t1);
    return z;
}

vec2 dblsgl_mul (vec2 x, vec2 y)
{
    precise vec2 z;
    float up, vp, u1, u2, v1, v2, mh, ml;

    up = x.y * 4097.0;
    u1 = (x.y - up) + up;
    u2 = x.y - u1;
    vp = y.y * 4097.0;
    v1 = (y.y - vp) + vp;
    v2 = y.y - v1;
    //mh = __fmul_rn(x.y, y.y);
    mh = x.y * y.y;
    ml = (((u1 * v1 - mh) + u1 * v2) + u2 * v1) + u2 * v2;
    //ml = (__fmul_rn(x.y, y.x) + __fmul_rn(x.x, y.y)) + ml;
    ml = (x.y * y.x + x.x * y.y) + ml;
    z.y = up = mh + ml;
    z.x = (mh - up) + ml;
    return z;
}

I don’t see why “precise” could not be supported by OpenGL 3 hardware; it’s actually one of the features of GLSL 4.0 that could be brought to GLSL 3.4.

However, “precise” has nothing to do with double floats. Doubles are part of GL_ARB_gpu_shader_fp64 and should be supported by the Radeon HD 48**/47** and GeForce GTX 2**.

Sorry, I mean float-float approaches: using two floats to get near double precision…
Search Google for it…

See the code above: it is a float-float implementation for a Mandelbrot renderer.
And the program:
http://dl.dropbox.com/u/1416327/mandeldouble.rar

Sorry for posting so many…
The executable above contains:
  • a GL_ARB_gpu_shader5 path: a float-float implementation using the “precise” keyword to work around the aggressive NVIDIA compiler
  • a GL_ARB_gpu_shader_fp64 path using doubles, falling back to doublepAMD on Catalyst drivers without OpenGL 4.0 support
  • a normal (single precision) Mandelbrot implementation

On an AMD 5850 at 1920x1080 with the ATI GL 4.0 drivers I obtain:
  • 13 fps using the float-float approach
  • 50 fps using doubles with the ATI GL 4.0 drivers
  • 130 fps using single precision
Note that the pre-GL 4.0 drivers attained 36 fps in double precision using doublepAMD; with the GL 4.0 drivers, both doublepAMD and native doubles attain 50 fps.
You can deduce the Gflop/s from the GLSL code… it’s very high…