AMD Releases OpenGL 4.0 Drivers

So what’s your point?

And please delete all your duplicate posts. This is so ridiculous.

I bet it’s the feature where you hint/specify the maximum deviation of gl_FragDepth (when you’re specifying/overwriting it in the shader).
Useful to avoid trashing the whole Hi-Z culling compression.
The spec is now up.

Any chance of getting the spec for the GL_AMD_name_gen_delete extension?

Regards
elFarto

Interesting extension. It lets you take advantage of early depth testing as long as you follow certain rules in a shader.

I have some requests/questions:
* Spec documentation for ext_shader_atomic_counters?
* Is AMD going to ship updated 4.0 drivers with these multivendor extensions that Nvidia is already shipping on Fermi?
EXT_shader_image_load_store
EXT_vertex_attrib_64bit
Is there some timeline?

EXT_direct_state_access?

Nothing on that yet… I believe that AMD is not really into DSA… :stuck_out_tongue:

DSA DSA DSA! :stuck_out_tongue:

I agree, this is a MUCH needed feature!

Hi,
we just got our HD5870 to play with, but it really is depressing trying to get the ATI drivers to play nice. Some things I ran into:


glGetProgramiv(_gl_program_obj, GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH, &act_uniform_max_len);

throws an invalid enum error.

Sampler objects: No GL errors when trying to bind them to a unit, but when trying to access a 3D or 1D texture using a sampler object I get black in return. Accessing the same texture while bypassing the sampler object, using texelFetch, works fine.

Then when trying to clear the depth or depth_stencil attachment of an FBO using the following, nothing happens:


glBindFramebuffer(GL_DRAW_FRAMEBUFFER, buffer_id());
// glClearBufferfv(GL_DEPTH, 0, &in_clear_depth);
glClearBufferfi(GL_DEPTH_STENCIL, 0, in_clear_depth, in_clear_stencil);

(Note: even if only a depth attachment is bound to the FBO, the last call should work just fine.)

The following does the job, but I just don’t want to use it:


glBindFramebuffer(GL_DRAW_FRAMEBUFFER, buffer_id());

glClearDepth(in_clear_depth);
glClearStencil(in_clear_stencil);
glClear(GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

NOW the biggest annoyance: loops in shaders. Why in the hell does the following not work?

It is a gutted basic volume ray caster.


    vec4 dst = vec4(0.0, 0.0, 0.0, 0.0);
    vec4 src;

    while (inside_volume) {
        src.rgb = sampling_pos.rgb;
        src.a = 0.001;

        sampling_pos  += ray_increment;

        float omda_sa = (1.0 - dst.a) * src.a;
        dst.rgb += omda_sa*src.rgb;
        dst.a   += omda_sa;
    }
    //vec4 volume_col = texture(volume_texture, ray_entry_position);
    //vec4 color_map  = texture(color_map_texture, volume_col.r);

    out_color = dst;

I tried a for loop, a fixed loop count etc. The loop is just not evaluated.

Frustrating!
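
For comparison, here is a minimal CPU-side sketch in C of the compositing step from the gutted loop above, assuming a constant sample with alpha 0.001 as in the snippet. The names rgba, composite_step and composite_n are made up for illustration and are not part of any shader or driver. If the loop really ran 1000 times, the accumulated alpha should end up around 0.63, so an output of exactly zero suggests the loop body was skipped.

```c
#include <assert.h>

/* Hypothetical colour type, standing in for a GLSL vec4. */
typedef struct { double r, g, b, a; } rgba;

/* One front-to-back compositing step, mirroring the shader lines:
 *   float omda_sa = (1.0 - dst.a) * src.a;
 *   dst.rgb += omda_sa * src.rgb;
 *   dst.a   += omda_sa;
 */
static rgba composite_step(rgba dst, rgba src)
{
    double omda_sa = (1.0 - dst.a) * src.a;
    dst.r += omda_sa * src.r;
    dst.g += omda_sa * src.g;
    dst.b += omda_sa * src.b;
    dst.a += omda_sa;
    return dst;
}

/* Composite the same sample `steps` times, starting from transparent black. */
static rgba composite_n(rgba src, int steps)
{
    assert(steps >= 0);
    rgba dst = { 0.0, 0.0, 0.0, 0.0 };
    for (int i = 0; i < steps; ++i)
        dst = composite_step(dst, src);
    return dst;
}
```

With src = (1.0, 0.5, 0.25, 0.001) and 1000 steps, the resulting alpha lies between 0.63 and 0.64, and with zero steps it stays at 0.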

So far I haven’t seen any problems with loops on ATI hardware recently in glsl shaders (using both HD4870 and 5770). I’m using shader version 150 (if it makes any difference).

I assume you did not post the loop code you are actually trying to use because there is no way the value of `inside_volume’ will change in your loop (so either the loop is never entered, or the shader can never break out from the loop).

Hi,
No, that is just a gutted shader. I am on the GL 4.0 beta driver and was using “#version 330 core”.

I tried:


for (int lc = 0; lc < 1000; ++lc) {
    ....
}
out_color = dst;


int lc = 0;
while (true) {
    ++lc;
    if (lc > 1000) break;
    ...
}
out_color = dst;


bool inside_volume = true;
while (inside_volume) {
    ...
    if (dst.a > 0.9) inside_volume = false;
}
out_color = dst;

At no time was the loop executed.

What did work was:


for (int lc = 0; lc < 1000; ++lc) {
   ...
}
if (lc_c < 512) {
    out_color = vec4(0.0, 0.0, 1.0, 1.0);
}
else {
    out_color = vec4(1.0, 0.0, 0.0, 1.0);
}

result = red;

What did not:


vec4 dst = vec4(0.0);
for (int lc = 0; lc < 1000; ++lc) {
   ...
}
if (lc < 512) {
    out_color = vec4(0.0, 0.0, 1.0, 1.0);
}
else {
    out_color = vec4(dst, 1.0);
}

result = blue, i.e. not working.

-chris

Just did a quick test on my HD4870, the following both worked:


vec4 col = vec4(0.0);
int lc = 0;
while (true)
{
  ++lc;
  if (lc > 1000)
    break;

  col += vec4(0.001);
}

frag_colour = col; // output is white as expected


vec4 col = vec4(0.0);
for (int i = 0; i < 1000; ++i)
{
   col += vec4(0.001);
}

frag_colour = col; // output white as expected

In this case the GLSL compiler did not do any loop unrolling, it was significantly slower than just `frag_colour = vec4(1.0);’. So the loop seems to work properly. I’m not using the OpenGL preview driver though. However I did use that one a while ago and it worked fine with a much smaller loop (for loop, loop count 8, but much more complex). That worked fine for me (HD4870, HD5770, Linux and Windows).

Are you sure nothing else in the shader is incorrect? Or is there some other function in the shader that causes it not to compile properly? Any info in the shader info log?

If you don’t use anything OpenGL 3.3 / OpenGL 4.0 specific, you might try AMD GPU Shader Analyzer to see if it compiles properly in there (but that currently only supports Catalyst 10.3 I think, which supports OpenGL 3.2).

result = blue, i.e. not working.

What is “lc_c” and where does it get computed?

lc_c is a typo here in the forum; it got computed the moment I pressed the wrong keys. :wink:

Tomorrow morning I will have time with the card again, and I will look into this one more time and post a complete shader.

So,
back with the HD5870 on the GL4 beta drivers.

Here is the shader code, reduced to the minimum:


// vertex shader ////////////////////////////////////////////////
#version 330 core

out vec3 ray_entry_position;

layout(std140, column_major) uniform;

uniform transform_matrices
{
    mat4 mv_matrix;
    mat4 mv_matrix_inverse;
    mat4 mv_matrix_inverse_transpose;

    mat4 p_matrix;
    mat4 p_matrix_inverse;

    mat4 mvp_matrix;
    mat4 mvp_matrix_inverse;
} current_transform;

layout(location = 0) in vec3 in_position;

void main()
{
    ray_entry_position = in_position;

    gl_Position = current_transform.mvp_matrix * vec4(in_position, 1.0);
}

// fragment shader //////////////////////////////////////////////

#version 330 core

in vec3 ray_entry_position;

uniform sampler3D volume_texture;
uniform sampler1D color_map_texture;

uniform vec3    camera_location;
uniform float   sampling_distance;
uniform vec3    max_bounds;

layout(location = 0) out vec4        out_color;

vec3 debug_col;

bool
inside_volume_bounds(const in vec3 sampling_position)
{
    return (   all(greaterThanEqual(sampling_position, vec3(0.0)))
            && all(lessThanEqual(sampling_position, max_bounds)));
}

void main() 
{
    vec3 ray_increment      = normalize(ray_entry_position - camera_location) * sampling_distance;
    vec3 sampling_pos       = ray_entry_position + ray_increment; // test, increment just to be sure we are in the volume

    vec3 obj_to_tex         = vec3(1.0) / max_bounds;

    vec4 dst = vec4(0.0, 0.0, 0.0, 0.0);
    vec4 src;

    bool inside_volume = inside_volume_bounds(sampling_pos);

    //unsigned int loop_c = 0u;
    while (inside_volume) {
        //loop_c += 1u;
        // get sample
        src = texture(volume_texture, sampling_pos * obj_to_tex);

        src = texture(color_map_texture, src.r);

        // increment ray
        sampling_pos  += ray_increment;
        inside_volume  = inside_volume_bounds(sampling_pos) && (dst.a < 0.99);
        // compositing
        float omda_sa = (1.0 - dst.a) * src.a;
        dst.rgb += omda_sa*src.rgb;
        dst.a   += omda_sa;
    }

    //vec4 volume_col = texture(volume_texture, ray_entry_position * 0.5);
    //vec4 color_map  = texture(color_map_texture, volume_col.r);

    //out_color = vec4(volume_col.rgb, 1.0);
    //out_color = vec4(color_map.rgb, 1.0);
    //out_color = vec4(ray_entry_position, 1.0);
    out_color = vec4(sampling_pos, 1.0);
    //out_color = dst;
}

First, here is the expected result when returning dst from the shader on an Nvidia board using their GL 3.3/4 beta driver:

Now the first problem on ATI when using sampler objects:

Here is the result when returning:
out_color = vec4(color_map.rgb, 1.0);

Now the result when using no sampler objects and setting the same state as the texture state:

So the sampler objects seem broken: I get 0 when trying to look into either texture (volume_texture, color_map_texture) when a sampler object is bound to its unit.

And now, here is the ray entry position returned:
out_color = vec4(ray_entry_position, 1.0);

and here is the sample exit position:
out_color = vec4(sampling_pos, 1.0);

It is clear that the loop is not run.
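
To make that argument concrete, here is a small CPU-side sketch in C of the marching logic, assuming the same bounds test as inside_volume_bounds in the shader. The vec3 struct, the march function and the max_steps guard are assumptions added for illustration, and the opacity cutoff (dst.a < 0.99) is left out. If the loop runs until the bounds test fails, the exit position has to lie outside the volume, which is what the exit-position output should show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GLSL vec3. */
typedef struct { double x, y, z; } vec3;

/* Same test as the shader: all components within [0, max_bounds]. */
static bool inside_volume_bounds(vec3 p, vec3 max_bounds)
{
    return p.x >= 0.0 && p.y >= 0.0 && p.z >= 0.0
        && p.x <= max_bounds.x && p.y <= max_bounds.y && p.z <= max_bounds.z;
}

/* March from the entry point; returns the number of loop iterations and
 * writes the final sampling position. max_steps is only a safety guard
 * (an assumption, the shader has no such limit). */
static int march(vec3 entry, vec3 inc, vec3 max_bounds, int max_steps, vec3 *exit_pos)
{
    assert(max_steps >= 0 && exit_pos != 0);
    /* As in the shader: start one increment past the entry position. */
    vec3 p = { entry.x + inc.x, entry.y + inc.y, entry.z + inc.z };
    int steps = 0;
    while (steps < max_steps && inside_volume_bounds(p, max_bounds)) {
        p.x += inc.x;
        p.y += inc.y;
        p.z += inc.z;
        ++steps;
    }
    *exit_pos = p;
    return steps;
}
```

For an entry point of (0.5, 0.5, 0.0), an increment of (0.0, 0.0, 0.25) and unit bounds, this takes 4 steps and exits at z = 1.25, outside the volume.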

So I tried the following loops:


    bool inside_volume = true;
    int loop_c = 0;
    while (inside_volume) {
        loop_c += 1;

        src = texture(volume_texture, sampling_pos * obj_to_tex);
        src = texture(color_map_texture, src.r);

        sampling_pos  += ray_increment;
        inside_volume = (loop_c < 1000);
        // compositing
        float omda_sa = (1.0 - dst.a) * src.a;
        dst.rgb += omda_sa*src.rgb;
        dst.a   += omda_sa;
    }


    for (int lc = 0; lc < 1000; ++lc) {
        src = texture(volume_texture, sampling_pos * obj_to_tex);
        src = texture(color_map_texture, src.r);

        sampling_pos  += ray_increment;

        // compositing
        float omda_sa = (1.0 - dst.a) * src.a;
        dst.rgb += omda_sa*src.rgb;
        dst.a   += omda_sa;
    }

No luck.

I will also send this to AMD to investigate.

-chris

I was just looking at your functions and I don’t see anything too funny in your main code, except I would probably do something like this:


    // vec4 src; // not necessary to be defined outside loop

    bool inside_volume = inside_volume_bounds(sampling_pos);

    //unsigned int loop_c = 0u;
    while (inside_volume) {
        //loop_c += 1u;
        // get sample and store it in a separate variable,
        // I called the variable `sample_val' instead of just
        // `sample' because I think I recall using the variable
        // name sample once and that caused the shader not to run
        // apparently `sample' is a reserved keyword in
        // ATI's glsl (it is not mentioned in GLSL spec)
        // or perhaps I recall incorrectly... but just to be sure
        float sample_val = texture(volume_texture, sampling_pos * obj_to_tex).r;

        // define src here, perhaps the driver does not like
        // writing to src and using it at the same time for the 
        // second lookup (it should work though...)
        vec4 src = texture(color_map_texture, sample_val);

        // increment ray
        sampling_pos  += ray_increment;
        inside_volume  = inside_volume_bounds(sampling_pos) && (dst.a < 0.99);
        // compositing
        float omda_sa = (1.0 - dst.a) * src.a;

        //dst.rgb += omda_sa*src.rgb;
        //dst.a   += omda_sa;
        // might compile into fewer instructions on ATI hardware
        // (not sure though)
        dst += omda_sa * vec4(src.rgb, 1.0);
    }

With respect to your while loop you’ve tried, that still does not contain a breaking point. So that would never compile I guess (or it compiles but does not run properly).

edit: further, you might try using texelFetch instead of texture. However, texelFetch expects an ivec* as parameter, i.e. integer texture coordinates. It could be that the problems are with the texture lookup. I don’t do many texture lookups on 3D textures, so maybe there is a problem with that which I am not aware of.
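
Since texelFetch takes integer texel coordinates, the normalized coordinates used with texture() have to be converted first. Below is a minimal sketch in C of that conversion for one axis, assuming a nearest-texel lookup with clamping to the edge; texel_index is a hypothetical helper, not a GL function. In GLSL the texture size would come from textureSize(), and texelFetch itself does no filtering or wrapping.

```c
#include <assert.h>

/* Map a normalized coordinate u to a texel index in [0, size - 1].
 * Values outside [0, 1) are clamped to the nearest edge texel. */
static int texel_index(double u, int size)
{
    assert(size > 0);
    if (u <= 0.0)
        return 0;
    if (u >= 1.0)
        return size - 1;
    /* u is in (0, 1), so truncation is the same as floor here. */
    int i = (int)(u * (double)size);
    return i >= size ? size - 1 : i;
}
```

For a 256-texel axis, u = 0.5 maps to texel 128 and u = 1.0 maps to the last texel, 255. A 3D lookup would apply the same conversion per axis.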

The spec for GL_AMD_name_gen_delete is now up.

Regards
elFarto

Awesome! AMD seems to take OpenGL really seriously lately! Nice to see that they actually try to clean up the API and add much needed basic functionality, instead of cluttering it even more with half-baked vendor-specific features.

The name_gen_delete and debug_blablub extension should have been top-priority extensions for the ARB in GL 3.0. Nice to see AMD fix what the ARB failed to add.

Jan.

A little thing in the GLSL log messages: control and evaluation shaders are referred to as hull and domain shaders.

The new Catalyst 10.5 now has OpenGL 3.3 and OpenGL 4.0 in the mainline driver (although I believe it isn’t mentioned in the driver changelog).

The OpenGL version number goes from 4.0.9826 to 4.0.9836 (with respect to the OpenGL 4.0 preview driver).

Has anyone had any luck with sampler objects on Catalyst 10.5? I can get them to somewhat work. Setting filters, anisotropy and wrap modes works just fine, but the textures come out too light. It’s almost as if the sRGB->linear transformation is applied twice or something. Maybe I’m missing something…

Without sampler objects (correct):

With sampler objects: