Difference in compute shader execution between NVIDIA and AMD GPUs

RomualdVII · March 26, 2024, 9:01pm

Good afternoon!

I wrote a compute shader that runs correctly on NVIDIA graphics cards (I tried it on these configurations: GTX 1660 Ti/Win 10, RTX 4070/Ubuntu 22.04 and RTX 4080/Win 11).

On AMD graphics cards, on the other hand, my program crashes after a while during the “dispatchCompute” call for this shader (tested on RX 580/Ubuntu 22.04 and RX 5700/Win 10), from which I conclude that something is wrong with the AMD drivers.

Unfortunately, my shader is complex and 2050 lines long, so I can’t post it in its entirety. I tried to put together a minimal example, but it runs correctly, so the problem must lie in something I’m leaving out of the example.

In a nutshell, the shader calls a function containing a while loop that terminates on a condition. Inside the loop, many functions are called, nested several levels deep, but none of them are recursive.

My first suspicion was that the while loop never terminates. So I took a test data set and replaced the while loop with a for loop, since for that data set I know in advance exactly how many iterations are needed. Unfortunately, nothing changed: the program still crashes.
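Another way to test the "loop never terminates" hypothesis, without changing the loop's logic, is to keep the original stop condition but add a hard iteration cap and record whether the cap is what ended the loop. The sketch below is written in C so it can be checked on the CPU; `advance`, `shouldStop`, `boundedLoop` and the cap value are hypothetical stand-ins for the real shader logic, not code from this thread. The same control flow is valid GLSL with the pointer replaced by an `out bool` parameter; in a shader, the flag would have to be written to a buffer and read back on the host.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one iteration of the real loop body:
   it just advances a counter toward a target value. */
static int advance(int state)
{
    return state + 1;
}

/* Hypothetical stand-in for the loop's termination condition. */
static bool shouldStop(int state, int target)
{
    return state >= target;
}

/* Runs a condition-terminated loop, but never more than maxIterations times.
   Returns the number of iterations executed and sets *hitLimit to true only
   if the cap, not the stop condition, ended the loop. */
int boundedLoop(int start, int target, int maxIterations, bool *hitLimit)
{
    int state = start;
    int iterations = 0;

    while (!shouldStop(state, target) && iterations < maxIterations) {
        state = advance(state);
        ++iterations;
    }

    /* If the condition still says "keep going", the cap is what stopped us. */
    *hitLimit = !shouldStop(state, target);
    return iterations;
}
```

For example, `boundedLoop(0, 5, 100, &hit)` returns 5 with `hit == false`, while `boundedLoop(0, 1000, 100, &hit)` returns 100 with `hit == true`. If the cap never fires on the AMD cards and the crash persists, a runaway loop is unlikely to be the cause.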

Since, as I already mentioned, there are VERY many calls to my other functions inside the loop, I wonder whether the call stack is overflowing, but I don’t know how to check this.

Any help would be appreciated!

Dark_Photon · March 27, 2024, 5:05pm

Concluding that the AMD drivers are at fault isn’t a good assumption. It could be the case. But if your implementation relies on undefined behavior, or on behavior that NVIDIA supports and AMD doesn’t, then this could be the result.

In my experience, NVIDIA drivers tend to be more resilient in the face of undefined behavior, whereas AMD is more likely to produce artifacts or crash.

You’re just going to have to chase this one down. Re “there are VERY many calls to my other functions inside my loop”, passing complex code to the GPU is usually not a good idea, for performance or maintainability. It’s not a CPU.

RomualdVII · March 28, 2024, 2:08pm

Thank you for your reply!
Can you please tell me whether a shader program running on AMD can fail if it contains a lot of branches?
For example, here is part of my code:

```glsl
struct structure1 {
    ivec3 field1;
    double field2;
};

structure1 conditionalSearch(int A, dvec2 B, ivec2 C, ivec2 D, bool condition)
{
    structure1 st;
    double res = 0.0lf;

    if (C == ivec2(-1, 0) || D == ivec2(-1, 1))
    {
        if (condition)
        {
            if (!(C == D) && B[1] >= 0)
            {
                res = doSomeCalc(B, C, A);
                if (res > 0.0)
                {
                    st.field1 = ivec3(C, 1);
                    st.field2 = res;
                    return st;
                }
            }
            else if (!(...) && ... <= 0)
            {
                res = doSomeCalc(B, D, A);
                if (res > 0.0)
                {
                    st.field1 = ivec3(D, 1);
                    st.field2 = res;
                    return st;
                }
            }
            ...
```

Currently I see the following behavior: the program runs correctly if I call conditionalSearch() once in the shader’s main() function, but crashes if I call it a second time (with the same parameters).

Alfonse_Reinheart · March 28, 2024, 3:30pm

That’s not “a lot of branches”.