Detecting the hardware acceleration of a shader

How can I tell if a particular shader runs on a GPU or is software-emulated?

Thank you.

Besides the likely performance loss, there’s no way to tell.

However, these days if a shader can’t run in hardware, it’ll probably just fail to compile/link rather than be emulated.

I have a complex shader and it is emulated all the time =)
What is the most common practice? Is it to benchmark each shader before using it?

I have a complex shader and it is emulated all the time

Wait… how do you tell? I mean, isn’t this thread to find out if a shader runs in software or hardware? So how do you know that it is software emulated?

Also, what hardware are you testing this on?

Because it was running as expected before I added an additional varying. I’m absolutely sure it is running in software.
I’m testing on GeForce 8800 GTS.

Because it was running as expected before I added an additional varying. I’m absolutely sure it is running in software.

OK, so it was running “as expected”. Then you added a varying. And then… something happened that causes you to be “absolutely sure” that it’s running in software.

What happened that makes you think that?

Also, are you exceeding the limits on the number of varyings between shaders?

The performance of the shader decreased dramatically.

That may well be true.

On Mac OS X, you can query this:

Determining Whether Vertex and Fragment Processing Happens on the GPU

Try to give more concrete information. “Dramatically” is really vague. If it takes two or ten times as long as before, that might be “dramatically” for you, but it would still mean that it’s probably hw accelerated.

If it is more like 100 or 1000 times slower, then it’s possibly sw rendering. But as long as you use a fragment shader, there is usually no sw fallback possible at all. Only vertex shaders were done in sw “back in the days”. Given the capabilities of modern vertex shaders, I wouldn’t be surprised if emulating them in sw has become unfeasible too (e.g. vertex texture fetch is certainly not possible to emulate). And if ONE feature can’t be emulated, I’m pretty sure vendors won’t bother to emulate that pipeline stage at all.

Jan.

In case you haven’t already tried this fairly obvious idea…

You might try reviewing the shader info log. You might find a message such as “this shader exceeds hardware limits and will be emulated in software”.

Since such a message is not required by the spec, and could change between vendors (and even releases of the same driver), you can’t count on it to detect SW fallback in release code, but it might be useful in development.

I’m pretty sure I’ve seen such a message, from an ATI X1600 driver, but I don’t remember the exact circumstance (I have done only tiny experiments with shaders).

Sure there is. Try using functions that the spec dictates must work, but HW doesn’t natively support, like noise().

Sure it is. Try using functions that the overly-orthogonal GLSL spec dictates must work, like shadow2DArrayGrad() on DX10-level AMD hardware.

The GL spec essentially says that shaders have to execute regardless of HW limits:


A shader should not fail to compile, and a program object should not fail to link due to lack of instruction space or lack of temporary variables. Implementations should ensure that all valid shaders and program objects may be successfully compiled, linked and executed.

Which means software fallback is essentially required.

Try using functions that the spec dictates must work, but HW doesn’t natively support, like noise().

The general response is that they return 0. At least, that’s how NVIDIA does it.

The GL spec essentially says that shaders have to execute regardless of HW limits:

Implementations don’t have to actually follow all of the spec; just the parts people notice :wink: A shader’s failure to compile for arbitrary reasons is not something most users would notice, because most shaders don’t cross these hardware limits.

Congratulations on your 500th post, arekkusu :slight_smile:

Do you speak from experience, that they actually do so, or do you assume that they will, since the spec dictates it?

Because, AFAIK, there is no emulation: the driver would need to emulate the whole pipeline, which is simply too complex. “noise” is not implemented on NV and ATI hardware. They pretty much don’t care what the spec says here. And if other things don’t work, then the extension is simply not exposed.

It is naive to say “shaders have to execute regardless of HW limits”, because HW limits are ultimate limits! Either the GPU can do it, or it can’t, basta!

And emulation is usually completely useless anyway, because a user who wanted that wouldn’t use a GPU but a software renderer in the first place.

So, in general a shader will simply not work if it exceeds the limits or is otherwise not executable. But even if it is HW accelerated, there are still reasons why it might slow down tremendously.

Jan.

The spec says “should not” instead of “must not”, though. You can always run out of memory, after all. And if the hardware supports “large enough” shaders (e.g. 32-bit instruction pointer) and arbitrary memory access, there is no real need for a software fallback.

Yes, at least on Mac OS X.

Because, AFAIK there is no emulation, because the driver would need to emulate the whole pipeline, which is simply too complex.

I don’t mean to sound confrontational, but this is nonsense. Every hardware vendor absolutely has written a complete C simulator during silicon validation, so the complexity is well understood.

Writing a software driver is, in many ways, easier than writing a hardware driver.

“noise” is not implemented on NV and ATI hardware. They pretty much don’t care what the spec says here. It is naive to say “shaders have to execute regardless of HW limits”, because HW limits are ultimate limits! Either the GPU can do it, or it can’t, basta!

Well, this is the crux of the problem the OP has, isn’t it? The driver said it could do it (compiled and linked the shader), but it sounds like it can’t execute the result. Somewhere, a limit was exceeded, and the API is bad at telling you which one.

Not to mention preventing you from getting into that situation in the first place-- GL is a general graphics API, written by committee. There are plenty of corner cases that are not accelerated on various hardware, but those cases aren’t prevented from being written into the spec.

Given the (IMO) often over-orthogonal wording of the spec, we are left with a choice between drivers that ignore those corner cases (in other words, are broken and don’t care), or drivers with unpredictable performance cliffs.

So we have threads like this-- “why is my shader slow?”

And emulation is usually completely useless anyway, because a user who wanted that wouldn’t use a GPU but a software renderer in the first place.

I generally agree with you here. Some exceptional use cases might care more about the correctness/quality than the performance, but in general, everyone is using GL to take advantage of hardware acceleration.

even if it is HW accelerated, there are still reasons why it might slow down tremendously.

Certainly. Addressing the original post again-- the OP should use a profiler to see what’s happening.

This is an excellent example of how the intention behind a carefully worded technical document can be lost in the ambiguity of English.

Writing a software driver is, in many ways, easier than writing a hardware driver.

Maybe. But the problem here is that you start with a HW-accelerated context, get a bad shader, and then would need to switch to SW rendering (and maybe back to HW again later).

THAT is an entirely different problem.

And even though a vendor of course has internal tools to test things, that doesn’t imply they will integrate them into a driver, because that is again a huge amount of additional work which isn’t of much value to anyone.

Maybe on Mac OS it’s different, because OpenGL is a central part of its rendering pipeline, and AFAIK Apple invests a lot of effort into the OpenGL drivers themselves. That might be why one can detect this case on Apple platforms. But on all other platforms there is no such emulation, at least in my experience: every time I exceeded some limit on Windows or Linux, it just returned an error (or crashed).

Anyway, we seem to agree for the most part. But without additional information from the OP, it’s not possible to say more about it.

Jan.

This is an excellent example of how the intention behind a carefully worded technical document can be lost in the ambiguity of English.
The meaning of terms like “should not” and “must not” in technical specifications is actually quite well defined.

Some things can also cause the hardware to do a lot of thread scheduling, idling the ALUs or thrashing caches as it does so, depending on register use. It’s not always a case of HW vs. software when you see a performance decline, although a software fallback is a possibility.