OpenGL 3.0 and GPGPU

At Siggraph, I attended both the OpenGL BOF and the ATI low level GPGPU (CTM) talk with great interest. What I didn’t hear is how OGL 3.0 might address GPGPU needs.

It seems like a few additions would make it cover many of the GPGPU needs:

  • A new GLSL shader type that allows scattered writes
  • A simple way to launch a multitude of these shader instances without drawing polygon(s) (see the sketch below for how this is emulated today)
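For reference, here is roughly what we do today to get one fragment shader invocation per output element: attach a texture to an FBO and rasterize a quad that exactly covers it. This is only a sketch (error checking omitted; fbo, width, height and gpgpuProgram are assumed to be set up elsewhere), but it shows the polygon drawing that the second bullet would like to get rid of:

    /* Rough sketch of the current workaround: one fragment shader invocation
       per texel is triggered by rasterizing a quad that exactly covers the
       output texture. */
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);   /* render into a texture, not the window */
    glViewport(0, 0, width, height);                 /* one pixel per output element */

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluOrtho2D(0.0, width, 0.0, height);             /* 1:1 pixel mapping */
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    glUseProgram(gpgpuProgram);                      /* the "kernel" lives in the fragment shader */
    glBegin(GL_QUADS);                               /* the polygon we would rather not have to draw */
        glTexCoord2f(0.0f, 0.0f); glVertex2f(0.0f,         0.0f);
        glTexCoord2f(1.0f, 0.0f); glVertex2f((float)width, 0.0f);
        glTexCoord2f(1.0f, 1.0f); glVertex2f((float)width, (float)height);
        glTexCoord2f(0.0f, 1.0f); glVertex2f(0.0f,         (float)height);
    glEnd();

An entry point that simply launched N×M instances of a shader would replace all of the transform and geometry setup above.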

The ATI talk also said there was no way in CTM to coordinate between the shader instances beyond a global sync. A richer synchronization/coordination mechanism would help many GPGPU algorithms, but might need to wait for a future OGL extension.

A new GLSL shader type that allows scattered writes
This is not GPGPU but CPU :smiley:

I agree with ZbuffeR. The new features (geometry programs, more robust fragment programs) of new cards, combined with the new object model, seem to be everything you could ever really expect out of a graphics API for doing non-graphics work. “Guaranteed” fast path, better/more robust image/FBO support, etc.

At the end of the day, a GPU still needs to be a GPU; it’s not going to become an “arbitrary coprocessor” anytime soon. GPGPU developers need to learn to live within the restrictions of that context, understanding that they are yoking the hardware to do things it was not designed to do.

GPGPU developers need to learn to live within the restrictions of that context, understanding that they are yoking the hardware to do things it was not designed to do.
And the reason it’s designed that way is because games are footing the bill for R&D. If GPGPU applications start making some big bucks for the IHVs, things will likely change. I thought I’d toss this thought in as an ancillary and arguably impertinent perspective :wink:

A simple way to launch a multitude of these shader instances without drawing polygon(s)
What exactly do you mean by that?

Isn’t that similar to the geometry shader introduced in DX10?

glAren, no, see Korval’s post about it.

Originally posted by Leghorn:
And the reason it’s designed that way is because games are footing the bill for R&D. If GPGPU applications start making some big bucks for the IHVs, things will likely change. I thought I’d toss this thought in as an ancillary and arguably impertinent perspective :wink:
Yeah, but the problem is that then it will stop being a GPU and become more of a second CPU.
The whole point of a GPU is that it can’t do scatter that well, and thus it does not have to worry about pesky little things like race conditions or symmetric two-way memory bandwidth.
Surprisingly, some applications can use the strengths and weaknesses of the GPU more effectively than the CPU; you just need to think a bit more about how to do it.

About OpenGL 3.0: one of the features I have heard of, at least for OpenGL LM, is that most things will be done in the shader, even depth testing and blending.
This means that the shader must have knowledge (depth, color, alpha and so on) of the pixel it is currently writing to, which is a great feature for all the GPGPU (and graphics) guys out there.
But other than that, I think the FBO and PBO extensions did more for the GPGPU crowd than any features currently on the horizon.
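Just to spell out why FBO mattered so much for GPGPU, here is a minimal sketch (assuming EXT_framebuffer_object and ARB_texture_float, with error handling left out): a floating-point texture becomes a render target, so a fragment shader writes its results straight into it and the data never has to leave the card between passes.

    /* Sketch: create a 32-bit float RGBA texture and attach it to an FBO so a
       fragment shader can write its results directly into it. "width" and
       "height" are assumed to be defined elsewhere. */
    GLuint tex, fbo;

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, width, height, 0,
                 GL_RGBA, GL_FLOAT, NULL);            /* no initial data */

    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, tex, 0); /* shader output lands here */

    if (glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) != GL_FRAMEBUFFER_COMPLETE_EXT) {
        /* float render targets are not guaranteed everywhere; fall back here */
    }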

Yeah, but the problem is that then it will stop being a GPU and become more of a second CPU.
The whole point of a GPU is that it can’t do scatter that well, and thus it does not have to worry about pesky little things like race conditions or symmetric two-way memory bandwidth.
Hehe. Well, to me the whole point of the GPU is that it’s fast, that it does computations quickly. Whether it’s like or unlike a “CPU” is, to me, stuff for coffee table books, where you might find phrases like “… race conditions and symmetric two-way memory bandwidth …” :slight_smile:

Make no mistake: I’m no proponent of GPGPU, nor am I a naysayer. Like, I just want to hear the man out. Heck, I’m not even sure what this topic is about. I’ve been planting tomatoes all day.

This means that the shader must have knowledge (depth, color, alpha and so on) of the pixel it is currently writing to, which is a great feature for all the GPGPU (and graphics) guys out there.
It’s not going to be what you think.

Note that it is in “a shader”, not “the shader”. As in, not the fragment shader.

What was suggested (and it may not even appear in 3.0; it may wait for hardware that can handle it, which could be 3.1 or so) was that they have a “blend shader” that specifies depth tests, FB blending, and so forth. The initial iterations of this shader will certainly be limited to our current expressivity. So don’t expect to be programming complicated blend operations.

As hardware advances, this shader stage may improve, but it’s probably never (for a good while, at least) going to be as efficient or powerful as fragment programs. I wouldn’t expect to see looping or large numbers of uniforms in this stage, for example.
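To make the “limited expressivity” point concrete, here is roughly all such a stage would need to express just to reproduce today’s fixed-function behaviour. To be clear: the syntax and built-in names below are completely made up, since no such GLSL stage exists; only the arithmetic (a less-or-equal depth test plus standard GL_SRC_ALPHA / GL_ONE_MINUS_SRC_ALPHA blending) is real.

    // Purely hypothetical "blend shader" sketch: none of these built-ins exist.
    // gl_IncomingColor/gl_IncomingDepth stand for the fragment being written,
    // gl_DestColor/gl_DestDepth for the pixel already in the framebuffer.
    void main()
    {
        if (gl_IncomingDepth > gl_DestDepth)   // hypothetical depth test
            discard;                           // keep the pixel that is already there

        float a = gl_IncomingColor.a;
        gl_BlendResult = vec4(gl_IncomingColor.rgb * a + gl_DestColor.rgb * (1.0 - a), a);
    }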

Whether it’s like or unlike a “CPU” is, to me, stuff for coffee table books, where you might find phrases like “… race conditions and symmetric two-way memory bandwidth …” :slight_smile:
You mean substantive, technical texts that discuss facts rather than just handwaving?

You mean substantive, technical texts that discuss facts rather than just handwaving?
Exactly. Though I’ve come across some good books on handwaving, like Handwaving: Principles and Practices, and Learn Handwaving in 30 Days, or my favorite, Handwaving for Dummies.

Originally posted by Korval:

What was suggested (and it may not even appear in 3.0; it may wait for hardware that can handle it, which could be 3.1 or so) was that they have a “blend shader” that specifies depth tests, FB blending, and so forth. The initial iterations of this shader will certainly be limited to our current expressivity. So don’t expect to be programming complicated blend operations.

Well, that’s stupid; a new shader type is the last thing we need (and yes, I do think the vertex and geometry shaders should be merged, that is, if they didn’t paint themselves into a corner this time with it). If the fragment shader instead could read the data contained in the current pixel, no blend shader would be needed, and since that functionality would still be needed in a blend shader, I can’t see the point of having one.

I seem to recall that when GLSL came out, there was a question under the issues section (#23 in the 1.10.59 GLSL spec) about this kind of feature, and they decided not to do it because

There is too much concern about impact to performance and impracticality of implementation.
But it does suggest that the issue will be revisited as an extension in the future (and that it is actually possible to do).

Originally posted by ZbuffeR:
A new GLSL shader type that allows scattered writes
This is not GPGPU but CPU :smiley:
Cool. Then the ATI X1900 is a CPU, since according to their talk at Siggraph, it can do scattered writes. (BTW, it’s not a CPU.)

I think it was Neil Trevett (then at 3DLabs) at Siggraph 03 who was frothing that what we really need are essentially vector processing units, since not everything in graphics is a lit & textured triangle. I agree with that.

Originally posted by zeoverlord:
The whole point of a GPU is that it can’t do scatter that well, and thus it does not have to worry about pesky little things like race conditions or symmetric two-way memory bandwidth.
Surprisingly, some applications can use the strengths and weaknesses of the GPU more effectively than the CPU; you just need to think a bit more about how to do it.

Well now, an IIR filter would be a nice case to look at. You can do it with current fragment shaders, but it’s a pain to write and an order of magnitude slower than what the hardware should be able to do, were it not constrained by the current API. And that’s without needing race-condition resolution or symmetric two-way memory bandwidth.

Originally posted by RayT:
Well now, an IIR filter would be a nice case to look at. You can do it with current fragment shaders, but it’s a pain to write and an order of magnitude slower than what the hardware should be able to do, were it not constrained by the current API. And that’s without needing race-condition resolution or symmetric two-way memory bandwidth.
Now I can understand that. An IIR filter is not that well suited to GPGPU because it does things in a way that is not optimal for a GPU: IIR doesn’t like the parallelism of the GPU, and the GPU does not like the linearity of an IIR filter.
So even if the API restrictions were to magically disappear, it would still only run on one pipeline (though you could run several at once).
Though an IIR would run really fast on the Cell’s streaming APUs.
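To put that in plain C (just the textbook first-order recurrence, nothing GPU-specific): the inner loop below cannot be parallelized, because each output needs the previous one, while the outer loop over scanlines parallelizes trivially. That is exactly the “several at once” part.

    /* First-order IIR, y[n] = a*x[n] + b*y[n-1], applied along each scanline.
       The inner loop has a loop-carried dependency (serial); the outer loop
       over rows is embarrassingly parallel. */
    void iir_rows(const float *in, float *out, int width, int height, float a, float b)
    {
        for (int y = 0; y < height; ++y) {          /* independent rows */
            const float *x = in  + y * width;
            float       *o = out + y * width;
            float prev = x[0];
            o[0] = prev;
            for (int n = 1; n < width; ++n) {       /* serial along the row */
                prev = a * x[n] + b * prev;
                o[n] = prev;
            }
        }
    }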

If the fragment shader instead could read the data contained in the current pixel, no blend shader would be needed, and since that functionality would still be needed in a blend shader, I can’t see the point of having one.
Yes, and if pigs flew, we would have flying pigs.

I prefer efficient operations, not impractical wishlists. IHVs have said, and it’s pretty hard to discount this, that reading from the framebuffer in a fragment shader would be difficult if not impossible to implement, and that it would impact performance drastically enough to make it impractical.

Just because you can envision a paradise where there’s only one shader (and personally, I don’t see that as a paradise; also, if you do, feel free to pretend that you have it when writing your shaders) doesn’t make it a good idea.

Originally posted by Korval:
Yes, and if pigs flew, we would have flying pigs.
We already have pigs that glow in the dark, so it’s not a question of if pigs could fly, but when.
It’s the same with GFX hardware.

And no, I don’t like one shader. Two will do, four is stretching it a bit; what’s next, six? No thanks. Keeping it simple but powerful is often the best way.
If we are going to add more shaders, we should first check if it can be done by extending existing ones.

If we are going to add more shaders, we should first check if it can be done by extending existing ones.
Yes, and it can’t, so we add new shaders. It’s that simple.

The geometry shader logic is fundamentally incompatible with vertex shader logic. The VS wants one vertex in, one vertex out. Geometry shaders can take multiple post-T&L vertices in, and multiple post-T&L vertices out. It can write to buffers and stuff, things that a VS can’t do.

If you were to combine geometry and vertex shaders into one shader, how would you go about defining the difference between what one part can do and the other can’t? How do you go about defining where one part of the shader begins and the other ends?

You cannot easily (or at all) abstract away the differences between vertex and geometry shaders. It’s much easier and more reasonable to just place it in the pipeline as a third kind of shader.
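To illustrate the “multiple post-T&L vertices in, multiple vertices out” model, here is a minimal pass-through geometry shader written against EXT_geometry_shader4-style GLSL (treat it as a sketch; the built-in names could well change by the time 3.0 ships):

    #version 120
    #extension GL_EXT_geometry_shader4 : enable

    // Pass-through geometry shader for triangles. It sees the whole primitive at
    // once (gl_PositionIn[]) and chooses how many vertices to emit, which a
    // one-in/one-out vertex shader has no way to express.
    void main()
    {
        for (int i = 0; i < gl_VerticesIn; ++i) {
            gl_Position = gl_PositionIn[i];   // could just as easily emit zero or many more
            EmitVertex();
        }
        EndPrimitive();
    }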

As for blending shaders, once again, there are things that these will not be able to do. Texturing, for example, would likely not be an available construct in a blend shader. So how do you communicate that restriction without making a new shader type?

If someday these hardware restrictions go away, such that vertex and geometry shaders become one piece of hardware, we as developers need know nothing about it. Maybe the ARB defines a new kind of shader that combines them both, for a lower-level implementation. But that can come later as an extension. The same with blend and fragment shaders.

The number of different kinds of shaders is, to me, totally irrelevant. If we wind up with ten, so be it. All I’m interested in is whether or not these shaders do something of value and have reasonable performance.

Originally posted by zeoverlord:
[IIR]… it would still only run on one pipeline (though you could run several at once).
Though an IIR would run really fast on the Cell’s streaming APUs.

Fortunately, the IIRs for different scanlines are independent, so running several at once is a natural thing to do.

Originally posted by Korval:
The geometry shader logic is fundamentally incompatible with vertex shader logic. The VS wants one vertex in, one vertex out. Geometry shaders can take multiple post-T&L vertices in, and multiple post-T&L vertices out. It can write to buffers and stuff, things that a VS can’t do.
No it’s not; the vertex shader syntax/logic only needs minor modifications to allow this.
For example, gl_Vertex -> gl_Vertex[n].
And on some ATI cards, DX vertex shaders can already write to buffers by using R2VB.
So it’s not impossible or impractical, just different.
But as I said previously, this depends mostly on the hardware implementation.

But blend shaders, I don’t know. Will IHVs sacrifice space on the chip just to add another shader stage, space they could use for more pixel pipelines, or will they just layer it in the fragment shader, seeing as they are probably already doing this to some extent?

No it’s not; the vertex shader syntax/logic only needs minor modifications to allow this.
And you know this, of course. You’re a hardware engineer. Right?

Because otherwise, you’d just be making some (incredibly dubious) assumptions.

will they just layer it in the fragment shader
You say that as though it’s like flipping a switch or something. Before you start chastising the decisions of highly intelligent and ingenious people, perhaps you could consider learning about how hardware works. Then, after you have some real knowledge of the subject, perhaps you can lecture them about which takes up more transistors and poses more or fewer potential race conditions.