Beginning development: GLSlang to ARB_vp and ARB_fp compiler

Because current GLSL driver implementations are still weak, and also to provide a source for software-rasterizer implementors, I’m going to develop a GLSlang to ARB_vertex_program and ARB_fragment_program compiler. This will also include a replacement for opengl32.dll (Win32) and libGL.so (Unix-alikes) to enable tracking of built-in uniforms (e.g. gl_LightSource).

The replacement library will pass most commands through directly, replacing only the commands that affect uniform state not already tracked by the old ARB_vertex_program and ARB_fragment_program extensions, e.g. lights.

I’ll also hook into GetProcAddress to provide its own implementations of ARB_vertex_shader, ARB_fragment_shader and ARB_shader_objects.

Any help is highly appreciated, especially in the fields of bytecode optimization, instruction grouping and optimized loop unrolling.

I’ve also registered it as a project at SourceForge.

Instead of doing that hackery of passing calls through a .dll/.so, if I were you I’d implement GLSL in Mesa.

Of course, first you need to fix the bugs in Mesa.

And its pixel format setup code needs some work too. By default it gives you a 16-bit depth buffer.
Alpha wasn’t working for me either. Too many problems.

What’s the project name on sourceforge? Is it this one? http://sourceforge.net/projects/glslang/

The project’s name is
OpenGL Shading Language Meta Compiler
The UNIX name is glslmc

Originally posted by PK:
Instead of doing that hackery of passing calls through a .dll/.so, if I were you I’d implement GLSL in Mesa.

I always planned this, but it won’t help if you want HW support. Although some parts of GLSL cannot be implemented, such as flow control and access to texture data from vertex shaders, the most important features can be.

I got permission from 3D Labs to derive from their generic GLSL compiler, mainly the flex and bison files. All details about the planned implementation will be available on the homepage ASAP. I’ll need the weekend to create the website.

There I’ll post some translation tables
ASM <=> GLSL

I’m interested in seeing this.

Flow control:
I think if the compiler can unroll your loop, it should be OK. If-statements can be simulated as well.

for(i=0; i<5; i++)
{…}

PS: On my ATI card, Cat 4.1, it says

Internal compiler error:loop constructs not yet implemented.

Interesting!

Originally posted by V-man:
I’m interested in seeing this.

Flow control:
I think if the compiler can unroll your loop, it would be ok. If statements can be simulated as well.

for(i=0; i<5; i++)
{…}

PS: On my ATI card, Cat 4.1, it says

Internal compiler error:loop constructs not yet implemented.

Interesting!

And for fragment shader flow control you’ll have to wait for the next HW generation. In the Radeon 9800 series, flow control is implemented in silicon only for the vertex shader; when I asked devrel’at’ati_com, the answer was that the fragment processor only has the fragment-discard capability and nothing more. Still, I think the silicon structures are there. At most, flow control boils down to a conditional change of the code address register.

At most, flow control boils down to a conditional change of the code address register.

Lol!

Maybe in vertex programs, but in fragment programs, flow control means a lot more.

Look at it currently, on R300 hardware. You have 8 pipes. They are all executing the same instruction at the same time. So, really, what you have are 8 computational units being driven off of 1 instruction processor. SIMD: Single-Instruction, Multiple Data. One instruction drives 8 4-vector floating-point operations, as well as 8 texture operations.

Also, that texture operation is designed and optimized to return 8 texels, one for each pipeline. This makes texturing much faster, unless you turn off mipmapping.

To do what you are asking, they would have to give each pipe its own instruction counter in the current program. Texturing would no longer be able to return 8 texels; it would have to return 1. This means that each pipe would have to have its own texture unit.

There are any number of other reasons why this way of getting conditional branches is a bad idea. But these are a good start.

R420 may (and likely will) have conditional branching. But it won’t be implemented the way CPUs do it, and there could easily be a big performance hit for using it.

Hmm, seems that you’re right about the capabilities of current GPUs. Thanks for the reply.

Originally posted by Korval:
Lol!

Maybe in vertex programs, but in fragment programs, flow control means a lot more.

Look at it currently, on R300 hardware. You have 8 pipes. They are all executing the same instruction at the same time. So, really, what you have are 8 computational units being driven off of 1 instruction processor. SIMD: Single-Instruction, Multiple Data. One instruction drives 8 4-vector floating-point operations, as well as 8 texture operations.

There are any number of other reasons why this way of getting conditional branches is a bad idea. But these are a good start.

I don’t see the problem here. SIMD works great in modern x86 processors, and you can perfectly well loop over SSE instructions, which perform SIMD on 4-element float registers. On a GPU there are eight 4-element float registers, but the idea behind both is the same.

I already can guess your reply:

But for different loop counts on different fragments the registers must be independent, whereas SIMD on x86 only works on a whole register set.

Well, this may be true, but the solution is to lock registers from being changed at critical branching points. On x86 this is a common technique solved in software, and on a GPU it could be done even more efficiently in hardware.

The question remains: how much of this exists in current GPUs?
