No more mapping layers, or arbitrary ones

I thought of the linking twice approach and dismissed it as not worth mentioning. Linking is already stupidly slow, and without binary blobs you’ve just doubled the shader setup time - and that’s the last thing we need if using a JIT shader system.
If we could just re-bind attribute locations after linkage, this wouldn’t be a problem.
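For reference, the link-twice approach I'm dismissing looks roughly like this (vs and fs are already-compiled shader objects; EngineSlotForName is a made-up helper standing in for whatever fixed slot convention you pick):

    // The "link twice" workaround, sketched. The first link is a throw-away,
    // just to discover which attributes survived; each survivor is then forced
    // onto the slot our convention expects, and the program is linked again.
    GLuint prog = glCreateProgram();
    glAttachShader(prog, vs);
    glAttachShader(prog, fs);
    glLinkProgram(prog);                            // first (throw-away) link

    GLint count = 0;
    glGetProgramiv(prog, GL_ACTIVE_ATTRIBUTES, &count);
    for (GLint i = 0; i < count; ++i)
    {
        char name[64]; GLint size; GLenum type;
        glGetActiveAttrib(prog, i, sizeof(name), NULL, &size, &type, name);
        GLuint slot = EngineSlotForName(name);      // made-up fixed convention
        glBindAttribLocation(prog, slot, name);     // takes effect on next link
    }
    glLinkProgram(prog);                            // second link: locations now fixed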

Well, I guess a facility to retrieve the attribute names referenced in a compiled shader before linking would also do it, wouldn’t it?

Not if the attributes, and the functions using them, were declared in a utility shader source, but that utility code never ends up being used in the final shader program - due to some constant-controlled logic, say. The attributes are then not 'active', so updating them (or doing any CPU work to calculate them) would be a waste of time. Even multiplying an attribute by something that evaluates to zero gets caught in the linking stage - the attribute is eliminated, and updating it is once again a waste of time.

Currently my shader system binds a linked shader, which in turn sets its attribute locations in a global hash table, then my mesh class binds its attributes to those locations using a hash lookup. This works fine at the moment (leaving the shader to decide the attribute locations) but once I start using VAO this will break.

Now, this would all be easy if you used VBOs instead. VAO = baked data, afaik (I don't touch Macs). VBO = data that you describe to the driver before a draw-call.

  1. compile+link the shader
  2. get “int AttribID[16]” and “char* AttribName[16]” for the shader
  3. bind shader, bind VBO.
  4. for(const Attrib& a : mesh.attribs){ int idx = pShader->GetAttribID(a.Name); if(idx != -1) glVertexAttribPointer(idx, …); } (spelled out below)
  5. perform draw-call.

Done!!!
You can’t ask for flexibility from a baked thing. You are willing to sacrifice speed on redefining attrib descriptions, so go to VBOs.
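To spell steps 3-5 out a bit (pShader, mesh and the Attrib fields are placeholders; GetAttribID() is assumed to return the location cached at link time, or -1 if the attribute was optimised away):

    // Steps 3-5: bind shader and buffers, describe attributes, draw.
    glUseProgram(pShader->Program());
    glBindBuffer(GL_ARRAY_BUFFER, mesh.vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, mesh.ibo);

    for (const Attrib& a : mesh.attribs)
    {
        int idx = pShader->GetAttribID(a.Name);
        if (idx == -1) continue;                    // not used by this shader
        glEnableVertexAttribArray(idx);
        glVertexAttribPointer(idx, a.components, a.type, GL_FALSE,
                              a.stride, a.offset);
    }

    glDrawElements(GL_TRIANGLES, mesh.indexCount, GL_UNSIGNED_SHORT, 0);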

Do you realise why VAOs are being introduced? Setting up attribute 'pointers' is very expensive; VAOs are being introduced to make it quicker to switch between batches. Korval's right, VAO won't work elegantly with the current GLSL attribute binding system we have.
I’d love to see the cycles your drawing code burns up on a scene with 100,000 objects in it.
For information on how something like this should be done, take a look at the DX10 documentation, specifically the input layout bit:
http://msdn.microsoft.com/en-gb/library/bb172486(VS.85).aspx
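To make the cost point concrete, here's roughly what a VAO buys you (a sketch using the GL3/ARB_vertex_array_object entry points; the buffers, offsets and the locations 0/1 are made up). The expensive pointer setup is recorded once, and every later draw just re-binds the VAO - which is exactly why the locations baked into it have to agree with whatever program you use it with:

    // One-time setup: record the attribute bindings into the VAO.
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);

    glEnableVertexAttribArray(0);                                  // position
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 32, (void*)0);
    glEnableVertexAttribArray(1);                                  // normal
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 32, (void*)12);

    glBindVertexArray(0);

    // Per draw: no attribute pointer calls at all, just bind and draw.
    glBindVertexArray(vao);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);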

Yes, but I’m not the one asking to sacrifice a bit more performance in releases for the sake of flexibility in a model-viewer :P.
The attrib registers are there, and will stay there, as external environmental data. By knowing that, we can easily do some optimizations.
Moreover, in games you don’t just splat any shader as an effect on a character/item.

@Korval: About the gl_Vertex/gl_Normal thing - I didn't mean to use them. I meant making up your own standard names and their default attribute-ID bindings. GLSL 1.3/GL3 moves away from caching of compiled shaders - does that mean it's still heading in a good direction? Oh, and iirc it does nothing much for updating multiple uniforms quickly. It also deprecates point-sprites and wide lines, while not forcing ATI to actually provide geometry shaders. (And for those two features hardware can/does have simple circuitry to compute+interpolate+copy 16*4 varyings at once, instead of making your geometry shader waste 64 cycles.) So I'm not gung-ho about GL3 and its ways yet; it doesn't look like a good source to study design from.

As I mentioned above… 8-15k cycles per VBO vtx-attrib bind :mad: . Only instancing helps…
Yeah, 50-150 cycles (best L2 cache case) extra are nothing in comparison, but I want them for more of my calculations.

Yes, but I’m not the one asking to sacrifice a bit more performance in releases for the sake of flexibility in a model-viewer

No significant performance is lost. The driver already has to check the compatibility of attribute indices between a VAO and a program. Rather than comparing 32-bit integers in two arrays, it would now compare strings in two arrays. Strings can be compared as a sequence of 32-bit integers (I do this myself for strings I use as identifiers), which makes string comparison exceedingly fast.
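Something like this, say - a sketch assuming the identifier storage is under your control, zero-padded to a multiple of four bytes and 4-byte aligned:

    #include <cstddef>
    #include <cstdint>

    // Compare two identifier strings a word at a time (sketch). Assumes both
    // are zero-padded to a multiple of four bytes and 4-byte aligned, which
    // you can guarantee when you control the identifier storage yourself.
    bool IdentifiersEqual(const char* a, const char* b, size_t paddedLen)
    {
        const uint32_t* wa = reinterpret_cast<const uint32_t*>(a);
        const uint32_t* wb = reinterpret_cast<const uint32_t*>(b);
        for (size_t i = 0; i < paddedLen / 4; ++i)
            if (wa[i] != wb[i])
                return false;
        return true;
    }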

Compared to the gain from actually being able to use VAOs, the performance drop is negligible.

Moreover, in games you don’t just splat any shader as an effect on a character/item.

In your games, you don’t just do that. Other people may have different ideas about what shaders you can use with which models.

I meant making up your own standard names and their default attribute-ID bindings.

And like I said, 16 attribute indices is not enough. Just off the top of my head, I can conceive of more than 16 attributes that might be used.

And again, it doesn’t deal with the whole “I don’t know what attributes the shader asked for” problem.

Well, it would seem the inefficiency of what you propose was considered by Microsoft when designing DX10. They have the ID3D10InputLayout object, which is used to cache the binding between (effectively) a VAO and a shader - or at least a particular shader ‘input signature’.
Are you suggesting that the driver should create this ID3D10InputLayout-like object internally and cache it?
If that were a realistic option, I’d be surprised the DX10 API was cluttered with an unnecessary extra step in drawing.
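For reference, the step I mean looks roughly like this (values illustrative, not lifted from the docs; pDevice and pVSBlob are assumed to exist already). The layout is validated against a particular vertex shader's input signature once, at creation time, so per draw it's just IASetInputLayout:

    // Describe the vertex layout by semantic name (illustrative values only).
    D3D10_INPUT_ELEMENT_DESC layout[] =
    {
        { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0,  0, D3D10_INPUT_PER_VERTEX_DATA, 0 },
        { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12, D3D10_INPUT_PER_VERTEX_DATA, 0 },
    };

    ID3D10InputLayout* pInputLayout = NULL;
    // Validated against the vertex shader's input signature once, up front.
    pDevice->CreateInputLayout(layout, 2,
                               pVSBlob->GetBufferPointer(), pVSBlob->GetBufferSize(),
                               &pInputLayout);

    // Per draw: plug the pre-validated layout in and go.
    pDevice->IASetInputLayout(pInputLayout);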

Are you suggesting that the driver should create this ID3D10InputLayout-like object internally and cache it?

You’re assuming that this object has a direct hardware analogue. I don’t see any particular evidence of that.

If that were a realistic option, I’d be surprised the DX10 API was cluttered with an unnecessary extra step in drawing.

D3D does have a tendency to over-complicate things. Like the old vertex stream thing, which was entirely unnecessary.

Actually, I can believe that - I can’t for the life of me think of any other use for the InputLayout object in their API. So they should have just let the driver cache it.

There is some evidence in the R600 GPU documentation. That GPU fetches vertex data through a special subroutine describing how to sample the various buffers. This subroutine appears to be the thing that would be associated with the ID3D10InputLayout object.