Mapping of Vertex Formats to DMA Streams

Christian_Schüler · June 4, 2004, 2:47am

Hi,
recently I became curious about the way HW maps vertex formats and vertex streams. In DirectX, you usually have a few streams, each with a declared vertex format. In OpenGL, however, there are no vertex formats; it’s all gl*Pointer() calls.

This raised the question of which scheme does more justice to how the HW actually works. Are there some 8 or 16 DMA channels available to match the OpenGL model?

imported_jwatte · June 4, 2004, 8:01am

It depends on the hardware.

And, for most hardware, I wouldn’t view it as a “DMA channel” as much as “cache lines” or something similar in the hardware. I.e., it’s possible that data stride and aliasing matter as much as how scattered (or not) the data is.

Also, the driver is likely to be able to combine interleaved arrays into a single memory fetch. If you have a struct that contains your vertex, and your vertex array is an array of this struct, then all the Pointer() calls collapse down to a single “stream” anyway.
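To make the "collapse to a single stream" idea concrete, here is a minimal C++ sketch. It does not call OpenGL and says nothing about what any particular driver does; `AttribPointer`, `interleavedAttribs` and `collapsesToSingleStream` are hypothetical names invented for this illustration. It only checks the layout condition under which several gl*Pointer()-style bindings could be served by one fetch: every attribute uses the same stride, and all of them fit inside one stride-sized record.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one gl*Pointer()-style binding (not a GL type).
struct AttribPointer {
    std::uintptr_t base;  // address of the first element
    std::size_t size;     // bytes per element
    std::size_t stride;   // bytes between consecutive elements
};

// Interleaved pos/nrm/tex vertex: 8 floats, 32 bytes with 4-byte floats.
struct Vertex {
    float pos[3];
    float nrm[3];
    float tex[2];
};

// Builds the three bindings an application would set up for an array of Vertex.
std::vector<AttribPointer> interleavedAttribs(const Vertex* verts) {
    const auto base = reinterpret_cast<std::uintptr_t>(verts);
    return {
        {base + offsetof(Vertex, pos), sizeof(float) * 3, sizeof(Vertex)},
        {base + offsetof(Vertex, nrm), sizeof(float) * 3, sizeof(Vertex)},
        {base + offsetof(Vertex, tex), sizeof(float) * 2, sizeof(Vertex)},
    };
}

// True when all attributes share one stride and each one lies entirely
// inside the stride-sized record that starts at the lowest base address.
bool collapsesToSingleStream(const std::vector<AttribPointer>& attribs) {
    if (attribs.empty()) return false;
    std::uintptr_t lowest = attribs.front().base;
    for (const AttribPointer& a : attribs) lowest = std::min(lowest, a.base);
    const std::size_t stride = attribs.front().stride;
    for (const AttribPointer& a : attribs) {
        if (a.stride != stride) return false;
        if ((a.base - lowest) + a.size > stride) return false;
    }
    return true;
}
```

Bindings produced by `interleavedAttribs` pass the check, while three separately allocated arrays (one per attribute, each tightly packed) do not, because their strides differ.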

Christian_Schüler · June 5, 2004, 4:17am

Thanks.

It seems I need to go and run some experiments again. Back in GF2 times, I found no difference in performance between interleaved and non-interleaved layouts for a 32-byte vertex format (pos/nrm/tex). I then went for non-interleaved (structure of arrays; basically each mesh component has its own VBO) as it is more flexible.

Would love to do this in DX with the same ease.

imported_jwatte · June 5, 2004, 11:11am

IDirect3DDevice9::SetStreamSource() not good enough for you?

Christian_Schüler · June 6, 2004, 11:26pm

Of course it’s doable, but you must check the caps to see how many streams the driver supports, and all your data must be in vertex buffers.

Then again, a GL equivalent of SetStreamSourceFreq() would be nice.

imported_jwatte · June 7, 2004, 8:09am

Originally posted by Christian Schüler:
Then again, a GL equivalent of SetStreamSourceFreq() would be nice.
Or, even better, the rumored DX 9.1 update which allows you to specify repeat across the vertex buffer (i.e., only specify the shared vertices once). Yum!

Korval · June 7, 2004, 10:42am

Originally posted by jwatte:
Or, even better, the rumored DX 9.1 update which allows you to specify repeat across the vertex buffer (i.e., only specify the shared vertices once). Yum!
I’m not entirely sure what this means. Could you clarify it?

Christian_Schüler · June 8, 2004, 12:41am

It sounds like you place the vertices for multiple instances once into a buffer and set the repeat to the size of the buffer, while the transform data comes from another stream with reduced frequency. It would make a lot of sense.

Obli · June 8, 2004, 2:57am

Originally posted by Christian Schüler:
It sounds like you place the vertices for multiple instances once into a buffer and set the repeat to the size of the buffer, while the transform data comes from another stream with reduced frequency. It would make a lot of sense.
I could see some cases in which it would help, but I’m not sure I’ve understood this correctly.
Does this mean that, say, the geometry array can be 400 elements long while the texcoord[0] array is 100 elements long and repeated 4 times (something like a wrap-around)?

imported_Adruab · June 8, 2004, 8:34am

Yeah, that’s the general idea. DX is targeting it more at instancing, i.e., putting the geometry for a highly replicated object in one buffer, putting the transform and other per-object data in another stream, and having the per-object data advance only once per full cycle through the geometry data. Effectively, this removes the overhead of all the state changes required for rendering individual objects. There are other applications, as you mention, though they aren’t as widely applicable (not sure why you’d want to replicate 100 tex coords).
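The addressing described here can be sketched in a few lines of C++. This is only a model of the idea, not the Direct3D API and not a claim about how any hardware implements it; `StreamIndices` and `fetchIndices` are hypothetical names. For the i-th vertex of the whole draw, the shared geometry stream wraps around (the "repeat"), while the per-object stream advances once per full pass through the geometry (the "reduced frequency").

```cpp
#include <cstddef>

// Which element each stream supplies for one vertex of the draw.
struct StreamIndices {
    std::size_t geometry;  // element read from the shared geometry stream
    std::size_t instance;  // element read from the per-object stream
};

// Precondition: geometryCount > 0.
// The geometry stream repeats every geometryCount vertices; the per-object
// stream steps forward only after a full cycle through the geometry.
StreamIndices fetchIndices(std::size_t i, std::size_t geometryCount) {
    return {i % geometryCount, i / geometryCount};
}
```

With a 400-vertex object, vertex 399 of the draw reads geometry element 399 for object 0, and vertex 400 wraps back to geometry element 0 for object 1. The wrap-around case asked about earlier (a 100-element array repeated under a 400-element one) corresponds to using only the modulo half of this mapping.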

I think what the guy mentioning repeat is talking about is just that: being able to tell the driver that you’re duplicating information in a table. Most of the time when building an indexed mesh, you have to duplicate position information if the corresponding vertex on a different poly has a different texture coordinate, normal, etc. (this comes up when moving from 3DS to interleaved arrays, and is also discussed under problems with texture seaming).
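The duplication mentioned here is easy to see in code. The following C++ sketch assumes a source mesh in which each polygon corner carries separate position and texture-coordinate indices (as in formats such as 3DS); `Corner`, `UnifiedMesh` and `unify` are hypothetical names for this illustration. Because a single index buffer can only select one combined vertex, every distinct (position, texcoord) pair becomes its own vertex, so a position shared across a texture seam ends up stored more than once.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One polygon corner in the source mesh: separate indices per attribute.
struct Corner {
    std::size_t pos;  // index into the position table
    std::size_t tex;  // index into the texcoord table
};

struct UnifiedMesh {
    std::vector<Corner> vertices;      // one entry per unique (pos, tex) pair
    std::vector<std::size_t> indices;  // one unified index per input corner
};

// Converts per-attribute indices into a single index buffer, emitting a new
// vertex the first time each (pos, tex) combination is seen.
UnifiedMesh unify(const std::vector<Corner>& corners) {
    UnifiedMesh out;
    std::map<std::pair<std::size_t, std::size_t>, std::size_t> seen;
    for (const Corner& c : corners) {
        const auto key = std::make_pair(c.pos, c.tex);
        auto it = seen.find(key);
        if (it == seen.end()) {
            it = seen.emplace(key, out.vertices.size()).first;
            out.vertices.push_back(c);
        }
        out.indices.push_back(it->second);
    }
    return out;
}
```

If position 0 is used with texcoord 0 on one polygon and texcoord 3 on another, `unify` emits two vertices that both reference position 0, which is exactly the duplicated position data the post refers to.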

It could lead to a reduction in memory if you only had to specify duplication indices for those vertices after the table of unique ones. Still, I don’t think you’d gain much, if any, performance due to restrictions on how vertex data has to be organized (not nice for direct streaming…), unless we had much bigger vertex caches (reusable stream output…) and could specify the calculations on the different streams as separable (use already transformed positions with separately calculated normals and such). Ooooh, if that was possible, I’d buy into this any day.