V-man wrote:
The vertex format can be xyz, but your entire vertex should be multiples of 32 bytes.
Perhaps this is just a slight misunderstanding, but that’s not what the document says.
It says that if you’re shuffling data over AGP, use “multiples of 32 byte sized vertices”.
The way I read it, if you’re on AGP (and now we’re venturing way outside OpenGL and into hardware-specific optimizations for bus transactions, on a bus that’s being phased out), you should submit your data in a form that maximizes the throughput of that particular bus. I.e. if you have only xyz in your vertices (12 bytes per vertex, assuming three 4-byte floats), you should submit them in batches that are multiples of 8 vertices (8*12 = 96 bytes = 3 bus transactions of 32 bytes each).
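To make that arithmetic concrete, here is a minimal sketch in C. The function names are made up for illustration, and the 32-byte transaction size is simply the figure from the document being discussed, not something queried from the hardware. It computes the smallest vertex count whose total size is a whole number of transactions.

```c
#include <assert.h>
#include <stddef.h>

/* Greatest common divisor (Euclid's algorithm). */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: smallest number of vertices whose combined size
   is an exact multiple of transaction_bytes. This equals
   lcm(vertex_bytes, transaction_bytes) / vertex_bytes, which simplifies
   to transaction_bytes / gcd(vertex_bytes, transaction_bytes). */
static size_t min_aligned_batch(size_t vertex_bytes, size_t transaction_bytes)
{
    assert(vertex_bytes > 0 && transaction_bytes > 0);
    return transaction_bytes / gcd_size(vertex_bytes, transaction_bytes);
}
```

For a 12-byte xyz vertex and 32-byte transactions, min_aligned_batch(12, 32) gives 8, matching the 8*12 = 96 bytes above. A vertex that is already 32 bytes gives 1, which is the case V-man's reading covers.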
If you’re supporting older (e.g. Radeon 7000, TNT2) or less specialized (such as Intel integrated 915, 925 or similar) h/w that performs TnL on the CPU, submitting data well aligned for the CPU will affect that stage (or those stages) of the pipeline. But besides CPU and/or system RAM specific behaviours, I think today’s and yesterday’s GPUs (back to Radeon 9200 and GeForce… 1?) handle just about any 32-bit aligned data the same (someone with insight here: feel free to chime in if this assumption is wrong).
What can matter is the alignment of the starting address (in system memory) of a submitted batch of data. With immediate-mode calls (glVertex & co) the driver should handle this. Mapped buffers should already be page aligned (and on Windows, due to the way its memory manager works, I’m almost 100% sure you’ll even get them 64KB-aligned).
I think that leaves only the “upload” style functions (e.g. BufferData), where the source data could be misaligned from a cache-line, bus-transfer, or even DMA perspective.
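If you want to see whether your own source pointer is in that situation before handing it to an upload call, a check along these lines would do. It's only a sketch: is_aligned is a made-up helper name, and which alignment actually matters (32 bytes, a cache line, a page) depends on the hardware and driver, so treat the value you pass in as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if ptr sits on a multiple of
   alignment, 0 otherwise. alignment must be a nonzero power of two,
   so the low bits of the address can be tested with a mask. */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (uintptr_t)(alignment - 1)) == 0;
}
```

You'd call it as is_aligned(vertices, 32) on the pointer you are about to pass as source data. If it comes back 0, allocating the source array with an aligned allocator is the obvious thing to try, though whether it buys anything measurable is exactly the open question here.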
By that, I think I’ve left “off-topic” in the dust for this thread, so I’ll stop here. Just to round off: I’m not saying alignment isn’t still an issue, but something tells me it often isn’t the AGP memory transaction requirements that are the issue anymore.