Setting up data in a VBO

I guess this is just a basic question.
When setting up vertex data in a VBO, is it better to set it up like:

struct Point3D {
    float x;
    float y;
    float z;
    float w; // for memory padding
};

or does it not really matter whether you get rid of the last 'w' member?

If it's better to use the memory-padding trick above, how much of a performance boost can I expect?

thanks

I suppose it depends on the architecture, but for NVIDIA hardware, it’s always better to tightly pack floats.

You generally want to make an attribute 4-byte aligned, but that’s pretty much it.

So for 16-bit/component normals, you would want a 16-bit pad. For RGB ubyte colors, you would want an 8-bit pad.
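A minimal sketch of attribute layouts that follow those rules (the struct and field names are just illustrative):

    // 16-bit/component normal, padded with a 16-bit pad to 8 bytes
    struct PackedNormal {
        short nx, ny, nz;
        short pad;
    };

    // RGB ubyte color, padded with an 8-bit pad to 4 bytes
    struct PackedColor {
        unsigned char r, g, b;
        unsigned char pad;
    };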

Thanks -
Cass

As Cass said, it is completely architecture dependent. The only requirement of the spec is that data be aligned to its size. That is, 4-byte elements must be aligned on 4-byte boundaries, 8-byte elements must be aligned on 8-byte boundaries, etc. So, assuming the compiler doesn't automatically add padding, the following would not be valid:

struct point3d {
    float nx;  // normal
    float ny;
    float nz;
    double x;  // vertex
    double y;
    double z;
};

Fortunately, the compiler will almost always (this is CPU architecture dependent, though) add in padding between nz and x. YMMV.
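If you'd rather verify than assume, here is a quick sketch in plain C that prints the offsets your compiler actually chose (the struct is repeated so the snippet stands alone):

    #include <stddef.h>  // offsetof
    #include <stdio.h>

    struct point3d {
        float nx, ny, nz;   // normal
        double x, y, z;     // vertex
    };

    int main(void)
    {
        // Many compilers print 16 here (4 bytes of padding after nz);
        // some 32-bit ABIs only align doubles to 4 bytes and print 12.
        printf("offsetof(x)     = %u\n", (unsigned)offsetof(struct point3d, x));
        printf("sizeof(point3d) = %u\n", (unsigned)sizeof(struct point3d));
        return 0;
    }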


Originally posted by cass:

I suppose it depends on the architecture, but for NVIDIA hardware, it’s always better to tightly pack floats.

You generally want to make an attribute 4-byte aligned, but that’s pretty much it.

So for 16-bit/component normals, you would want a 16-bit pad. For RGB ubyte colors, you would want an 8-bit pad.

How about interleaved or quasi-interleaved (strided) arrays? What are the hardware limitations on native (fast-path) ‘vertex pulling’ when using such memory layouts on NVIDIA hardware? Are there any benefits compared to non-interleaved arrays?

In general, you will get better performance with interleaved arrays. Most hardware T&L parts are capable of pulling at least 2 separate streams, but according to DirectX device caps, using more than 2 streams is not generally supported.
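For reference, a sketch of what an interleaved (strided) layout looks like on the API side, assuming a VBO that has already been created and filled (the struct and the attribute set are illustrative):

    struct Vertex {
        float px, py, pz;   // position
        float nx, ny, nz;   // normal
        float s, t;         // texcoord
    };                      // 32 bytes per vertex

    // offsetof comes from <stddef.h>
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(3, GL_FLOAT, sizeof(struct Vertex), (void*)offsetof(struct Vertex, px));
    glNormalPointer(GL_FLOAT, sizeof(struct Vertex), (void*)offsetof(struct Vertex, nx));
    glTexCoordPointer(2, GL_FLOAT, sizeof(struct Vertex), (void*)offsetof(struct Vertex, s));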

Aligning to 16 bytes (or multiples thereof) makes sense if you process your vertices on the CPU, as you can use SSE instructions and non-cache-polluting loads/stores. Not to mention it'll make the most efficient use of cache lines (assuming you also align the start of your buffer to the start of a cache line).
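A minimal sketch of that on the CPU side (the translation loop is just for illustration; GCC alignment syntax shown, use __declspec(align(16)) on MSVC, and compile with SSE enabled):

    #include <xmmintrin.h>   // SSE intrinsics

    // 16 bytes per vertex: one SSE register, four vertices per 64-byte cache line
    typedef struct {
        float x, y, z, w;    // w doubles as padding
    } __attribute__((aligned(16))) Vertex4;

    void translate(Vertex4 *verts, int count, float dx, float dy, float dz)
    {
        __m128 off = _mm_set_ps(0.0f, dz, dy, dx);
        for (int i = 0; i < count; ++i) {
            __m128 v = _mm_load_ps(&verts[i].x);             // aligned load
            _mm_stream_ps(&verts[i].x, _mm_add_ps(v, off));  // non-cache-polluting store
        }
    }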

Cards that pull an entire cache line's worth of vertex data whenever they need a single vertex will also work better when things are nicely rounded to powers of two, within reason.

One thing you can do is find ways to pack extra data into that padding space. For example, a color, or two texture coordinates stored as shorts, fit in those 4 bytes. As would a normal stored as chars, assuming your hardware can efficiently unpack that (I seem to recall that the GFFX can, but I think the GF 2MX cannot; Cass will correct me if I'm wrong, I'm sure :) ).
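A sketch of that kind of packing (which attributes end up in the pad bytes is purely illustrative):

    // 16 bytes per vertex: the 4 "pad" bytes hold a ubyte color
    struct PackedVertexColor {
        float x, y, z;
        unsigned char r, g, b, a;
    };

    // or hold two texture coordinates stored as shorts
    struct PackedVertexTex {
        float x, y, z;
        short s, t;
    };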

As far as direct hardware support for interleaved/non-interleaved arrays goes, we support 16 fully independent arrays.

The standard rules for good cache locality apply. If you have interleaved arrays, but you’re not using all attributes, that’s bad. If you have separate arrays but your index jumps around a lot, that’s bad.

The equation for good cache locality is just what you would expect it to be, I think.

Interleave stuff that’s always used together, and always try to keep good coherence in your indices.
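As an illustration of that last point (the particular split here is hypothetical): keep the attributes that every pass reads interleaved in one array, and put a rarely used attribute in its own array so it doesn't get dragged through the cache when it isn't needed.

    // used by every pass: interleave in one VBO
    struct CoreVertex {
        float px, py, pz;
        float nx, ny, nz;
        float s, t;
    };

    // used only by an occasional pass: keep in a separate VBO
    struct LightmapCoord {
        float u, v;
    };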

Thanks -
Cass