Is there any way I can tell glVertexAttribPointer to use an interleaved memory layout like XXXXYYYYXXXXYYYY?

  • I want to organize my positions as SOA to use memory more efficiently, so it looks like this:
 // Declared first so that Entity can refer to it.
 union Vec2 {
     float *someWhere;
     struct {
         float *x;
         float *y;
     };
 };
 struct Entity {
     bool *canMove;
     Vec2 *pos;
     float *velocity;
     ...
 };
  • Where someWhere points to memory laid out like this: XXXXYYYYXXXXYYYY
  • Is there any way I can tell OpenGL to read memory organized like mine?
  • I know I can convert XXXXYYYYXXXXYYYY to XYXYXYXYXYXYXYXY or to XXXXXXXXYYYYYYYY to work with OpenGL.
  • But that will become a nightmare when the game gets more complex and needs more memory. Eventually it will not be efficient anymore.

The whole point of structs of arrays is to be more cache friendly, to keep information that you use locally coherent. If you’re manipulating the X component of a position, you are almost certainly also manipulating the Y component. So having them be next to each other improves cache coherency.

So why are you trying to make cached access patterns worse by spreading them out? Don’t adopt structs-of-arrays usage mindlessly, like it’s some kind of salve you spread over any code to make it faster. You have to think carefully about which things should be structs and which should be arrays. And when it comes to things like a vector position, these are things that should pretty much always be structs.

Also, cache coherency matters for the GPU too. Spreading out components instead of interleaving them might make CPU access to specific components faster, but it makes GPU rendering from them slower. Which matters more is up to you and your needs, but personally, I’d focus on what the GPU needs. After all, when it comes to mesh data, you generally aren’t poking and prodding at it all that often from the CPU.

  • Yes, that’s why I chose AOSOA instead of raw SOA. That way I get both SIMD and cache coherency. Look at what I did:
    X[0][0] X[0][1] X[0][2] X[0][3] Y[0][0] Y[0][1] Y[0][2] Y[0][3] X[1][0] X[1][1] X[1][2] X[1][3] Y[1][0] Y[1][1] Y[1][2] Y[1][3]
  • A cache line is almost always 64 bytes, and I align my position memory to 64 bytes. If I have an X in a cache line, I have the matching Y in the same cache line too.
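
To make the indexing of the blocked layout above concrete, here is a minimal sketch in C. It assumes blocks of 4 floats per component, as in the listing above. The names `BLOCK`, `x_index`, and `y_index` are hypothetical and do not come from the original code.

```c
#include <stddef.h>

/* Assumption: 4 floats per component per block, matching the XXXXYYYY listing. */
enum { BLOCK = 4 };

/* Index of entity i's X component in the XXXXYYYY float array. */
static size_t x_index(size_t i)
{
    return (i / BLOCK) * (2 * BLOCK) + (i % BLOCK);
}

/* Index of entity i's Y component: same block, BLOCK floats further on. */
static size_t y_index(size_t i)
{
    return x_index(i) + BLOCK;
}
```

With 4-byte floats, one block of 4 X and 4 Y values is 32 bytes, so a 64-byte-aligned array puts two whole blocks in each 64-byte cache line, and an entity's X and Y always share a line.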

If you want to store the X and Y components in separate arrays, you’ll need to make them separate attributes and combine them in the vertex shader. E.g.

layout(location=0) in float pos_x;
layout(location=1) in float pos_y;

void main()
{
    vec2 pos = vec2(pos_x, pos_y);
...
}
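
On the host side, each attribute is described by one base offset and one constant stride, so vertex i is fetched from offset + i * stride. The sketch below models that addressing rule in plain C without calling OpenGL. `AttribDesc`, `attrib_fetch`, `soa_x_desc`, and `soa_y_desc` are hypothetical names for illustration, and the sketch assumes all X values sit in one contiguous range of the buffer followed by all Y values (XXXXXXXXYYYYYYYY).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the stride and offset arguments of glVertexAttribPointer. */
typedef struct {
    size_t offset;  /* byte offset of vertex 0's value in the buffer */
    size_t stride;  /* bytes between the values of consecutive vertices */
} AttribDesc;

/* Read the float that vertex i would receive for a one-component attribute. */
static float attrib_fetch(const void *buffer, AttribDesc a, size_t i)
{
    float v;
    memcpy(&v, (const unsigned char *)buffer + a.offset + i * a.stride, sizeof v);
    return v;
}

/* Descriptors for n vertices stored as n X floats followed by n Y floats. */
static AttribDesc soa_x_desc(void)
{
    return (AttribDesc){ 0, sizeof(float) };
}

static AttribDesc soa_y_desc(size_t n)
{
    return (AttribDesc){ n * sizeof(float), sizeof(float) };
}
```

With descriptors like these, pos_x and pos_y in the shader above would each read one float per vertex. A blocked XXXXYYYY layout does not fit this model, because the distance from one vertex's X to the next is not constant: 4 bytes inside a block, but 20 bytes across a block boundary.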

As Alfonse says, SOA is likely to be worse than AOS, especially if the GPU stores each attribute as a vec4 internally, as you’ll get fewer vertices in the cache.

OK, but don’t forget: we’re talking about GPU memory, not CPU memory. Reading from GPU memory is not a process known for its speed and doing read/modify/write operations to GPU memory is even worse.

Order your vertex data however you feel is best for the CPU, but when you do that final write to go to GPU memory, it should be ordered for efficient GPU consumption.
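
One way to do that final write is to repack the blocked layout into interleaved XY pairs while copying into the upload buffer. This is a minimal sketch in C, assuming blocks of 4 floats per component and a count that is a multiple of 4. `repack_to_interleaved` is a hypothetical helper, and `dst` stands for whatever staging or mapped memory you upload from.

```c
#include <stddef.h>

/* Assumption: 4 floats per component per block (XXXXYYYY). */
enum { REPACK_BLOCK = 4 };

/*
 * Copy `count` positions from blocked XXXXYYYY order in `src`
 * to interleaved XYXY order in `dst`. `count` must be a multiple
 * of REPACK_BLOCK; `dst` must hold 2 * count floats and must not
 * overlap `src`.
 */
static void repack_to_interleaved(const float *src, float *dst, size_t count)
{
    for (size_t base = 0; base < count; base += REPACK_BLOCK) {
        const float *xs = src + 2 * base;     /* the 4 X values of this block */
        const float *ys = xs + REPACK_BLOCK;  /* the 4 Y values of this block */
        for (size_t k = 0; k < REPACK_BLOCK; ++k) {
            dst[2 * (base + k)]     = xs[k];
            dst[2 * (base + k) + 1] = ys[k];
        }
    }
}
```

The loop only writes to `dst`, in ascending order, and never reads it back, which keeps it away from the slow read/modify/write pattern on GPU memory mentioned above.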
