The whole point of structs of arrays is to be more cache friendly, to keep information that you use locally coherent. If you’re manipulating the X component of a position, you are almost certainly also manipulating the Y component. So having them be next to each other improves cache coherency.
So why are you trying to make cached access patterns worse by spreading them out? Don’t adopt structs-of-array usage mindlessly, like it’s some kind of salve you spread over any code to make it faster. You have to think carefully about which things should be structs and which should be arrays. And when it comes to things like a vector position, these are things that should pretty much always be structs.
Also, cache coherency matters for the GPU too. Spreading out components instead of interleaving them might make CPU access to specific components faster, but it makes GPU rendering from them slower. Which matters more is up to you and your needs, but personally, I’d focus on what the GPU needs. After all, when it comes to mesh data, you generally aren’t poking and prodding at it all that often from the CPU.
Yes, that’s why I chose AoSoA instead of raw SoA. That way I get both SIMD-friendly layout and cache coherency. Look at what I did:
X[0][0] X[0][1] X[0][2] X[0][3] Y[0][0] Y[0][1] Y[0][2] Y[0][3]
X[1][0] X[1][1] X[1][2] X[1][3] Y[1][0] Y[1][1] Y[1][2] Y[1][3]
A cache line is almost always 64 bytes, and I align my position layout to 64 bytes. So if I have X in a cache line, I have Y in the same cache line as well.
If you want to store the X and Y components in separate arrays, you’ll need to make them separate attributes and combine them in the vertex shader. E.g.
layout(location=0) in float pos_x;
layout(location=1) in float pos_y;
void main()
{
vec2 pos = vec2(pos_x, pos_y);
...
}
As Alfonse says, SOA is likely to be worse than AOS. Especially if the GPU stores each attribute as a vec4 internally, as you’ll get fewer vertices in the cache.
OK, but don’t forget: we’re talking about GPU memory, not CPU memory. Reading GPU memory from the CPU is not a process known for its speed, and doing read/modify/write operations on GPU memory from the CPU is even worse.
Order your vertex data however you feel is best for the CPU, but when you do that final write to go to GPU memory, it should be ordered for efficient GPU consumption.