Alignment problem?

Hi guys,

I am fairly new to OpenCL and to structure padding in C, but I think I get the idea.

But here is the problem:

typedef struct {
    cl_float3 a;
    cl_float b;
} foo;

sizeof(foo) gives 32, which makes sense: as I understand it, cl_float3 is 16 bytes, and the structure is padded at the end so that its size is a multiple of the largest member alignment (16 in this case).

But on the device (which is the CPU):

typedef struct {
    float3 a;
    float b;
} foo;

sizeof(foo) gives 16. Switching the order to float first and float3 second gives a size of 32, which is reasonable, assuming the float3 alignment is 16 bytes.
But why was it 16 in the first place? The only explanation I can think of is that float3 has a size of 12 but is aligned to 16.

Weren’t the cl_ types made for exactly that purpose, to keep the size of a struct that is passed to a kernel the same on both sides?

Aye,

sizeof(cl_float3) on the host gives me 16
sizeof(float3) on the CPU device gives me 12

Well, according to the 1.1 spec:

6.1.5
For 3-component vector data types, the size of the data type is 4 * sizeof(component). This means that a 3-component vector data type will be aligned to a 4 * sizeof(component) boundary.

Which means that your device implementation seems to be in error. I would file a bug report with your vendor.

I’m on Mac OS X and just copied the 1.1 headers from Khronos over the original Apple ones, hmm…

Interesting – I didn’t think Apple had released a 1.1 implementation yet. I don’t think stomping their headers is a fair way to “upgrade” to 1.1. ;) I’m surprised that their CL C compiler accepted float3 at all.

Oh well, that explains the confusion :).

I got tempted by:
“All of the following headers should be present in a directory CL/ (or OpenCL/ on MacOS X).”
from the 1.1 section on the Khronos site.

I was mildly surprised it worked that well, though.

Sticking with float4 will be my best bet, I guess…

Regards,

BTW, despite that alignment error, float3 worked flawlessly inside the kernel.

Pre-release features often show up in compilers well in advance, but often with bugs, gotchas or behavioural differences in places where the spec was in flux until late in the process. This sort of thing should be caught by the official conformance tests, and thus a certified 1.1 implementation cannot ship with such a delta from the spec.

Swapping in the headers worked (or seemed to) because the change from 1.0 to 1.1 was relatively small from an API point of view. If you try to use 1.1-specific things, though, it will no doubt make your life “interesting”. We just have to wait until the 1.1 headers come as part of the OpenCL framework from Apple.