Is it safe to cast a pointer-to-vector to pointer-to-scalar?

PhilTillet · December 24, 2012, 6:25pm

Hello!

Let’s say x, y and z are three cl_mem(sizeof(float)*9) … I was wondering whether this kind of kernel is safe/portable. I couldn’t find anything in the standard about whether this is legal or not.

__kernel void add(__global float4* x, __global float4* y, __global float4* z){

z[0] = x[0] + y[0];
z[1] = x[1] + y[1];

__global float* new_x = (__global float*)x;
__global float* new_y = (__global float*)y;
__global float* new_z = (__global float*)z;
new_z[8] = new_x[8] + new_y[8];

}

In my case, padding x, y and z with zeros to match the alignment is not really an option, and vloadn/vstoren seem to induce a lot of overhead… Vectorization is a benefit I don’t want to lose if x, y and z are of size 4 000 001.
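To make the sizes concrete, here is a rough host-side sketch in plain C of the split the kernel relies on: a body of full float4 vectors plus a scalar tail. The names `vector_split` and `split_for_width` are made up for illustration and are not part of any OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct: how n scalars split into full vectors plus a tail. */
typedef struct {
    size_t full_vectors; /* number of complete width-wide vectors */
    size_t tail_start;   /* scalar index of the first leftover element */
    size_t tail_count;   /* leftover scalars that do not fill a vector */
} vector_split;

/* Hypothetical helper: split n scalars for a given vector width. */
static vector_split split_for_width(size_t n, size_t width)
{
    assert(width > 0);
    vector_split s;
    s.full_vectors = n / width;
    s.tail_start = s.full_vectors * width;
    s.tail_count = n - s.tail_start;
    return s;
}
```

For n = 9 and width 4 this gives two full vectors and a single tail element at scalar index 8, which is the element the kernel above touches through the scalar pointer. For n = 4 000 001 it gives 1 000 000 full vectors and again one tail element.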

Thank you!

Edit:
According to the standard:
Casting a pointer to a new type represents an unchecked assertion that the address is correctly aligned. The developer will also need to know the endianness of the OpenCL device and the endianness of the data

What does endianness mean when it comes to vector types? Does it apply to the whole vector or each individual element?

PhilTillet · December 27, 2012, 5:03pm

I have done some further tests. In case someone wonders, on my NVIDIA OpenCL 1.1 SDK,
*((float*)x) = 0
results in CL_OUT_OF_RESOURCES, so I guess such a cast has undefined behavior.

In case anyone wonders how I solved the issue after a bit more thinking:
I now allocate room for 16 elements (the highest possible alignment), while the user still sees a size of 9; the 7 remaining elements are just fillers… This way I can safely use the buffer as float4* on one device and float8* on another, for example.
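The padding described above boils down to rounding the element count up to a multiple of 16. A minimal host-side sketch in C, assuming 16 (the width of float16) is the largest vector width you care about; `MAX_VECTOR_WIDTH`, `padded_count` and `filler_count` are hypothetical names, not OpenCL API:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: float16 is the widest vector type used, hence 16. */
#define MAX_VECTOR_WIDTH 16

/* Hypothetical helper: round n up to the next multiple of `multiple`. */
static size_t padded_count(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return (n + multiple - 1) / multiple * multiple;
}

/* Hypothetical helper: number of filler elements the user never sees. */
static size_t filler_count(size_t n, size_t multiple)
{
    return padded_count(n, multiple) - n;
}
```

With n = 9 this yields 16 allocated elements and 7 fillers. The buffer size in bytes would then be padded_count(n, MAX_VECTOR_WIDTH) * sizeof(float), while the size reported to the user stays n.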

Cheers