Kernels with >8 float3 parameters fail on Intel device

Hi all,

I have recently started coding OpenCL and have a project (medical physics) with a few tens of kernels that I run using the C API through a custom C++ wrapper (resource management, exceptions, and only the minimal interface I need).

I had problems with one of the kernels, and after stripping it down I found that the issue depends only on the number of float3 parameters. Kernels with more than 8 float3 parameters fail when setting arguments on an Intel device (they work fine on NVIDIA): clSetKernelArg returns CL_INVALID_ARG_SIZE for the 9th parameter. Reducing the number of parameters to 8 or fewer fixes the problem, as does changing the stripped kernel (and host code) to use float or float4 instead of float3.

Stripped kernel verified to cause the problem:

kernel void tst( float3 x1, float3 x2, float3 x3,
float3 x4, float3 x5, float3 x6,
float3 x7, float3 x8, float3 x9 ) {}
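The float4 workaround mentioned above might look roughly like this on the host side. This is only a sketch: `padded_float3` and `pad_float3` are hypothetical names, and the struct is a plain-C stand-in for `cl_float4` so that the snippet compiles without the OpenCL headers. It assumes a 4-byte `float` and that the kernel parameters are declared as float4, with the fourth component ignored.

```c
#include <assert.h>

/* Hypothetical stand-in for cl_float4: four floats, 16 bytes in total. */
typedef struct { float s[4]; } padded_float3;

/* The size that a float4 kernel argument is expected to have
   (assumes a 4-byte float). */
static_assert(sizeof(padded_float3) == 16, "expected 4 packed floats");

/* Pack three components into a 4-component value; the unused fourth
   component is zeroed so the kernel never sees uninitialized data. */
padded_float3 pad_float3(float x, float y, float z)
{
    padded_float3 v;
    v.s[0] = x;
    v.s[1] = y;
    v.s[2] = z;
    v.s[3] = 0.0f;
    return v;
}

/* Intended use with the real API (not compiled here):
 *   padded_float3 v = pad_float3(1.0f, 2.0f, 3.0f);
 *   clSetKernelArg(kernel, arg_index, sizeof v, &v);
 */
```

With the real headers, `cl_float4` would take the place of the hypothetical struct.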

OS: Win7 x64

Intel device:
name: Intel® Xeon® CPU X5550 @ 2.67GHz
driverVersion: 1.1
version: OpenCL 1.1 (Build 13785.5219)

NVIDIA device:
name: GeForce GTX 560 Ti
driverVersion: 280.26
version: OpenCL 1.1 CUDA

I would be very grateful for any insight you can offer. Thanks!

There are some other issues with using vector parameters of length 3.

In my tests of vector parameters of every scalar type with lengths 2, 3, 4, 8, and 16, I observed the following behavior:

Intel OpenCL (Xeon 5570 CPU):
long3, ulong3, double3 – Intel OpenCL expects an argument size of 3 * 64 bits (24 bytes). All other platforms expect 4 * 64 bits (32 bytes). The values passed to the kernel are not correct.

AMD OpenCL (Cypress):
int3, uint3, float3 – The values passed to the kernel are not correct.

AMD OpenCL (Xeon 5570 CPU): No issues found

NVIDIA OpenCL (Tesla C1060, C2050, C2070, C2075, M2090): No issues found

In general, I did not find any issues with vectors of length 2, 4, 8, or 16 of any type on any platform or device.
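For reference, the sizes discussed above follow from the OpenCL rule that a 3-component vector type has the size of the corresponding 4-component type. A minimal sketch of that arithmetic is below; `vector_arg_size` is a hypothetical helper, not part of the OpenCL API, and it only computes the size a conforming implementation should expect for a by-value vector argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expected byte size of a by-value vector kernel
   argument with n components of component_size bytes each.
   A 3-component vector is sized like a 4-component one. */
size_t vector_arg_size(size_t component_size, unsigned n)
{
    /* Only the vector lengths that OpenCL C defines are accepted. */
    assert(n == 2 || n == 3 || n == 4 || n == 8 || n == 16);
    return component_size * (n == 3 ? 4u : n);
}
```

For example, `vector_arg_size(8, 3)` gives 32 bytes for long3, ulong3, and double3, rather than the 24 bytes the Intel runtime asked for in my tests, and `vector_arg_size(4, 3)` gives 16 bytes for int3, uint3, and float3.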

Call clGetDeviceInfo() with CL_DEVICE_MAX_CONSTANT_ARGS as the second argument (param_name). It reports the maximum number of kernel arguments declared with the __constant qualifier that your device supports (the minimum required by the spec is 8!).