Line of code that fails with Intel driver, but works with AMD and NVIDIA.

Hello,

The following line of code (OpenCL C++ bindings 1.2) fails with the Intel driver (segmentation fault), but works with AMD and NVIDIA.

Is this a bug in the Intel driver, or is the code not quite correct?

cl_context_properties cps = getContext(queue).getInfo<CL_CONTEXT_PROPERTIES>()[1];

Thank you!

I forgot to mention that I get the error on Linux.

Could somebody please run the following function to confirm this error?

string getPlatformVendor(const CommandQueue & queue)
{
	cl_context_properties cps = getContext(queue).getInfo<CL_CONTEXT_PROPERTIES>()[1];
	return (cl::Platform((cl_platform_id)cps)).getInfo<CL_PLATFORM_VENDOR>();
}

Thank you

How was the context created (i.e. which constructor)? Did you manually pass any context properties in?

What does the getContext() function look like?

Jprice, thank you very much for responding!

The context is created in this constructor:

https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17ca719570ba6e5/src/acl/aclHardware.cxx#L50

Context properties are passed here:

https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17ca719570ba6e5/src/acl/aclHardware.cxx#L74
https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17ca719570ba6e5/src/acl/aclHardware.cxx#L81

Here is the getContext():

https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17ca719570ba6e5/src/acl/aclHardware.cxx#L174

Everything fails on this line:
https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17ca719570ba6e5/src/acl/aclHardware.cxx#L139

when it is called here with queues[i] being an Intel (or POCL) device:
https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17ca719570ba6e5/src/acl/aclHardware.cxx#L105

ASL seems to be a very nice tool; I just can’t get it working on some of my computers.

Thank you!

I’ve just tried the getPlatformVendor function from the ASL package on a few different platforms (including Intel and pocl), and I’m afraid it is working fine for me.

First of all - thank you! Which versions of the Intel and POCL drivers did you use? POCL v0.10 didn’t work for me…

I used pocl built from version control, so v0.12-pre. I’ve just tried v0.10, and I got a blank value from getPlatformVendor. It didn’t crash, but valgrind showed some dodgy stuff happening inside that function, which may be the same issue that you were experiencing.

The Intel runtime was version 14.2.

Thank you for the extra details!

Intel runtime version 14.2 is actually problematic. Could you please check it, and POCL v0.12-pre as well, with valgrind? I assume they will behave the same (dodgy stuff inside, but luckily no crash on your machine).
So the question now is: whose fault is this, and where should I report the bug? Is it a faulty implementation of the OpenCL spec by Intel/POCL, or should I file a bug with ASL? In other words: can getInfo<CL_CONTEXT_PROPERTIES>() return NULL according to the 1.2 spec or not? And what is the behavior of this function in the OpenCL 2.0 and 2.1 specs?

Thanks.

pocl v0.12-pre is working fine, and Valgrind doesn’t indicate any issues (with this function).

Valgrind doesn’t indicate any issues with this function for the Intel 14.2 runtime either.

The CL_CONTEXT_PROPERTIES query can return a zero-sized list only if a NULL properties argument was passed to the clCreateContext[FromType] function; otherwise it has to return the same properties list that was passed in.
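
Given that rule, indexing the result with [1] is only safe when the list is known to be non-empty. Below is a minimal sketch of a more defensive lookup, assuming the property list has already been copied into a std::vector. It walks the key/value pairs instead of assuming a fixed position. ContextProperty, kContextPlatform and findPlatformProperty are hypothetical names, used here only so the snippet compiles without the OpenCL headers; in real code they would be cl_context_properties and CL_CONTEXT_PLATFORM from CL/cl.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins (assumption) so the sketch builds without OpenCL headers.
// In real code use cl_context_properties and CL_CONTEXT_PLATFORM from <CL/cl.h>.
using ContextProperty = std::intptr_t;                // cl_context_properties is an intptr_t
constexpr ContextProperty kContextPlatform = 0x1084;  // numeric value of CL_CONTEXT_PLATFORM

// Walks the (key, value) pairs of a zero-terminated property list and returns
// the value stored under the platform key. Returns 0 if the list is empty,
// truncated, or does not contain that key. Never reads past the end.
ContextProperty findPlatformProperty(const std::vector<ContextProperty>& props)
{
    for (std::size_t i = 0; i + 1 < props.size() && props[i] != 0; i += 2) {
        if (props[i] == kContextPlatform) {
            return props[i + 1];
        }
    }
    return 0;
}
```

In getPlatformVendor, the vector returned by getInfo<CL_CONTEXT_PROPERTIES>() could be passed to a helper like this, and a 0 result treated as "no platform recorded" rather than being cast to cl_platform_id.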

Thank you again!
So the main question remains: whom to blame?

Seemingly non-NULL context properties are passed here:

https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17ca719570ba6e5/src/acl/aclHardware.cxx#L74
https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17ca719570ba6e5/src/acl/aclHardware.cxx#L81

Can you please check what values valgrind shows on input for:

cps[0] = CL_CONTEXT_PLATFORM;
cps[1] = (cl_context_properties)(platforms[i])();
cps[2] = 0;

If all this is non-NULL then POCL is to blame, right?

On the other hand, is this correct:
getInfo<CL_CONTEXT_PROPERTIES>()[1];

or should it maybe be:
getInfo<CL_CONTEXT_PROPERTIES>()[0];
on this line:
https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17ca719570ba6e5/src/acl/aclHardware.cxx#L139
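
On the [0] versus [1] question: for a list built like the cps[] assignments above, index 0 holds the property name (CL_CONTEXT_PLATFORM) and index 1 holds its value, so [1] is where the platform handle sits, provided the implementation returns the list unchanged. A small sketch of that layout follows. ContextProperty, kContextPlatform and makePlatformProperties are hypothetical stand-ins so it compiles without the OpenCL headers, and the handle is just an integer here.

```cpp
#include <cstdint>
#include <vector>

// Stand-ins (assumption): real code would use cl_context_properties
// and CL_CONTEXT_PLATFORM from <CL/cl.h>.
using ContextProperty = std::intptr_t;
constexpr ContextProperty kContextPlatform = 0x1084;  // numeric value of CL_CONTEXT_PLATFORM

// Builds the same three-element list as the cps[] assignments:
//   [0] property name, [1] property value, [2] zero terminator.
std::vector<ContextProperty> makePlatformProperties(ContextProperty platformHandle)
{
    return {kContextPlatform, platformHandle, 0};
}
```

Reading [0] from such a list yields the constant CL_CONTEXT_PLATFORM, not a platform, so casting it to cl_platform_id would not be meaningful.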

I’m not sure what you’re asking now. This was a bug in pocl v0.10 (which is the version that you said you were using), but it has now been fixed, and pocl v0.12 is handling this fine. Have you tried a newer version?

If you still have this issue with pocl v0.12 (which should be released fairly soon), then you should raise a bug against pocl, ideally with a self-contained snippet of code that reproduces the crash.

With regard to the Intel runtime: which OS do you run it on? On Arch Linux, version 14.2 has issues… Are you sure you are not using 15.1?

Thank you.

This is running on RHEL 6.3. I’m sure this is 14.2 - this machine has a Xeon Phi in it, and 14.2 is the latest release that supports it. The specific driver version (from CL_DRIVER_VERSION) is 1.2.0.82248, if that helps.

Could you please tell me where you got your version of Intel runtime 14.2 with the driver version 1.2.0.82248? The Arch package seems to have 1.2.0.8:

https://aur.archlinux.org/packages/intel-opencl-runtime/

It takes it from here:

http://registrationcenter.intel.com/irc_nas/4181/opencl_runtime_14.2_x64_4.5.0.8.tgz

Thank you!

We would have downloaded it directly from the Intel website. This would have been about 18 months ago, so there may have been revisions to that driver version since then.

If you can recreate the bug with a minimal program (i.e. just a few lines of code) with the latest version of the Intel runtime, then you should submit a bug report to Intel.