What is the correct behavior for sub-devices created after context creation?
Let’s say that I create a context C that only includes a single device, devA. I then partition the device, creating sub-devices sub0 and sub1.
Can I create a command queue for sub0 and sub1 in the context C and launch kernels built for devA on these queues?
Can I create the command queue for the sub-devices, but then have to build the programs specifically for the sub-devices?
This is not clear from the specification, and the AMD and Intel platforms behave differently on CPU: AMD allows creating command queues for the sub-devices in the parent device’s context and building programs for them; Intel doesn’t (and actually segfaults if explicitly building for all the devices).
Now, when it comes to the specification, the glossary (both in 1.2 and 2.0) says:
Sub-device: An OpenCL device can be partitioned into multiple sub-devices. The new sub-devices alias specific collections of compute units within the parent device, according to a partition scheme. The sub-devices may be used in any situation that their parent device may be used. Partitioning a device does not destroy the parent device, which may continue to be used along side and intermingled with its child sub-devices. Also see device, parent device and root
(emphasis mine). This would seem to suggest that you don’t need to explicitly create a context for sub-devices, since everything should just work.
OTOH, in the “Partitioning a Device” section, it says:
The output sub-devices may be used in every way that the root (or parent) device can be used,
including creating contexts, building programs, further calls to clCreateSubDevices and creating command-queues. When a
command-queue is created against a sub-device, the commands enqueued on the queue are
executed only on the sub-device.
which hints that sub-devices should be treated independently of their parents.
Then, for clCreateCommandQueue, we have:
device must be a device associated with context. It can either be in the list of devices specified
when context is created using clCreateContext or have the same device type as device type
specified when context is created using clCreateContextFromType.
This would imply that, e.g., a sub-device of a CPU device, which is itself of type CPU, would be valid for clCreateCommandQueue if the context was created with clCreateContextFromType, but not if the context was created with clCreateContext. The latter contradicts the glossary.
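To make the two readings concrete, here is a small self-contained C model of the "is this device valid for this context?" check. It is only a sketch of my interpretation, not OpenCL code: the types `device` and `context` and the functions `valid_strict` and `valid_glossary` are hypothetical names I made up for illustration, and nothing here calls the real API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for OpenCL objects; NOT the real OpenCL API. */
typedef enum { DEV_CPU, DEV_GPU } dev_type;

typedef struct device {
    dev_type type;
    const struct device *parent;   /* NULL for a root device */
} device;

typedef struct {
    bool from_type;                /* true: models clCreateContextFromType */
    dev_type type;                 /* meaningful only when from_type is true */
    const device *const *devices;  /* meaningful only when from_type is false */
    size_t num_devices;
} context;

/* Literal reading of the clCreateCommandQueue wording: the device must be
 * in the creation list, or match the device type the context was created
 * with. */
bool valid_strict(const context *ctx, const device *dev)
{
    if (ctx->from_type)
        return dev->type == ctx->type;
    for (size_t i = 0; i < ctx->num_devices; ++i)
        if (ctx->devices[i] == dev)
            return true;
    return false;
}

/* Glossary reading: a sub-device may be used wherever its parent (or any
 * ancestor) may be used, so walk up the parent chain. */
bool valid_glossary(const context *ctx, const device *dev)
{
    for (const device *d = dev; d != NULL; d = d->parent)
        if (valid_strict(ctx, d))
            return true;
    return false;
}
```

With devA as a root CPU device and sub0 as its child, `valid_strict` rejects sub0 for a context built from the list {devA} but accepts it for a context built from type CPU, while `valid_glossary` accepts it in both cases. That is exactly the inconsistency described above.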
Of course the situation is even funnier when using clGetContextInfo. When returning the list of devices or the number of devices, should sub-devices be included or not? Does this depend on how the context was created?
I think these ambiguities should be clarified. An official response would be much appreciated. It would also seem that the conformance suite doesn’t test these cases, since AMD and Intel have different behavior.