Official OpenCL 2.0 Feedback thread

Naming conflict

Hi,

I’m working on a full port of the API to Fortran, and the image structure cl_image_format conflicts with the CL_IMAGE_FORMAT enum value used for querying image info.
The problem is that Fortran is not case sensitive and, just like in C, enum values live in the global scope.
Actually this is not only a problem with OpenCL 2.0; it also affects previous releases.

The way I worked around this was to rename the enum values of cl_image_info to CL_IMAGE_INFO_FORMAT (same principle for the others as well). Other than this, everything worked well.
An important note: there is no good way to change scoping in Fortran that would solve the problem for users. One way or another they will have to use both constructs, and a compiler error will be raised.

One suggestion is to add _INFO to all enums used for querying the different objects, for consistency.
I know this can break compatibility with previous versions, but it’s a chance to do something good.
As for the Fortran library, up to this point this is the only place where full conformance to the C OpenCL API is broken.

Regards,
Moti.

[QUOTE=pelotoescogorciao;29666]I don’t like the SVM mechanism you proposed in the spec… because:

  1. The existence of clSVMAlloc() sounds like a bad idea. The whole point of shared virtual memory(SVM) should be to REUSE an existing void* pointer externally-allocated using C/C++ malloc(), VirtualAlloc(), etc…
    Some libraries have their own memory manager and they are closed-source. How would I call clSVMAlloc() then?
    [/QUOTE]
Note that you can use malloc, VirtualAlloc, whatever, IF the platform supports fine-grained system SVM. I don’t know the details, but as far as I know it isn’t always possible to support fine-grained system SVM (depending on what sort of cache coherency the system has), which is why there are multiple levels of SVM support in the spec. It would certainly be interesting to get some idea of how widely supported system SVM will be (I assume it is trivial on CPU devices, but APUs, integrated GPUs and discrete GPUs get progressively more interesting).

Surely then you just allocate it CL_MEM_READ_WRITE - that is no different from how buffers are allocated.

The problem is that if you pass the pointer through clSetKernelArg, the driver has no way of knowing whether you’ve passed a cl_mem (buffer) or an SVM pointer - that’s a limitation of C, which has no type-based function overloading.
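
To make the ambiguity concrete, here is a minimal sketch in plain C. It does not use the real OpenCL headers: fake_mem_handle, stored_arg and set_arg are hypothetical stand-ins that only mimic the (size, const void *) shape of clSetKernelArg's last two parameters. It assumes, as on common platforms, that all object pointers share one size and representation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an opaque handle such as cl_mem (NOT the real type). */
typedef struct fake_mem *fake_mem_handle;

/* What a driver could record for one argument: a size and raw bytes, nothing else. */
typedef struct {
    size_t size;
    unsigned char bytes[sizeof(void *)];
} stored_arg;

/* Hypothetical function mimicking the (arg_size, arg_value) part of clSetKernelArg. */
static stored_arg set_arg(size_t arg_size, const void *arg_value)
{
    stored_arg a;
    assert(arg_size <= sizeof a.bytes);
    a.size = arg_size;
    memset(a.bytes, 0, sizeof a.bytes);
    memcpy(a.bytes, arg_value, arg_size);
    return a;
}

/* Returns 1 when a handle and a plain pointer holding the same address
   are indistinguishable once they have gone through set_arg. */
static int handle_and_pointer_look_identical(void)
{
    static int storage;
    void *as_pointer = &storage;
    fake_mem_handle as_handle = (fake_mem_handle)as_pointer;

    stored_arg a = set_arg(sizeof as_handle, &as_handle);
    stored_arg b = set_arg(sizeof as_pointer, &as_pointer);
    return a.size == b.size && memcmp(a.bytes, b.bytes, a.size) == 0;
}
```

Since the callee only ever sees a byte count and a blob, the caller has to say which kind of argument it is through a separate entry point, which is what clSetKernelArgSVMPointer does in the 2.0 API.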

As soon as you impose a 4K alignment requirement, you’re back to the same issue you raised at the start: it won’t play nice with memory allocated by other libraries that aren’t aware of OpenCL. If you have to do something OpenCL-specific to allocate memory, it might as well be clSVMAlloc, which will know what the actual hardware-specific requirements are.

Hi everyone, I decided to provide detailed feedback in the form of blog articles. You can read them here: https://blog.ajguillon.com/

The first relevant blog is: OpenCL: Standardization Issues | AJ's Blog

Thanks.

Hi, All

I’d be glad to see support for something similar to GL_ARB_draw_indirect from the OpenGL world, where you can specify a buffer in which the parameters for clEnqueueNDRangeKernel are stored.

It is very useful if the parameters for clEnqueueNDRangeKernel are known on the GPU side and you do not want to do a round trip to the host.

For now, the only workaround I can come up with is to always add some kind of check and offset, like this:

__kernel void my_kernel(__global int *myParamsCalculatedInPrevKernelRun, __global int *pout)
{
    // Work size and offset computed on the device by the previous kernel run.
    int count = myParamsCalculatedInPrevKernelRun[0];
    int offset = myParamsCalculatedInPrevKernelRun[1];

    int gi = get_global_id(0) + offset;
    // Surplus work-items beyond the computed count exit immediately.
    if (gi >= count + offset) return;

    // kernel code
    *pout = …
}

P.S. I know that this is partially covered by running a kernel from a kernel, but it is not the same.

Thanks

Hi All,

I noticed that the CL_DEVICE_MEM_BASE_ADDR_ALIGN device property isn’t explained in table 4.3. All it says is:

The minimum value is the size (in bits) of the largest OpenCL built-in data type supported by the device (long16 in FULL profile, long16 or int16 in EMBEDDED profile) for devices that are not of type CL_DEVICE_TYPE_CUSTOM.

That only describes a constraint on the property, not what it actually is. It appears the 1.0 spec actually had an explanation:

Describes the alignment in bits of the base address of any allocated memory object.

This explanation doesn’t seem great, though. I started a thread (the forum won’t let me post the link to it; the title is "Looking for a better explanation of CL_DEVICE_MEM_BASE_ADDR_ALIGN") and, based on the discussion, I’d like to propose the following description for the CL_DEVICE_MEM_BASE_ADDR_ALIGN property:

The bit alignment required for a sub-buffer object’s origin (see clCreateSubBuffer()) to be used with this device.
The minimum value is the size (in bits) of the largest OpenCL built-in data type supported by the device (long16 in FULL profile, long16 or int16 in EMBEDDED profile) for devices that are not of type CL_DEVICE_TYPE_CUSTOM.
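
As an illustration of what the proposed wording would mean for host code, here is a small sketch in plain C. The helper names origin_is_aligned and align_origin_up are hypothetical. It assumes the queried value is a whole number of bytes, and that the sub-buffer origin is given in bytes, as in clCreateSubBuffer's region description.

```c
#include <assert.h>
#include <stddef.h>

/* align_bits: the value reported for CL_DEVICE_MEM_BASE_ADDR_ALIGN, in bits.
   Returns 1 if origin_bytes is a multiple of that alignment, 0 otherwise. */
static int origin_is_aligned(size_t origin_bytes, unsigned align_bits)
{
    assert(align_bits != 0 && align_bits % 8 == 0);
    return origin_bytes % (align_bits / 8u) == 0;
}

/* Rounds origin_bytes up to the next offset that satisfies the alignment. */
static size_t align_origin_up(size_t origin_bytes, unsigned align_bits)
{
    size_t align_bytes;
    assert(align_bits != 0 && align_bits % 8 == 0);
    align_bytes = align_bits / 8u;
    return (origin_bytes + align_bytes - 1) / align_bytes * align_bytes;
}
```

With the FULL profile minimum of 1024 bits (a long16 is 128 bytes), an origin of 256 bytes passes the check, while an origin of 100 bytes would have to be moved up to 128. As far as I can tell, a misaligned origin is what clCreateSubBuffer reports as CL_MISALIGNED_SUB_BUFFER_OFFSET.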

Please add texture compression support, if not as a mandatory feature then at least as a KHR extension, say cl_khr_compressed_textures, together with a query for which compressed formats are supported for read-only images, etc.
Note projects such as a2flo/oclraster on GitHub (Flexible Rasterizer in OpenCL), which emulate the raster pipeline with OpenCL kernels: one major feature missing for parity with graphics APIs is the use of compressed textures.
Another project, Large Scale & Interactive Scientific Visualization: OpenGL to Ray-Tracing, translates OpenGL calls to a custom ray tracer that almost surely uses GPGPU, so it is basically another example.

I just want to let you know that I have posted another article regarding SPIR, and how it might fit with the standard overall. I have a few more articles to write on OpenCL 2.0 before I am finished giving feedback (issues that I have with the memory and execution models), but I believe that the “big picture” discussions are now complete. The main theme of my articles has really been a separation of hardware and software concerns, which I feel is best expressed by this latest article.

Please provide any feedback you may have on my writing, since this is the first set of detailed technical articles I have written, and I want to ensure that I am getting my message across.

Here is a link to the most recent article: OpenCL 2.0: SPIR Feedback and Vision | AJ's Blog

Also, I forgot to mention some other improvements:
OpenGL now comes with sparse textures (as an optional ARB extension, GL_ARB_sparse_texture), and so does DX11.2 (where they are called tiled textures and are also optional).
Since both D3D 11.2 and OGL 4.4 support sparse textures in their compute shaders, it seems OpenCL 2.0 should catch up and add an optional cl_khr_sparse_image…
OpenGL now also has cross-vendor bindless texture support via the optional ARB_bindless_texture extension (AMD is expected to implement it too), and it is supported in compute shaders…
So it seems OpenCL should similarly publish an optional cl_khr_bindless_image extension…

Hi all,
It would be nice to add a blocking flag to the clEnqueueUnmap* set of commands, because clEnqueueMap* can easily be done in a blocking or non-blocking way, while for unmapping one has to deal with events to make sure that it has completed.

The example at the bottom of page 363 in appendix B uses illegal casts:


float4 v = vload4( 0, x );
uint4 y = (uint4) v; // legal, portable
ushort8 z = (ushort8) v; // legal, not portable

These casts are explicitly described as not allowed in section 6.2.2 “Explicit casts” (“Explicit casts between vector types are not legal.”)

The example in Appendix B should use reinterpreting casting, such as:


float4 v = vload4( 0, x );
uint4 y = as_uint4(v); // legal, portable
ushort8 z = as_ushort8(v); // legal, not portable

The same (wrong) example has been present since the OpenCL 1.0 specification.
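
For readers less familiar with the distinction, here is a scalar sketch in plain C (not OpenCL C) of reinterpreting bits versus converting a value. The helper names are hypothetical, and the bit pattern mentioned afterwards assumes IEEE 754 single-precision floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-for-bit reinterpretation, analogous to what as_uint() does in OpenCL C. */
static uint32_t reinterpret_as_uint(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);   /* copy the bytes, do not convert the value */
    return u;
}

/* Value conversion, analogous to a scalar cast or convert_uint(). */
static uint32_t convert_to_uint(float f)
{
    return (uint32_t)f;         /* truncates toward zero */
}
```

For 1.0f, the reinterpretation gives the bit pattern 0x3F800000, while the conversion gives 1. The Appendix B example is about the first kind of operation, which is why the as_typen() form is the one that fits. The ushort8 case is "not portable" because the way the 16 bytes split into eight elements depends on the device's byte order.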

It would be good if the section below (page 119) could also cover 3D images, and state whether the data is copied when the image is created from the buffer. If it is copied, it would be good to have support for sharing the data between the buffer and the image.

"A 2D image can be created from a buffer by specifying a buffer object in the image_desc->mem_object passed to clCreateImage for image_desc->image_type = CL_MEM_OBJECT_IMAGE2D. If image_desc->mem_object is created with CL_MEM_USE_HOST_PTR, the host_ptr specified to clCreateBuffer must be aligned to the minimum of the CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT value for all devices in the context associated with image_desc->mem_object and that support images."

CL_FILTER_NONE, which could be used by clCreateSamplerWithProperties (when using cl_khr_mipmap_image), is not defined in the headers.

Hi,

I have read the specification several times now, but it is still unclear whether pipes are blocking (when reading an empty pipe or writing to a full pipe) or whether they just report that they are full/empty. And if they are not blocking, how should a reader react when it needs to wait for new data? With a spin loop?

Ray