Suggestions for improving the OpenCL C++ API?

Dear all,

My group in Bristol is getting involved with improving the C++ API. We’re starting with some bug fixes and plugging a few gaps in the current API. But while we’re at this there’s a chance to add some other improvements too.

So, do you have any feature requests or ideas for improvements that will help make the OpenCL C++ API easier or more compelling to use?

Best wishes,

Simon McIntosh-Smith

Head of Microelectronics Group and University of Bristol Business Fellow
High Performance Computing and Architectures, Department of Computer Science
University of Bristol, UK.

I’ve been using the C++ API for a little while now, and three things that immediately come to mind are the Program and Kernel classes, and error handling.

It would be nice to have some more options for building programs, first of all. First of all, being able to pass a single device object to the build function (instead of having to create a vector each time) would make things cleaner in some cases.
Also, it would be nice to be able to build from source for a particular context (rather than the default) on construction. It would be really nice if the build function took a std::string for the build options, instead of a const char*, since that causes some memory management headaches (when I want to use a temporary, for example).

One slightly more complex idea is that it would be nice if there were some way to conveniently build asynchronously, using the callback function. The current way of needing to provide a c-style callback is a bit awkward in a C++ program. Maybe there’s a way it could be done inside the wrapper with some kind of locking mechanism that blocks on kernel creation until the build finishes?

As concerns the Kernel class, the main thing that I can think of that would be nice is a more convenient way to bind arguments and invoke. I’ve looked at using stream insertion syntax for argument binding to be a bit more convenient. With OpenCL 1.2, it may be possible to use a va_args-type thing (since the kernel parameter types can be extracted from the kernel object). I haven’t been able to use the kernel functor implementation that’s there right now, since it isn’t very well document and appears to require C++11?

With the error handling, it would be very useful to be able to retrieve a string describing the error from cl::Error (as opposed to just having the function that produced the error and the error number).

Another thing that would be really helpful is make the Doxygen documentation more readily available. I’ve found a few places where it’s hosted, but it’s not always up-to-date. Having an official, or semi-official hosted reference would make it considerably easier to use. (It would also be nice to have all the enums documented in the C++ API documentation, so I don’t have to keep switching back and forth between C and C++ docs).

I have some big ones.

The first is HUGE: Make cl::Error populate its what() string with the function call, the error code AND a string describing WHAT happened. When I catch an exception from OpenCL, I want something better than just “Error” to appear automatically in my standard exception catcher.

I second the idea that passing pointers to vectors of things is ODD. I understand why it is sometimes necessary, but there are a lot of little corner cases where it isn’t necessary and makes the API clunky. Why not pass an empty vector by const reference to indicate null? Better yet, use iterators instead of vectors everywhere.

The reimplementation of vector and string is a complex PITA to skip through in the header. I don’t know of ANY platforms that have a C++ implementation that have OpenCL that DON’T have std::string. std::string and std::vector have been around in the standard for close to 20 years now. If you HAVE to have it, have like, clstl.hpp available for download that includes those implementations for you if you want to include them on a platform. There’s no reason to have an entire stl implementation in the damn header.

Lastly, there is a bug where the C++ API assumes that you are linking with the latest version of the CL api. On NVIDIA, clReleaseContext() and other CL 1.2-only constructs aren’t defined in the link library, but because they are used in the C++ cl.hpp header and there’s no way to turn them off, the linker complains. There should be a way to define to the cl.hpp which variant of OpenCL you are linking against.

There is other stuff I’m sure.

More stuff. This is my code for building an openCL program:

catch(const cl::Error& e)
throw std::runtime_error(std::string(“Build Error:”)+program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[0]));

Why do I need this at all? If there is an exception that I have to read the log to find out about, it should just automatically throw a special kind of exception that has the build log inside it, and displays the build log along with what() by default. This entire section of code could just look like,compiledefs);

If threw cl::BuildError: public cl::Error instead.

Also, right before that I have

cl::Program::Sources source(1,std::make_pair(s.c_str(),s.size()));

This is terrible. Why is Program::Sources defined as vector<pair<char*,size_t>>. pair<char*,size_t> is just a terribly inconvenient version of std::string. Program::Sources should be std::vector<std::string> and be done with it. Then I could just say

cl::Program::Sources source(1,s);

There should be an overload of kernel.setArg() that is Kernel::setArg(const string& argname,T arg); That can set arguments by name, so I can do setArg(“inputarray”,array); instead of setArg(3,array); if I wish. This is because as I change my kernels, the order of the arguments might change or I might insert new arguments between arguments, meaning I have to go through and change ALL of my kernel arguments. This is a major pain.

Some of those suggestions should be very simple to implement. Some of them have always been frustrating to me, too, but have fairly sensible reasons somewhere. For example, the program source list I think is that way because there is a wish to not do memory allocation inside the header code. In this case we’d either have to do an alloca (which has been done elsewhere and is dangerous) or create a vector. With recent changes to the header we have been leaning towards creating vectors on the grounds that a user of the header has a well define single class to re-implement if they want to change the way memory management behaves. It seems a reasonable option then to have a version of the source constructor that takes a vector directly and from that constructs a vector of pairs and passes the address of its data into the OpenCL API.

The original reason for including string and vector was not to add them where the installed C++ implementation did not have them, it was to avoid using the STL to satisfy the large number of developers who don’t want it. There was no way the original authors could require STL use in the header. Recently I deprecated the cl.hpp implementations of those classes after discussion with customers because those vendors who want to avoid the STL have their own ways to work around it, and that is a more portable solution than trying to maintain our own version (with all the bugs that led to).

> There should be an overload of kernel.setArg() that is Kernel::setArg(const string& argname,T arg);

With the reflection API in 1.2 that could be possible. I also want to add reflection capabilities to the functor that would type check it on construction if that is practical, so equivalently a kernel object could construct a map of argument names. It’s yet another data structure in the kernel object, though, in an API that is supposed to be very thin. Maybe the right way to go about this is to have an extended header that builds on the primary one and contains classes that encapsulate that kind of information. The goal of cl.hpp was to be almost overhead-free, as soon as we start having to scan through reflection and construct maps we are carrying a lot more data and slowing construction of the kernel objects in a way that would be unsatisfactory to a lot of users.

Finally, this:
> Lastly, there is a bug where the C++ API assumes that you are linking with the latest version of the CL api.

I don’t quite understand. There was a runtime bug relating to this, because the ICD design has no safe recovery from calling a function that doesn’t exist in the underlying implementation, for which we added a workaround to cl.hpp to do a version check. At link time, though, it should work fine. clRetainDevice is wrapped with:

#if defined(CL_VERSION_1_2)

So it should only build that in if the cl.h on the platform defines that flag and claims to be a 1.2 implementation. Are you sure you don’t mean the issue that arose at runtime? If you do, the latest couple of versions of the header should have fixed that.

You dont’ have to do either. Accept a const reference to a vector of strings as an argument. Make the user create the vector. Better yet, just use a templated iterator to a string, let the user do whatever kind of memory management they want. Either way, a vector of pairs is just as much dynamic memory as a vector of strings would be, except the vector of strings is actually a standard way of doing that.

Then use iterators in the C++ API everywhere (templated iterator arguments) so that its irrelevant, or just specify the interface as STL and be done with it. If the developers think that the STL as an interface is bad, then they can use the C API…using the STL interface but with a different implementation doesn’t help that problem. If they like the interface, but not the implementation on their platform, then there are many alternate implementations of the STL publicly available. If they have neither objection, then there’s no problem. Either way, there’s no reason for the OpenCL API to provide an STL implementation.

Yep, it would be really nice.

Thats probably not necessary.

[[QUOTE=LeeHowes;29468]The goal of cl.hpp was to be almost overhead-free, as soon as we start having to scan through reflection and construct maps we are carrying a lot more data and slowing construction of the kernel objects in a way that would be unsatisfactory to a lot of users.[/QUOTE]

Construct the map lazily in the new setArg() function. Then there won’t be any overhead incurred in the kernel for this feature use unless you use it. Something like

#ifdef CL_VERSION_1_2

    typedef map<string,int> __ARGUMENT_MAP;

    template <typename T>
    cl_int setArg(const string& name, T value)
		    //allocates this->_arg_map and iterates through using kernel reflection to build it
		__ARGUMENT_MAP::iterator arglocation=this->_arg_map->find(name);
		if(arglocation == this->_arg_map.end())
			throw cl::Error(string("Unable to find argument to kernel with name...")+name);

        return setArg(arglocation->second,value);


The only overhead is an empty pointer that never gets touched…increasing the size of the kernel object on the stack by sizeof(void*)…but if you don’t use the feature nothing else happens.

So, unfortunately, many systems ship with the reference cl.h installed even if the underlying drivers don’t actually support it. I believe Ubuntu and Linux Mint both do this with nvidia-ocl-dev. On one of my systems the driver was only capable of supporting 1.1, but was compiling with 1.2 reference C header from khronos. This is fine, if I don’t link with/use 1.2 features in C. But in C++, the fact that I had the reference header meant that it linked with clRetainDevice even though I didn’t want it to, which caused a link error.

It’s probably more accurate to say that the error here is caused by the package maintainers, because you should only get cl.h from your vendor…fine. But regardless of whose fault it is, if the C++ API is going to assume based on the headers which API my driver supports, when the C API does not, it might be nice to be able to use a flag or something to control which functions the C++ api links with, regardless of what C header is installed on the system.

Maybe change that wrapper to like

#if defined(CL_VERSION_1_2) && defined(USE_CL_VERSION > 0x1200)

or something.

I don’t know, now that I’m reading it, thats pretty terrible and breaks code…so no, don’t do that… I am just saying what ended up happening.

That much I understood :slight_smile: My point was about memory management INSIDE the method. I’ve just looked again though and of course the CL API doesn’t take it as anything resembling pairs anyway so doing it the cleaner way as you suggest is no more memory management than the current way it is done. So you are quite right, I think, we should add that version.

Either way, there’s no reason for the OpenCL API to provide an STL implementation.

And indeed it does not. That’s why it is wrapped in big deprecation flags:

Unfortunately the rules of deprecation mean we can’t really simply delete that code. Given some time hopefully we can clean it up.

So, unfortunately, many systems ship with the reference cl.h installed even if the underlying drivers don’t actually support it.

That is interesting. It is surely a bug, though, because using that flag is one way that you as a developer should be able to tell what level of functionality your SDK claims support for, no? One option might be to add a “CL_FORCE_OPENCL_1_1” flag that, after cl.h is included, does:
#if defined(CL_FORCE_OPENCL_1_1)
#undef CL_VERSION_1_2

and of course recover the flag at the end of the header… we should consider this if it’s a serious problem in the wild, but it certainly feels like a workaround rather than a feature.

Please, add this important method:

static bool cl::available();

basically, you call LoadLibrary(“OpenCL”) or its multiplatform equivalent and see if fails or not…
This is used to avoid the typical “Windows can’t load XXXXX.dll” message in case there is no OpenCL.dll/driver present in the system.

A few other things I’ve noticed:

cl::NDRange and cl::size_t are used inconsistently, which creates problems. First of all, cl::size_t is a really bad name for a class, since it masks a global typedef in the cl namespace, and its name implies that it is a typedef. Secondly, kernel.getWorkGroupInfo<CL_KERNEL_COMPILE_WORK_GROUP_SIZE>() and other functions return a cl::size_t, which is pretty useless for setting work-group sizes on kernel enqueue. It would be much better if everything just used NDRange, (also, easier element access to NDRange and arithmetic would be nice).

Secondly, getting information from CL objects is pretty clumsy. program.getBuildInfo<CL_PROGRAM_DEVICES>() is a lot to type and read for a fairly simple operation, and I’ve actually started just holding duplicate references to things like cl::Device so I don’t have to put this everywhere. At least, it would be better if the constants were duplicated in the local scope, so I could do something like program.getBuildInfo<Devices>(). Even better, for commonly-used attributes, it would be nice just to have a direct function to query, like program.getDevices().

I also noticed that in a lot of places, input arguments are copied into alloca() memory regions. This seems to add extra overhead, as well as using a non-standard function. What’s the reasoning behind this usage?