Problem with the OpenCL shared libraries?

zadi0648 · December 8, 2016, 3:39pm

So I just successfully installed the driver for the Quadro M5000, and everything works. Here is the driver I installed: Linux x64 (AMD64/EM64T) Display Driver | 375.20 | Linux 64-bit | NVIDIA. The machine is running CentOS 6. However, when I try to run some OpenCL code I wrote, I get the following error:

[zack@sockeye ~]$ OpenCLmovestack 
Exception in thread "main" java.lang.UnsatisfiedLinkError: /home/zack/my_prowess_home/sys/linux64/lib/libcom_nanoseis_ssa.so: /usr/lib64/libOpenCL.so.1: version `OPENCL_2.0' not found (required by /home/zack/my_prowess_home/sys/linux64/lib/libcom_nanoseis_ssa.so)
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1803)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1728)
    at java.lang.Runtime.loadLibrary0(Runtime.java:823)
    at java.lang.System.loadLibrary(System.java:1028)
    at com.nanoseis.ssa.ParallelMoveoutAndStackApplier.<clinit>(ParallelMoveoutAndStackApplier.java:15)
Could not find the main class: com.nanoseis.ssa.ParallelMoveoutAndStackApplier.  Program will exit.

What I figured was that the old OpenCL .so(s) for the old card were left behind after installation and that the loader was picking those up instead of the new .so(s) (I did uninstall the older driver before installing the new driver). It looked like that might be the case:

[root@sockeye zack]# ls -l /usr/lib64 | grep -i opencl
lrwxrwxrwx   1 root         root       26 Dec  8 15:23 libnvidia-opencl.so.1 -> libnvidia-opencl.so.375.20
-rwxr-xr-x   1 root         root  8646792 Dec  8 15:23 libnvidia-opencl.so.375.20
lrwxrwxrwx   1 root         root       14 Dec  8 15:23 libOpenCL.so -> libOpenCL.so.1
lrwxrwxrwx   1 root         root       16 Dec  8 15:23 libOpenCL.so.1 -> libOpenCL.so.1.0
lrwxrwxrwx   1 root         root       37 Dec  8 15:48 libOpenCL.so.1.0 -> libOpenCL.so.1.0.0
-rwxr-xr-x   1 root         root    26328 Dec  8 15:23 libOpenCL.so.1.0.0
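
To make a listing like the one above easier to read, here is a minimal Python sketch (not part of the original post) that follows a symlink chain such as libOpenCL.so -> libOpenCL.so.1 -> ... down to the real file. The helper name symlink_chain is hypothetical, and the path in the example call is simply the one from the listing; it may differ on other systems.

```python
import os


def symlink_chain(path):
    """Return every path visited while following symlinks from `path`."""
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # Relative targets are resolved against the directory holding the link.
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(path), target)
        path = os.path.normpath(target)
        if path in seen:  # guard against symlink loops
            break
        seen.add(path)
        chain.append(path)
    return chain


if __name__ == "__main__":
    # Path assumed from the listing above; adjust for your system.
    for step in symlink_chain("/usr/lib64/libOpenCL.so.1"):
        print(step)
```

The last entry printed is the file the dynamic linker would actually open for that name, which is a quick way to notice when a link has been redirected to an unexpected library.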

However, when I attempt a simple fix by changing the link using “ln -s -f /usr/lib64/libnvidia-opencl.so.375.20 /usr/lib64/libOpenCL.so.1.0” and then try running my program, I instead get this:

[zack@sockeye ~]$ OpenCLmovestack 8000 2000 8000 2
runOpenCLOnly() startIndexOut=800 endIndexOut=7200
samplesPerTrace: 8000 nTracesIn: 2000 nTracesOut: 8000
java: relocation error: /home/zack/my_prowess_home/sys/linux64/lib/libcom_nanoseis_ssa.so: symbol clGetPlatformIDs, version OPENCL_1.0 not defined in file libOpenCL.so.1 with link time reference

Here it looks like my program at least starts to run (the first two lines of output are from my program), but it then crashes saying version OPENCL_1.0 is not defined. Huh?

Also if I do “cat /etc/OpenCL/vendors/nvidia.icd” I get “libnvidia-opencl.so.1”, so at least that’s pointing to the right one?

So two things:

Why does it seem to want to use libOpenCL.so.1.0.0 over libnvidia-opencl.so.375.20 despite what the .icd contains? Am I correct in thinking that it should be using libnvidia-opencl.so.375.20?
What’s the issue with the second error, the OPENCL_1.0 one? Why is it even trying to find version 1.0? I would hope my program is using OpenCL 2.0. Currently I compile the code on a different machine than the one I run it on, but that shouldn’t matter, should it?

zadi0648 · December 8, 2016, 4:10pm

New info: I had saved the old OpenCL .so from the old graphics card, so I tried replacing the new .so with the old one to see what happens. This is what I get:

    [zack@sockeye ~]$ OpenCLmovestack  8000 2000 8000 2
    runOpenCLOnly() startIndexOut=800 endIndexOut=7200
    samplesPerTrace: 8000 nTracesIn: 2000 nTracesOut: 8000
    java: relocation error: /home/zack/my_prowess_home/sys/linux64/lib/libcom_nanoseis_ssa.so: symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference

So it’s like the second error… except now OPENCL_2.0 is undefined. Huh? (Also, I understand that clCreateCommandQueueWithProperties is deprecated, but even when I remove that I still get the version error.)

Also, some additional information that I forgot: when I first installed the driver I could actually run and complete the OpenCL code! However, it ran slower than I expected. So this is getting stranger, because I expected what I did above to work, just slowly like before… which .so could I have been using initially, then?

zadi0648 · December 12, 2016, 3:21pm

I thought that Java JNI (which I use to start the native C code and compare the results to make sure they are correct) might be causing the problem. It was not, since my standalone C version of the code has the same issue:

ParallelMoveoutAndStackApplier: /usr/lib64/libOpenCL.so.1: version `OPENCL_2.0' not found (required by ParallelMoveoutAndStackApplier)

Anyhow, I have some additional questions about how OpenCL shared libraries work. Is it the case that running OpenCL on an AMD GPU, on an NVIDIA GPU, and on the CPU would all require different shared libraries? And if so, are there different .so(s) for different NVIDIA GPUs, for example? Or is there a one-size-fits-all shared library for all NVIDIA GPUs?

zadi0648 · December 13, 2016, 11:54am

So making sure I wasn’t using any methods that are defined only in OpenCL 2.0 was the solution. So the real question now is: why did my new graphics card and the newest driver come with old OpenCL .so(s)? Is OpenCL 2.0 only for AMD?
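
For anyone hitting the same error, a rough way to see which entry points the installed loader offers is to probe it with ctypes. This is only a sketch: exported_symbols is a made-up helper, the library name libOpenCL.so.1 is taken from the error messages above, and the lookup is by plain symbol name, so it does not inspect version tags such as OPENCL_2.0 directly.

```python
import ctypes


def exported_symbols(library, names):
    """Map each name to True/False depending on whether `library` exports it.

    Returns None when the library itself cannot be loaded.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    # Attribute access on a CDLL performs a symbol lookup; a missing symbol
    # raises AttributeError, which hasattr() turns into False.
    return {name: hasattr(lib, name) for name in names}


if __name__ == "__main__":
    probes = [
        "clGetPlatformIDs",                    # OpenCL 1.0
        "clCreateCommandQueue",                # OpenCL 1.0
        "clCreateCommandQueueWithProperties",  # OpenCL 2.0
        "clSVMAlloc",                          # OpenCL 2.0
    ]
    # Library name assumed from the error messages in this thread.
    result = exported_symbols("libOpenCL.so.1", probes)
    if result is None:
        print("libOpenCL.so.1 could not be loaded")
    else:
        for name, present in result.items():
            print(name, "found" if present else "missing")
```

If the 2.0 entry points show up as missing while the 1.0 ones are found, code that calls them will not load against that library, which would be consistent with the fix described above.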

HadrienG · December 14, 2016, 11:30pm

You can also use it on Intel CPUs and GPUs; it’s more NVidia that is the problem here.

You see, NVidia have a proprietary alternative to OpenCL to sell (CUDA), which as a matter of fact served as inspiration for the OpenCL spec. So they are not particularly happy about this open standard, and would rather have you write code that only runs on their gear. Net result: their OpenCL implementation and toolchain are awfully obsolete and ill-maintained.

As for your other questions: in theory, all cl.h and libOpenCL.so implementations are equivalent, and can be used interchangeably with all hardware, thanks to the ICD mechanism. There are even vendor-agnostic open-source implementations of these around. But in practice, the compatibility is not so great, and you may want to try several of them to see which works best (you can safely install them side by side and use build system flags to pick one). From memory, in my last project, I managed to get something running on AMD CPUs and GPUs and Intel CPUs and MICs with the same libOpenCL (AMD’s, I think), but NVidia GPUs stubbornly refused to work until I used NVidia’s libOpenCL. I’m not sure about that, though; it was a while ago.
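
As a rough illustration of the ICD mechanism mentioned above, the sketch below lists which vendor library each .icd file points to. It assumes the conventional Linux location /etc/OpenCL/vendors (the one used earlier in this thread) and that each file holds a single library name or path; read_icd_entries is a hypothetical helper, not part of any OpenCL API.

```python
import glob
import os


def read_icd_entries(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_name} for each *.icd file in vendors_dir."""
    entries = {}
    for icd_path in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(icd_path) as f:
            # Each .icd file is assumed to hold one library name or path.
            entries[os.path.basename(icd_path)] = f.read().strip()
    return entries


if __name__ == "__main__":
    entries = read_icd_entries()
    if not entries:
        print("no .icd files found")
    for icd_name, library in entries.items():
        print(icd_name, "->", library)
```

The generic libOpenCL.so loader reads these files to find the vendor implementations, so each line of output corresponds to one vendor library that the loader may hand calls to.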