Question about OpenCL for FPGA

jefflieu · January 27, 2013, 7:46pm

Hi guys,
I’m an FPGA developer and would like to know more about this OpenCL thingy. Altera is rolling out its own SDK to support OpenCL on Altera FPGAs. However, they’re very secretive about this program.

I don’t know much about OpenCL technology. I only know that it defines certain APIs for communication between the host and devices.

So, let’s say I have an FPGA board connected to the host via PCIe. I have an application and would like to run it on this heterogeneous system using OpenCL.

Questions:

What are the missing parts that OpenCL doesn’t define and that are left to the system vendor? I suppose Altera is doing these things: PCIe drivers, libraries, …
I suppose these should be the same for all FPGAs. Is there any open-source project that provides this infrastructure for both the FPGA side and the host side?

Thanks for your comments/opinions,
-Jeff

clint3112 · January 28, 2013, 5:08am

I think the most important thing the vendor gives you is the compiler.

jefflieu · January 28, 2013, 6:29pm

Hi,
I thought a generic compiler with OpenCL support would do?
Doesn’t the vendor only provide the runtime libraries?

Thanks
-Jeff

matthiasv · January 28, 2013, 11:31pm

There is neither a generic compiler nor a standardized bytecode. Each vendor must provide its own compiler, both for that reason and because OpenCL sources are typically compiled at run-time. On the other hand, you can think of an OpenCL compiler as part of a vendor’s run-time library.
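
To make that point concrete, here is a toy model in plain Python. It makes no real OpenCL calls, and `VendorRuntime`, `build`, `can_load` and the format labels are all made up for illustration. It only shows the idea that each vendor's run-time carries its own compiler and accepts only its own binary format.

```python
# Toy model only: no real OpenCL calls. All names here are hypothetical.
from dataclasses import dataclass

KERNEL_SOURCE = "__kernel void scale(__global float *x) { x[get_global_id(0)] *= 2.0f; }"


@dataclass(frozen=True)
class VendorRuntime:
    """Stands in for one vendor's OpenCL run-time library, compiler included."""
    vendor: str
    binary_format: str  # made-up label for the vendor-specific output format

    def build(self, source: str) -> dict:
        # A real run-time would invoke the vendor's compiler here, at run-time.
        return {"vendor": self.vendor, "format": self.binary_format, "source_len": len(source)}

    def can_load(self, binary: dict) -> bool:
        # Without a standardized bytecode, a run-time only accepts its own format.
        return binary["format"] == self.binary_format


gpu_runtime = VendorRuntime("GPU vendor", "gpu-isa")
fpga_runtime = VendorRuntime("FPGA vendor", "fpga-bitstream")

# The same portable source goes to both vendors' compilers.
gpu_binary = gpu_runtime.build(KERNEL_SOURCE)
fpga_binary = fpga_runtime.build(KERNEL_SOURCE)

print(gpu_runtime.can_load(gpu_binary))   # binary built by the same vendor
print(fpga_runtime.can_load(gpu_binary))  # binary built by a different vendor
```

The source string is the portable part; everything produced by `build` is vendor-specific.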

jefflieu · January 29, 2013, 7:02am

Hi Matthiasv,
Thank you for clarifying.

Sorry if I asked dumb questions or made wrong assumptions; I still don’t get the “big picture” of the whole compilation process.
If the vendor provides the compiler, which compiler do I use if I have a platform with accelerators from different vendors? By vendors, I mean accelerator vendors: GPU, FPGA.

Any pointer to the conceptual compilation flow is much appreciated. So far, searching the internet, I can only find tutorials that focus on the details. I’d like to understand roughly how the compiler binds the host and the accelerators together into one system.

Thanks a lot!!!

matthiasv · January 29, 2013, 7:52am

Well, it all boils down to some simple abstractions. You write your kernels in portable OpenCL C. Then you pick one of the installed platforms and create a context containing as many of its devices as you want (e.g. CPUs, GPUs or, in the future, FPGAs). Within this context, you create a program from your source with clCreateProgramWithSource and compile it with clBuildProgram. You get back an opaque cl_program handle, which means you don’t know exactly what the result is*. From this program you create your individual kernel objects with clCreateKernel and launch them from the host side via clEnqueueNDRangeKernel.

So, it is more appropriate to think of the run-time as the heart (and glue) of an OpenCL implementation instead of the compiler. The compiled binary gets transferred to devices just like any other data.

* Actually, you can dump the result with clGetProgramInfo (CL_PROGRAM_BINARIES), but it is not a portable representation of OpenCL code.
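
The order of the steps above can be sketched as a toy walk-through in plain Python. Nothing here calls a real OpenCL library: `Context`, `Program`, `enqueue` and the device names are hypothetical stand-ins, and the comments name the real entry point each step is meant to mirror.

```python
# Toy walk-through of the host-side flow; no real OpenCL library is used.
# Class, function and device names are hypothetical.

class Program:
    def __init__(self, devices, source):
        self.devices = devices
        self.source = source
        self.binaries = {}              # filled in by build(), one entry per device

    def build(self):                    # mirrors clBuildProgram
        for dev in self.devices:
            # A real run-time would produce a device-specific binary here.
            self.binaries[dev] = f"<{dev} binary>"

    def kernel(self, name):             # mirrors clCreateKernel
        if not self.binaries:
            raise RuntimeError("program must be built before kernels are created")
        return (name, self)


class Context:                          # mirrors clCreateContext
    def __init__(self, devices):
        self.devices = list(devices)

    def program_from_source(self, source):   # mirrors clCreateProgramWithSource
        return Program(self.devices, source)


def enqueue(kernel, device, global_size):    # mirrors clEnqueueNDRangeKernel
    name, program = kernel
    return f"run {name} x{global_size} on {device} using {program.binaries[device]}"


ctx = Context(["cpu0", "gpu0"])
prog = ctx.program_from_source("__kernel void scale(__global float *x) { }")
prog.build()
scale = prog.kernel("scale")
print(enqueue(scale, "gpu0", 1024))
```

Note that the host code never looks inside the binaries; it only hands them back to the run-time, which is the sense in which the program object is opaque.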

jefflieu · January 29, 2013, 5:08pm

Hi Matthiasv,

I think I roughly understand the flow. Just one more doubt.
If my system has 1 NVIDIA GPU and 1 FPGA, then the OpenCL application would have a few kernels to run on the GPU and a few on the FPGA. The binaries of those kernels are totally different. I believe the binaries for the FPGA have to be compiled offline and loaded to the FPGA by a totally different tool-chain.

How many compilers are there in total? Are there 3?
– 1) Compiler to compile kernels to be run on the FPGA
– 2) Compiler to compile kernels to be run on the NVIDIA GPU
– 3) Compiler to compile code targeting the CPU
Am I right? How do they work together?

Thank you!

I have downloaded the NVIDIA software and Visual C++ Express. I’ll get some hands-on experience and hopefully get a better idea. Thank you, Matt!

matthiasv · January 29, 2013, 11:39pm

Hi Jeff,

the number of compilers depends on the number of platforms. For example, the AMD platform enumerates its own GPUs as well as all physical CPUs. Thus, you could use clCreateProgramWithSource and clBuildProgram to build binaries for (AMD) GPUs and the host CPU. You can tell at startup what is possible because you can query each device for its “underlying” type (CL_DEVICE_TYPE via clGetDeviceInfo).

How Altera or Xilinx handle this is up to them. You would just call the same functions, and the tool chain does whatever is required to upload the compiled binary. And yes, the binaries of different platforms (or even different devices) cannot be interchanged.

What is important: on the host side, you are responsible for transferring data between different platforms and for synchronizing potentially asynchronous operations.
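
As a rough illustration of those two points (one compiler per platform, and the host moving data between platforms), here is a toy model in plain Python. It makes no real OpenCL calls; the platform names, device names and helper functions are made up, and the comments mention the real calls they loosely correspond to.

```python
# Toy model only: plain Python data, no real OpenCL calls.
# Platform names, device names and helpers are made up for illustration.
PLATFORMS = {
    "Vendor A": [("a-gpu", "GPU"), ("host-cpu", "CPU")],
    "Vendor B": [("b-board", "ACCELERATOR")],
}


def devices_of_type(platforms, wanted):
    """List (platform, device) pairs of one type, like checking CL_DEVICE_TYPE."""
    return [(p, name) for p, devs in platforms.items() for name, kind in devs if kind == wanted]


def transfer_via_host(src_memory, dst_memory, key):
    """Move a buffer between platforms by staging it in host memory."""
    staged = list(src_memory[key])   # read back to the host (cf. clEnqueueReadBuffer)
    dst_memory[key] = staged         # write to the other platform (cf. clEnqueueWriteBuffer)
    return staged


# One compiler per platform, since each platform ships its own run-time.
compiler_count = len(PLATFORMS)

memory_a = {"result": [1.0, 2.0, 3.0]}   # buffers owned by Vendor A's platform
memory_b = {}                             # buffers owned by Vendor B's platform
transfer_via_host(memory_a, memory_b, "result")

print(compiler_count, devices_of_type(PLATFORMS, "ACCELERATOR"), memory_b)
```

In a real application the read and write would be enqueued on command queues belonging to two different contexts, and the host would have to wait for the read to finish before issuing the write.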

jefflieu · January 30, 2013, 4:18pm

Thank you!
Let me get some hands-on experience and then bug you further!
Thanks!
-Jeff

ramkumarkoppu · February 15, 2013, 3:56pm

Hi,
You can find more about the OpenCL implementation for Altera FPGAs here:
http://www.altera.com/products/software … index.html
Currently their target is to support a platform consisting of Stratix + PCIe + shared memory. The first release is expected in April 2013 with Quartus II. I hope this helps.