As far as I know, there is no support for inline assembly in OpenCL; however, I would really like some kind of access to 32-bit (and 64-bit, if possible) add and subtract functions with carry/borrow in and out.

In the meantime, are there any efficient methods for performing such operations without many conditional tests or arithmetic operations?

Also, since there is no operator overloading in OpenCL, developing emulated basic data types (say, 256-bit integers or floating-point numbers) is entirely possible but exceedingly cumbersome. Since 128-bit integers (long long) and floating-point numbers (quad) are reserved for possible future inclusion, how does Khronos plan to implement them?

A bit of googling found this presentation: http://dl.fefe.de/bignum.pdf that shows how different arbitrary-precision arithmetic libraries implement bignum.

See slide 6 in particular. Translated to OpenCL, it says: for addition, use a 64-bit accumulator but process 32 bits at a time. That way the carry ends up in the upper half of the accumulator, where you can read it.

```
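ulong l = 0;                      /* 64-bit accumulator; upper bits hold the carry */
for (uint i = 0; i < m; ++i) {
    l += (ulong)src1[i] + (ulong)src2[i];
    dest[i] = (uint)l;            /* store the low 32 bits */
    l >>= 32;                     /* carry into the next limb */
}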
```
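If you also want the carry in and carry out that the original question asked about, the same loop can be wrapped in a function. This is only a sketch in plain C (with `uint32_t`/`uint64_t` standing in for OpenCL's `uint`/`ulong`); the name `bignum_add` and its parameters are made up for illustration and are not from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two m-limb numbers (least significant limb
   first) plus carry_in (0 or 1). Returns the carry out (0 or 1). */
static uint32_t bignum_add(uint32_t *dest, const uint32_t *src1,
                           const uint32_t *src2, size_t m, uint32_t carry_in)
{
    uint64_t l = carry_in;            /* running sum; bits 32 and up hold the carry */
    for (size_t i = 0; i < m; ++i) {
        l += (uint64_t)src1[i] + (uint64_t)src2[i];
        dest[i] = (uint32_t)l;        /* keep the low 32 bits */
        l >>= 32;                     /* carry into the next limb */
    }
    return (uint32_t)l;
}
```

Chaining is then just a matter of feeding the returned carry into the next call, for example when a large number is split across several work-items or buffers.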

Multiplication is oh so interesting. However, there’s no problem here that is new to OpenCL: the same issues would appear in any implementation of bignum that doesn’t use assembly.
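For reference, the simplest non-assembly approach is plain schoolbook multiplication with the same trick: 32-bit limbs and a 64-bit intermediate. The sketch below is ordinary C rather than OpenCL C, and `bignum_mul` is a hypothetical name, not something taken from the presentation. OpenCL C also has a `mul_hi` built-in that returns the high half of a product, which may help if you want full-width limbs, but I have not tried that here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dest = a * b, where a and b have m limbs each
   (least significant limb first). dest must have room for 2*m limbs
   and must not overlap a or b. */
static void bignum_mul(uint32_t *dest, const uint32_t *a,
                       const uint32_t *b, size_t m)
{
    for (size_t i = 0; i < 2 * m; ++i)
        dest[i] = 0;

    for (size_t i = 0; i < m; ++i) {
        uint64_t carry = 0;
        for (size_t j = 0; j < m; ++j) {
            /* A 32x32 product plus two 32-bit addends always fits in 64 bits. */
            uint64_t t = (uint64_t)a[i] * b[j] + dest[i + j] + carry;
            dest[i + j] = (uint32_t)t;   /* low half stays in this limb */
            carry = t >> 32;             /* high half moves to the next limb */
        }
        dest[i + m] = (uint32_t)carry;   /* this limb has not been written yet */
    }
}
```

This is O(m²) and has no recursion, so it maps to a kernel without much trouble; the cleverer algorithms only pay off for fairly large operands.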

Nice link.

I was trying something very similar to slide 6, but since unsigned long long isn’t yet implemented in OpenCL I had nothing to directly catch the overflow. I was trying to do conditional tests but tended to branch too much or repeat the same calculation using the functions hadd or rhadd. The way you pointed out uses a smaller basic data type but is much more concise.
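For completeness, here is roughly what the version without a wider type can look like when it is written with comparisons instead of branches. It relies on the fact that an unsigned sum wraps around exactly when the result is smaller than one of the operands. This is a sketch in plain C with 64-bit limbs; `addc64` and `subb64` are names I made up, and both assume the incoming carry or borrow is 0 or 1.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b + carry_in (mod 2^64) and writes
   the carry out (0 or 1) to *carry_out. Assumes carry_in is 0 or 1. */
static uint64_t addc64(uint64_t a, uint64_t b, uint64_t carry_in,
                       uint64_t *carry_out)
{
    uint64_t s  = a + b;
    uint64_t c1 = s < a;          /* 1 if a + b wrapped around */
    uint64_t r  = s + carry_in;
    uint64_t c2 = r < s;          /* 1 if adding the carry wrapped around */
    *carry_out = c1 | c2;         /* both cannot be set at the same time */
    return r;
}

/* Hypothetical helper: returns a - b - borrow_in (mod 2^64) and writes
   the borrow out (0 or 1) to *borrow_out. Assumes borrow_in is 0 or 1. */
static uint64_t subb64(uint64_t a, uint64_t b, uint64_t borrow_in,
                       uint64_t *borrow_out)
{
    uint64_t d  = a - b;
    uint64_t b1 = a < b;          /* 1 if a - b wrapped around */
    uint64_t r  = d - borrow_in;
    uint64_t b2 = d < borrow_in;  /* 1 if subtracting the borrow wrapped around */
    *borrow_out = b1 | b2;
    return r;
}
```

One thing to watch when porting this to OpenCL C: scalar comparisons yield 1 for true, but vector comparisons yield -1 (all bits set), so a vectorized version would need to mask or negate the comparison result before using it as a carry.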

Unfortunately, many of those more interesting multiplication algorithms are recursive in nature, but if I’m progressively building 128-, 256-, and then 512-bit types, I can use .hi and .lo and get the recursion more or less for free, since each wider type is defined in terms of the two halves below it. Tackling division is an even greater joy :).