Bitwise ANDing, ORing, and XORing very long bitmaps


I would like to tap into the additional power of my GPU. My problem is that I must perform bitwise AND, XOR and OR on very long bitmaps, perhaps 800 megabits each. I would also like to count the bits that are “on” in any bitmap and know their positions.

Pardon my ignorance, but is this possible in OpenCL? I am researching the best method to do this, and if it is possible, I will put in enormous effort to learn OpenCL and will deal with the inevitable brain-ache and laptop rage. I am a BASIC programmer by trade. :)

Regards, Peter.

Let’s see if I understand. You have about 100 megabytes of data (800 megabits) and you want to perform some bitwise operations like AND, XOR, OR and population count (number of bits enabled) and you have no previous experience in C. Is this correct?

Under those circumstances I personally would not bother learning OpenCL. The kind of operations you want to run can be implemented quite easily and efficiently on a CPU using traditional programming methods.

For starters, you may want to implement these algorithms in C or C++. With minor modifications you could also make them work in parallel using OpenMP.
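To give an idea of what that could look like, here is a minimal sketch in C. It assumes the bitmaps are stored as arrays of 64-bit words, with bit 0 being the least significant bit of word 0; the function names (bitmap_and, bitmap_count, bitmap_positions) are made up for illustration. OR and XOR would be the same loop as bitmap_and with | or ^ in place of &. The OpenMP pragmas are simply ignored if you compile without OpenMP enabled.

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] = a[i] & b[i] for every 64-bit word of the bitmaps. */
void bitmap_and(uint64_t *dst, const uint64_t *a, const uint64_t *b,
                size_t nwords)
{
    #pragma omp parallel for
    for (ptrdiff_t i = 0; i < (ptrdiff_t)nwords; i++)
        dst[i] = a[i] & b[i];
}

/* Portable population count: number of set bits in one 64-bit word. */
static unsigned popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Total number of set bits in the whole bitmap. */
uint64_t bitmap_count(const uint64_t *bm, size_t nwords)
{
    uint64_t total = 0;
    #pragma omp parallel for reduction(+:total)
    for (ptrdiff_t i = 0; i < (ptrdiff_t)nwords; i++)
        total += popcount64(bm[i]);
    return total;
}

/* Store the positions of the set bits in out[], in ascending order.
 * Writes at most max_out positions and returns how many were written. */
size_t bitmap_positions(const uint64_t *bm, size_t nwords,
                        uint64_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < nwords && n < max_out; i++) {
        uint64_t w = bm[i];
        while (w != 0 && n < max_out) {
            /* (w & -w) isolates the lowest set bit; counting the bits
             * below it gives that bit's index within the word. */
            unsigned bit = popcount64((w & (~w + 1)) - 1);
            out[n++] = (uint64_t)i * 64 + bit;
            w &= w - 1; /* clear the lowest set bit */
        }
    }
    return n;
}
```

An 800-megabit bitmap would be 12,500,000 such words. The position loop is left single-threaded here to keep the output ordered; parallelising it would need per-thread output buffers.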

You can certainly do this in OpenCL if you want, but it will take more effort and I’m not sure you would see great gains compared to C + OpenMP.

Finally, and in the interest of full disclosure, note that OpenMP always runs on your CPU while OpenCL runs on either your CPU or your GPU (or both); the choice is up to you.

Thank you.

I am glad that OpenCL can cope. Your assumption about the amount of data is slightly incorrect. Each of the 800 megabits corresponds to a record number, and there may be many bitmaps to manipulate per dataset. I would expect to perform 100 or so bitmap ANDs per search. I need some serious parallel power to do what I need with the large amount of data I have, and with a response time of under a second.


The problem I see with the sort of data processing you need is that the ratio of arithmetic work to the amount of data that must be transferred to the device is very low, too low for this to run efficiently on a GPU.
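To put rough numbers on that, here is a small back-of-the-envelope sketch in C. It assumes every one of the 100 ANDs brings in a different 800-megabit bitmap that is not already sitting in device memory; the function names are made up, and the bandwidth is a parameter you would have to measure for your own machine, not a figure for any particular hardware.

```c
#include <stdint.h>

/* Bytes that must reach the device if each of n_ops operations
 * brings in one new bitmap of bits_per_bitmap bits (assumption). */
uint64_t bytes_to_transfer(uint64_t bits_per_bitmap, uint64_t n_ops)
{
    return (bits_per_bitmap / 8) * n_ops;
}

/* Seconds needed to move that many bytes over a link with the given
 * bandwidth in bytes per second (a value you supply, hypothetical here). */
double transfer_seconds(uint64_t bytes, double bytes_per_second)
{
    return (double)bytes / bytes_per_second;
}
```

With 800,000,000 bits per bitmap and 100 operations, bytes_to_transfer gives 10,000,000,000 bytes, so a link would need to sustain about 10 GB/s just to deliver the data within one second, before any computation. Each 64-bit word moved is used for only a single AND.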

I would try either OpenMP or OpenCL on a CPU. Good luck :)