Global memory "bitfield" technique

My main question is whether it is possible to create a bit-field in global memory.

I know that, for various reasons, the Khronos Group left bit-fields out of OpenCL C, so the only way is to “emulate” them with bigger data types. The most suitable is probably a byte (uchar), so I would use a byte-field as storage but treat it as a bit-field with some bitwise magic.
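A minimal sketch of that “bitwise magic” in plain C. It assumes one flag per bit, with flag i stored in byte i / 8 at bit i % 8. The layout and the bitfield_* names are my own, not anything OpenCL defines. Note that bitfield_set is a plain read-modify-write, so it is only safe when a single work-item owns each byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout assumption: flag i lives in byte i / 8, at bit position i % 8. */

/* Number of bytes needed to hold nbits flags (rounded up). */
size_t bitfield_bytes(size_t nbits) {
    return (nbits + 7) / 8;
}

/* Read flag idx: shift its byte right and mask off the lowest bit. */
int bitfield_get(const uint8_t *field, size_t idx) {
    return (field[idx >> 3] >> (idx & 7)) & 1;
}

/* Set flag idx to true with a plain (non-atomic) read-modify-write.
   Only safe when nothing else touches the same byte concurrently. */
void bitfield_set(uint8_t *field, size_t idx) {
    field[idx >> 3] |= (uint8_t)(1u << (idx & 7));
}
```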

The problem is how to access that storage correctly. Because there’s no synchronization at the global level, I can’t safely read and write the same buffer in a single kernel run without risking corrupted data. At least, I think so: I may read a value, and right away another work-item writes there, making the value I just read stale. So reading the byte, setting the desired bit and writing the byte back could silently lose another work-item’s update.
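To make the race concrete, here is a deterministic C sketch that replays one bad interleaving by hand. The function name is hypothetical. Real GPU scheduling is not deterministic, so this shows what can happen, not what always happens.

```c
#include <assert.h>
#include <stdint.h>

/* Two work-items each want to set a different bit of the same byte.
   Interleaving replayed here: A reads, B reads, A writes, B writes.
   Returns the final byte value. */
uint8_t simulate_lost_update(void) {
    uint8_t byte = 0;
    uint8_t a_copy = byte;   /* work-item A loads the byte (0x00) */
    uint8_t b_copy = byte;   /* work-item B loads the same byte (0x00) */
    a_copy |= 0x01;          /* A sets bit 0 in its private copy */
    b_copy |= 0x08;          /* B sets bit 3 in its private copy */
    byte = a_copy;           /* A stores 0x01 */
    byte = b_copy;           /* B stores 0x08, overwriting A's bit 0 */
    return byte;             /* 0x08: A's update is lost */
}
```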

I can come up with only one solution: write only that one bit and leave the others untouched. But I don’t think that’s possible on current hardware.

Of course, one can simply use a whole byte per flag. But that takes 8 times as much memory, and with small memory sizes and even more limited maximum buffer sizes, there may not be room for all of them.

Any suggestions? I would also appreciate it if someone has a different idea for packing results that are only “true” or “false”.


Yeah, I had the same problem. I think it depends on the problem you’d like to solve.
I wrote a program which used Schönhage’s algorithm to calculate the factorial of a number as fast as possible. Using long16 values and bit tricks worked nicely for that problem, but I don’t think it would fit yours. Mine wasn’t very good at “parallelizing”, so I could write my own “bitfield” without running into concurrent writes.
As you said, accessing the data carelessly can cause inconsistency, so it’s really hard to use the bit-field technique.

I agree. Suppose each work-item index mapped to one bit, as in { idx0 -> bit0, idx1 -> bit1 }. Then I could use what I’ve already done. I would specify a local work-group size that is a multiple of 8, which is needed anyway for performance reasons, since sizes should be multiples of 32. Then I would have the first work-item of each octet build the whole byte from the eight bits and store it to global memory.
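A sketch of that octet scheme, written as sequential C that simulates one work-group. The pack_octets name and the flat flags array standing in for local memory are my own assumptions. In an actual OpenCL kernel, phase 1 would be each work-item storing its flag into __local memory, followed by barrier(CLK_LOCAL_MEM_FENCE). The loop body below would then run only on work-items whose local id is a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* flags: one byte per work-item (0 or 1), standing in for the local buffer
   filled in phase 1. Phase 2: the first work-item of every octet packs its
   8 flags into one output byte, so each byte has exactly one writer. */
void pack_octets(const uint8_t *flags, size_t n_items, uint8_t *out) {
    assert(n_items % 8 == 0);  /* work-group size must be a multiple of 8 */
    for (size_t lid = 0; lid < n_items; lid += 8) {  /* octet leaders only */
        uint8_t byte = 0;
        for (size_t k = 0; k < 8; ++k)
            byte |= (uint8_t)((flags[lid + k] & 1u) << k);  /* idx -> bit */
        out[lid / 8] = byte;
    }
}
```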

However, now I need something that also works with random access. All elements start out false. The only requirement is that once a bit is set to true, it never changes its value again.

I think this is impossible without global synchronization or finer-grained, bit-level memory access. Global synchronization would be unsuitable anyway; just imagine the access pattern. But maybe people with more experience have worked around this some other way, short of changing the data structure, of course.

At last I have found a solution to this: … ic_or.html

I remembered reading about the usefulness of atomics when operating on global memory, but hadn’t investigated this possibility until now.
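The link is truncated, but the linked page is presumably OpenCL’s atomic_or. An atomic OR sets one bit without disturbing the others in the word. Because bits here only ever go from false to true, the order in which work-items OR in their bits doesn’t matter.

In an OpenCL 1.1 kernel, the call would be roughly atomic_or(&words[idx >> 5], 1u << (idx & 31)) on a volatile __global uint buffer. The sketch below is a host-side C11 analogue using atomic_fetch_or from <stdatomic.h>, with hypothetical atomic_bitset_* names. It uses 32-bit words because the OpenCL 1.1 atomic functions operate on 32-bit ints.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Layout assumption: flag idx lives in 32-bit word idx / 32, bit idx % 32. */

/* Atomically sets flag idx; the other 31 bits of the word are never
   overwritten. Returns the flag's previous value (0 or 1). */
int atomic_bitset_set(_Atomic uint32_t *words, size_t idx) {
    uint32_t mask = UINT32_C(1) << (idx & 31);
    uint32_t old = atomic_fetch_or(&words[idx >> 5], mask);
    return (old & mask) != 0;
}

/* Reads flag idx with an atomic load of its word. */
int atomic_bitset_get(_Atomic uint32_t *words, size_t idx) {
    return (atomic_load(&words[idx >> 5]) >> (idx & 31)) & 1u;
}
```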

Global atomics are really slow on some hardware. Even if they weren’t, doing a lot of them could still be a real bottleneck.

It might be faster to write one flag per element at the smallest atomic write size (which might not be a byte, although you could always use a byte image). Then have a subsequent pass compress the flags back into bits, storing them in at least 32-bit ints.
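A sketch of that two-pass idea, assuming one byte flag (uchar) per element and 32-bit output words. compress_flags is a hypothetical name, and this is sequential C that only simulates the kernel passes.

Pass 1 needs no read-modify-write at all. A work-item that wants flag i set just stores 1 into flags[i], and since every writer stores the same value, it doesn’t matter whose store lands last. As far as I know, storing single bytes needs cl_khr_byte_addressable_store on OpenCL 1.0 devices and is core from OpenCL 1.1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pass 2: pack n byte flags (0 or nonzero) into 32-bit words, bit b of
   word w holding flag w * 32 + b. In a kernel, each work-item would
   produce one output word, so every word has a single writer. */
void compress_flags(const uint8_t *flags, size_t n, uint32_t *words) {
    size_t n_words = (n + 31) / 32;
    for (size_t w = 0; w < n_words; ++w) {  /* one "work-item" per word */
        uint32_t word = 0;
        for (size_t b = 0; b < 32 && w * 32 + b < n; ++b)
            if (flags[w * 32 + b])
                word |= UINT32_C(1) << b;
        words[w] = word;
    }
}
```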

Or, if the data is really sparse, create an edit list which is applied sequentially afterwards.
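A minimal sketch of such an edit list, assuming a fixed-capacity array of indices plus an atomic counter. The edit_list type and its functions are hypothetical. Writers reserve a slot with an atomic increment: atomic_inc in an OpenCL kernel, atomic_fetch_add in this C11 stand-in. A sequential pass then applies the edits to a packed bit set. Note that it still performs one atomic per edit, just on a single counter instead of scattered words.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical edit list: flag indices to set, plus a slot counter. */
typedef struct {
    uint32_t *indices;
    size_t capacity;
    _Atomic size_t count;
} edit_list;

/* Reserves a slot atomically and records idx there.
   Returns 0 if the list is full (the edit is dropped), 1 otherwise. */
int edit_list_push(edit_list *list, uint32_t idx) {
    size_t slot = atomic_fetch_add(&list->count, 1);
    if (slot >= list->capacity)
        return 0;
    list->indices[slot] = idx;
    return 1;
}

/* Sequential pass afterwards: set each recorded bit in 32-bit words.
   Duplicate edits are harmless because setting a bit twice is a no-op. */
void edit_list_apply(edit_list *list, uint32_t *words) {
    size_t n = atomic_load(&list->count);
    if (n > list->capacity)
        n = list->capacity;  /* ignore reservations that overflowed */
    for (size_t i = 0; i < n; ++i) {
        uint32_t idx = list->indices[i];
        words[idx >> 5] |= UINT32_C(1) << (idx & 31);
    }
}
```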