Complicated stencil logic

I have a problem involving tests against a stencil buffer. There is an obvious but potentially very inefficient solution, which I’ll explain below; perhaps you know of a trick that can help me. I’ll put a brief explanation of why I’m doing this at the end.

Problem
I am using multiple bits within a stencil buffer, which has been filled with one draw call per bit; the colour buffer is so far empty. I then have a list of objects I want to draw. For each object I compute the list of stencil values for which the test should pass. If this list is empty I skip rendering. If it is a singleton I can do a single GL_EQUAL test against that value. If it is longer I need to perform multiple draw calls, one per stencil value. This is easy to implement but inefficient.

For example, if I am using 3 bits then for a particular object I might want to test for one of { 000, 010, 101 }, so I use a draw call for each. However, if I am using 8 bits and I want to test against half of the 256 possible values, then 128 draw calls for a single object in a scene is not feasible.

I have a solution if I am using 3 bits (or fewer): after writing to the stencil buffer for the first time, I create a new stencil buffer using all 8 bits, with one bit for each of the 8 possible values of the original buffer. This can be created by drawing a full screen quad for each of the 8 values. Then testing for the set { 000, 010, 101 } is as simple as using the mask 10100100 (written least significant bit first, so bits 0, 2 and 5 are set) and passing wherever the masked stencil value is non-zero. But this uses all 8 bits of the stencil buffer.
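The mask construction for the 3-bit case can be sketched on the CPU as follows. This is only an illustration under assumptions: oneHotMask and stencilPasses are hypothetical helper names, not part of OpenGL, and stencilPasses merely mirrors what glStencilFunc(GL_NOTEQUAL, 0, mask) would compute per pixel. For the set { 000, 010, 101 } the mask has bits 0, 2 and 5 set, i.e. 0x25 (the 10100100 in the question reads least significant bit first).

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: builds the 8-bit stencil mask for a set of 3-bit values.
// Bit v of the result is set when value v belongs to the set.
std::uint8_t oneHotMask(std::initializer_list<unsigned> values) {
    std::uint8_t mask = 0;
    for (unsigned v : values) {
        assert(v < 8 && "only 3-bit values fit in an 8-bit one-hot stencil");
        mask = static_cast<std::uint8_t>(mask | (1u << v));
    }
    return mask;
}

// CPU mirror of glStencilFunc(GL_NOTEQUAL, 0, mask):
// the test passes when (stencil & mask) != (0 & mask), i.e. is non-zero.
bool stencilPasses(std::uint8_t stencil, std::uint8_t mask) {
    return (stencil & mask) != 0;
}

// With a GL context, the single draw for the set {000, 010, 101} would be
// guarded by something like:
//   glStencilFunc(GL_NOTEQUAL, 0, oneHotMask({0b000, 0b010, 0b101}));
```

A pixel whose original value was 010 holds the one-hot value 00000100 in the expanded buffer, so stencilPasses(0x04, 0x25) is true, while a pixel with original value 011 (one-hot 0x08) fails the test.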

Are there any other stencil test tricks I am missing?

Motivation
I am implementing the kinds of geometry usually studied in the field of algebraic topology; they’re called covering spaces. Superficially they look a bit like the various tricks using portals. I’ve only come across one true implementation before, in a game called Parallax, but it’s the simplest version possible. The game Antichamber also has some similar tricks. The great thing about covering spaces is that they satisfy nearly all the properties of standard Euclidean geometry, so most methods in a standard 3D engine are possible: lighting, shadows etc. The only extra computational burden is a) some CPU-based group theory needed to simulate the objects and to compute the lists of stencil values referenced above, and b) some stencil tests similar to the ones above.

If you’re using the stencil buffer as a mask but not updating it, you could use an integer texture and just discard fragments for which the test fails (although using discard usually disables the early depth test optimisation). That would avoid the need to expand the stencil values to bitmasks.
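One way this could look, as a rough sketch rather than a tested implementation: the set of accepted 8-bit values is packed into a 256-bit table (eight 32-bit words) and the fragment shader checks a single bit. Everything named here is an assumption made for illustration: AllowedSet, makeAllowedSet, isAllowed, the uniform names regionTex and allowed, and the choice of an R8UI texture holding the per-pixel value. The C++ function isAllowed repeats the shader's test on the CPU so the packing can be checked without a GL context.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// 256 membership bits, one per possible 8-bit value (hypothetical layout).
using AllowedSet = std::array<std::uint32_t, 8>;

// Packs a list of accepted values into the bit table.
AllowedSet makeAllowedSet(std::initializer_list<unsigned> values) {
    AllowedSet bits{};
    for (unsigned v : values) {
        assert(v < 256 && "values must fit in 8 bits");
        bits[v >> 5] |= 1u << (v & 31u);  // word index v / 32, bit index v % 32
    }
    return bits;
}

// CPU mirror of the shader test below.
bool isAllowed(const AllowedSet& bits, std::uint8_t v) {
    return (bits[v >> 5] & (1u << (v & 31u))) != 0;
}

// Fragment shader sketch (GLSL 3.30). The table could be uploaded per object
// with glUniform1uiv(location, 8, bits.data()).
static const char* const kFragmentShader = R"glsl(
#version 330 core
uniform usampler2D regionTex;  // assumed: per-pixel value in an R8UI texture
uniform uint allowed[8];       // 256-bit membership table
out vec4 fragColour;
void main() {
    uint v = texelFetch(regionTex, ivec2(gl_FragCoord.xy), 0).r;
    if ((allowed[v >> 5u] & (1u << (v & 31u))) == 0u)
        discard;               // value not in this object's list
    fragColour = vec4(1.0);    // placeholder shading
}
)glsl";
```

With this layout each object needs one draw call regardless of how many values are in its list; only the eight-word uniform changes between objects.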

With the introduction of image atomic operations, it should be possible to replace any or all of the read-modify-write framebuffer operations (depth, stencil, blending) with image operations and do everything in the shader, albeit at some cost to performance.