Accurate Colour Picking

Hey there!

I’ve been playing around with picking a lot recently, both raycasting and color-picking. After success in both of these areas, I wanted to try and make these algorithms faster and more effective.

After a quick brainstorm, I have an idea that probably won’t work but I guess is worth a shot. My idea is to map the RGB spectrum to the xyz coordinates (with a shader like this: https :// ), and then use getPixels or something to return the RGB value (and therefore the xyz coords). I’ve found a way to differentiate between the positive and negative axes (with the alpha), so that isn’t a worry.

What I need to ask is whether this is a potentially viable idea. Are there any nuances with the colours in GLSL that would make it inaccurate?

I’m pretty new to all this, so help would be greatly appreciated.

In my experience the primary goal of picking is to identify the “object” (whatever the application defines as such a thing) under the mouse cursor, not so much the coordinates the mouse is over. If all you want is the latter then that can be obtained from the depth buffer, the window space x,y mouse coordinate and the viewport, projection, and view transformations: use the mouse coordinates to look up depth, apply the inverse of the three mentioned transformations and you can get world space xyz.
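The inversion described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`perspective`, `project`, `unproject`, and the matrix helpers) are made up for this example, matrices use the column-vector convention, the depth range is the default [0, 1], and the depth value is assumed to have already been read back for the pixel under the cursor. Note that OpenGL window coordinates have their origin at the bottom-left, so a mouse y measured from the top of the window has to be flipped first.

```python
import math

def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Product of a 4x4 matrix and a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def mat_inverse(m):
    """Invert a 4x4 matrix by Gauss-Jordan elimination with partial pivoting."""
    n = 4
    a = [list(m[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def perspective(fovy_deg, aspect, near, far):
    """Standard OpenGL-style perspective projection matrix."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def project(world, view, proj, viewport):
    """World space -> window space (x, y in pixels, depth in [0, 1])."""
    x, y, w, h = viewport
    clip = mat_vec(mat_mul(proj, view), [world[0], world[1], world[2], 1.0])
    ndc = [c / clip[3] for c in clip[:3]]          # projective division
    return (x + (ndc[0] + 1.0) * w / 2.0,
            y + (ndc[1] + 1.0) * h / 2.0,
            (ndc[2] + 1.0) / 2.0)

def unproject(win_x, win_y, depth, view, proj, viewport):
    """Window space + depth buffer value -> world space."""
    x, y, w, h = viewport
    # Undo the viewport transformation (window -> normalized device coords).
    ndc = [2.0 * (win_x - x) / w - 1.0,
           2.0 * (win_y - y) / h - 1.0,
           2.0 * depth - 1.0,
           1.0]
    # Undo projection and view in one step, then undo the projective division.
    world = mat_vec(mat_inverse(mat_mul(proj, view)), ndc)
    return tuple(c / world[3] for c in world[:3])

# Usage with made-up values: a camera 5 units back along +z, looking down -z.
view = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, -5.0],
        [0.0, 0.0, 0.0, 1.0]]
proj = perspective(60.0, 800 / 600, 0.1, 100.0)
viewport = (0, 0, 800, 600)

win_x, win_y, depth = project((1.0, 2.0, -3.0), view, proj, viewport)
recovered = unproject(win_x, win_y, depth, view, proj, viewport)
```

In a real program the depth would come from the depth buffer rather than from `project`; the round trip here is only a way to check that `unproject` undoes the forward transformation.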

If you want to compute coordinates in a space that cannot easily be recovered (e.g. because it is not the same for every object in the scene, like object space), you could render to a 3-channel floating-point texture that stores those coordinates. In general, these days it is no longer necessary to squeeze information into 8-bit RGB(A) channels; there are full-precision floating-point and integer formats that allow storing data in a more natural way.

The default framebuffer only has 8 bits per component. A framebuffer object has a wider selection of formats, up to a 32-bit float per component, depending upon the OpenGL version. OpenGL 3.0 incorporated the GL_ARB_texture_float extension into core, but that isn’t available in OpenGL ES 2 or WebGL.
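To get a feel for what 8 bits per component means for the original idea of storing a coordinate in a colour channel, here is a small sketch. The function names and the scene extent of -50 to 50 units are assumptions chosen for illustration, and the rounding only approximates what an 8-bit normalized render target does.

```python
def encode_8bit(value, lo, hi):
    """Map a value in [lo, hi] to an integer channel value 0..255."""
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)          # clamp, as a normalized colour channel would
    return int(round(t * 255))

def decode_8bit(channel, lo, hi):
    """Map an integer channel value 0..255 back to the range [lo, hi]."""
    return lo + (channel / 255.0) * (hi - lo)

extent = (-50.0, 50.0)                  # hypothetical scene extent along one axis
x = 12.3456
recovered = decode_8bit(encode_8bit(x, *extent), *extent)

# Worst-case error is half of one quantization step.
max_error = (extent[1] - extent[0]) / 255 / 2
```

With only 256 levels per axis, a 100-unit scene can be resolved to about 0.2 units at best, which is why a wider format or the depth buffer approach is the more accurate option.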

But as Carsten says, you can recover the coordinates from the window coordinates and depth buffer value by inverting the various transformations (model-view, projection, projective division, viewport transformation). That’s likely to be more accurate; the depth buffer will be at least 16 bits and more often 24 bits, although using too low a value for the near distance will reduce accuracy.
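The effect of the near distance on depth accuracy can be estimated numerically. The sketch below assumes a standard perspective projection, the default depth range of [0, 1], and a fixed-point depth buffer; `depth_step` is a name invented for this example. It returns how far apart in eye-space distance two adjacent depth buffer values are at a given distance from the camera.

```python
def depth_step(distance, near, far, bits=24):
    """Eye-space distance between two adjacent depth values near `distance`."""
    levels = 2 ** bits - 1

    # Window-space depth in [0, 1] for a point `distance` units in front of the camera.
    d = far / (far - near) * (1.0 - near / distance)
    code = min(int(d * levels), levels - 1)

    def to_distance(c):
        # Inverse of the mapping above for the stored integer value c.
        return far * near / (far - (c / levels) * (far - near))

    return to_distance(code + 1) - to_distance(code)

# Same far plane and same object distance, two different near planes.
coarse = depth_step(100.0, near=0.01, far=1000.0)
fine = depth_step(100.0, near=1.0, far=1000.0)
```

With these made-up numbers, moving the near plane from 0.01 to 1.0 makes the recovered distance roughly a hundred times finer at 100 units from the camera, which is the reason to keep the near distance as large as the scene allows.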

Thank you for your advice, @carsten_neumann. I really hadn’t considered (or really known of) depth buffers until recently, and I think this would be perfect as I already have the transformations that I would need (from my ray-casting code). This seems like a pretty optimal solution in my case!


Thanks, @GClements

I kinda understand that my idea was kinda warped :)).

I think it would be cool to try a 32-bit-per-component framebuffer, but it would probably be a bit overkill for something that can be done more simply. Thanks so much for your advice!!!


Hey @carsten_neumann. I am struggling to understand how I can extract data from the depth buffer. The most obvious way that I can think of would be to pass the data from the depth buffer into the fragment shader like in this example.
https:/ /

Then the data can be read with glReadPixels. Is this how you would do it, or is there a better way (because I heard glReadPixels can be slow)? I did read somewhere that momentarily changing the viewport to that single pixel is more effective, but I don’t really know.

What would you recommend?

Yes, you can use glReadPixels to read values from the depth buffer back to CPU memory. For picking you don’t need the entire depth buffer, just the value under the mouse cursor, so you can restrict the transfer area to only that pixel.
The reason glReadPixels is “slow” is mostly because of the pipeline stall it introduces. Drivers queue up a lot of work for the GPU to process, in order to keep their processing units occupied. When you read back data to CPU memory the driver has to wait for all queued up work to complete before it can return from the call to glReadPixels (or other function that transfers data in a similar way). Now the GPU’s queue is empty and the driver has to start filling it again, which takes time during which the GPU is not fully utilized.
I guess a way around the pipeline stall would be to copy from the depth buffer to one of a rotating set of buffers (say 3 or so) and then at a later frame transfer from a buffer that was filled some frames ago to main memory. I think this might avoid the stall, but I don’t know for sure. Anyway, it is not what I’d start with. As usual with (graphics) programming, start with a simple implementation, and if it is a bottleneck (as determined by measuring), improve it until something else is the bottleneck or everything is running fast enough :wink:

To avoid a pipeline stall, bind a buffer object to GL_PIXEL_PACK_BUFFER before calling glReadPixels; the data will be copied to the buffer object rather than to client memory. You can use a fence (glFenceSync) to determine when the transfer has actually been performed, then read the data from the buffer object. This requires OpenGL 3.2 or OpenGL ES 3.0. With OpenGL ES 2, there’s no reliable way to avoid pipeline stalls; the best available option is to have a circular list of render targets (textures or renderbuffers) and wait until the target is about to be re-used before reading it. This means that you have a few frames of latency; you always want at least one frame of latency anyway, because picking should be based upon the frame which was actually displayed on the screen at that point, not the frame which was being rendered.
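The bookkeeping behind a circular list of buffers or render targets can be illustrated without any OpenGL calls. In the sketch below, `DelayedReadback`, `submit`, and `start_read` are hypothetical names: in a real program `start_read` would kick off the asynchronous copy (for example glReadPixels into a bound pixel pack buffer) and the callable it returns would fetch the data once the slot is about to be re-used. Here both are replaced by plain Python stand-ins.

```python
from collections import deque

class DelayedReadback:
    """Keeps a fixed number of in-flight readbacks and only collects the oldest."""

    def __init__(self, slots=3):
        self.slots = slots
        self.pending = deque()   # (frame index, callable that fetches the result)

    def submit(self, frame_index, start_read):
        """Start a readback for this frame; return (frame, data) from an
        older frame if its slot is about to be re-used, otherwise None."""
        result = None
        if len(self.pending) == self.slots:
            old_frame, finish_read = self.pending.popleft()
            result = (old_frame, finish_read())
        self.pending.append((frame_index, start_read()))
        return result

# Stand-in for six frames of rendering: the "depth under the cursor" is just
# frame / 10, and the asynchronous read is a closure that returns it later.
ring = DelayedReadback(slots=3)
results = []
for frame in range(6):
    fake_depth = frame / 10.0
    results.append(ring.submit(frame, lambda d=fake_depth: (lambda: d)))
```

The first three frames yield nothing, and from then on each frame returns the value requested three frames earlier. That is the latency trade-off mentioned above: the picking result always describes a slightly older frame.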

If you only want the value at the cursor position, then you only need to read one pixel. If you’re performing a render pass solely for the value of a single pixel, you only need to render that pixel. So you can use a 1×1 framebuffer, or set the viewport to cover a single pixel. In either case, don’t forget to update the projection matrix accordingly. You want the same overall transformation as if the viewport covered the full framebuffer, i.e.
V·P = Vf·Pf  =>  P = V⁻¹·Vf·Pf
where V is the viewport transformation for the 1×1 viewport, P is the projection matrix used to render to the 1×1 viewport, Vf is the viewport transformation for the full viewport and Pf is the projection matrix used for the full viewport.

The viewport transformation for a given rectangle x,y,w,h is

[w/2   0   0 x+w/2]
[  0 h/2   0 y+h/2]
[  0   0   1     0]
[  0   0   0     1]

while its inverse is

[2/w   0   0 -2x/w-1]
[  0 2/h   0 -2y/h-1]
[  0   0   1       0]
[  0   0   0       1]

Their product is

[w1/w2     0   0 (2(x1-x2)+(w1-w2))/w2]
[    0 h1/h2   0 (2(y1-y2)+(h1-h2))/h2]
[    0     0   1                     0]
[    0     0   0                     1]

where x1,y1,w1,h1 is the full viewport and x2,y2,w2,h2 is the restricted viewport.
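As a quick check of the matrices above, the following sketch builds them in plain Python and confirms that the closed-form product matches inverse(V) multiplied by Vf. The function names and the 800×600 viewport with the cursor at pixel (400, 300) are assumptions for illustration only; `pick_projection` shows how the correction would be applied to whatever projection matrix is used for the full viewport.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def viewport_matrix(x, y, w, h):
    """Viewport transformation for the rectangle x, y, w, h."""
    return [[w / 2, 0.0, 0.0, x + w / 2],
            [0.0, h / 2, 0.0, y + h / 2],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def viewport_inverse(x, y, w, h):
    """Inverse of the viewport transformation for the rectangle x, y, w, h."""
    return [[2 / w, 0.0, 0.0, -2 * x / w - 1],
            [0.0, 2 / h, 0.0, -2 * y / h - 1],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def pick_correction(full, small):
    """Closed form of inverse(V_small) * V_full."""
    x1, y1, w1, h1 = full
    x2, y2, w2, h2 = small
    return [[w1 / w2, 0.0, 0.0, (2 * (x1 - x2) + (w1 - w2)) / w2],
            [0.0, h1 / h2, 0.0, (2 * (y1 - y2) + (h1 - h2)) / h2],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def pick_projection(full_projection, full, small):
    """Projection matrix to use when rendering only the restricted viewport."""
    return mat_mul(pick_correction(full, small), full_projection)

full = (0, 0, 800, 600)      # full viewport
pixel = (400, 300, 1, 1)     # 1x1 viewport at the (hypothetical) cursor pixel

correction = pick_correction(full, pixel)
product = mat_mul(viewport_inverse(*pixel), viewport_matrix(*full))
```

A point that would land on the centre of the cursor pixel with the full viewport ends up at the centre of normalized device coordinates after the correction, so the 1×1 render sees exactly what that pixel would have shown.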