Originally posted by Korval:
[QB][quote]So, the primary problem is the drivers, since reading back a dumb block of memory should NOT be slow when you match formats.[/quote]
So, explain precisely how the driver should fix the problem of the bus across which the data is being transferred being excruciatingly slow? Not to mention the fact that said bus does not allow bidirectional data transfer, thus every glReadPixels call will provoke a glFinish().
[/QUOTE]The reason why glReadPixels implies a “glFinish()” is unrelated to the bidirectional data transfer.
If your graphics card properly supports glReadPixels, the only reason the driver needs to sync the card (“glFinish()”) is the synchronous nature of glReadPixels in the OpenGL spec: the application must have the data available when the glReadPixels call returns.
What is really needed is an asynchronous glReadPixels. It’s of little use to have the fastest readback bus in the world if you have to sit idle until the current rendering has finished and your data has made it back to the CPU.
By pipelining glReadPixels calls, you should be able to hide most of your latencies.
This is not a driver problem.
[quote]I agree about glDrawPixels though; there seems little reason for that being slow.
My guess with glDrawPixels is that it can be implemented in 2 ways.
One is to directly write pixels to the framebuffer. This, among other things, requires a full glFinish. Also, this probably violates the GL spec because it probably says that per-fragment and post-fragment processing happen on glDrawPixels as well as other methods of rendering.
[/QUOTE]Well, that shouldn’t be a problem: a glDrawPixels quad is treated as a point with respect to texture sampling and color interpolation, so the whole quad gets the same color and texel values.
Anyway, directly writing things to the framebuffer is very bad, and that’s why a function like glDrawPixels - contrary to what you think - is good: it abstracts the app from the underlying video memory layout, and its use doesn’t force a pipeline flush (unlike a buffer “lock”).
[quote]The other is that, to do a glDrawPixels, they have to internally create a texture (memory alloc. Not fast), load it with your data (memcopy. Also not fast), change a bunch of per-vertex state so that they can draw a quad (state change), draw the quad, and then change the state back (state change).[/QUOTE]
There’s a third method: the graphics card supports DrawPixels natively, with the pixel data supplied as fragment data, in the same way a graphics card supports “texture downloads” (those seem to be “texture uploads” for non-driver people).
glDrawPixels (or glReadPixels, for that matter) has never been a priority for consumer cards, which is why you don’t find “fast” implementations of them, but I’m sure you can find them in workstation-class boards (DCC applications like Maya perform tons of glDrawPixels/glCopyPixels calls).
On the other hand, the second method doesn’t need to be slow at all. You don’t need to allocate the texture every time: you can use a scratch texture, or even a pool of them if you want to be able to pipeline glDrawPixels calls. Loading the texture with the data is a transfer you have to do anyway (even in the native-support case), and the state juggling and drawing a quad with that texture, once it’s in video memory, is fast.
[quote]Ultimately, glDrawPixels is just not a good idea. Hardware’s designed for drawing triangle strips, not arbitrary bitmaps from main memory.[/QUOTE]
I don’t agree with that. In fact, I believe glDrawPixels is a great tool: it avoids having to “lock” framebuffers and guess the format things are really stored in, and it doesn’t force hardware vendors to implement a given memory layout.