glReadPixels speed problem on NVIDIA card

OK… here is an update…
Using a PBO for readback has some advantages. First… create two PBO buffers for readback; after rendering finishes, use the previous frame's PBO to map the buffer and copy the result to system memory, and use the current PBO to initiate the next readback (call glReadPixels). As I understand PBOs, the glReadPixels call will be posted in the command queue and executed once the GPU finishes rendering.

static void render_scene2() 
{
	// start readback of the current frame into the current PBO
	glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, g_pbo[currentBuffer]);
	GLCALL(glReadPixels(0, 0, g_width, g_height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, 0));

	// copy the previous frame (whose transfer should be finished) to system memory
	glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, g_pbo[prevBuffer]);
	void* data = glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
	if (data)
	{
		// do something with data
		memcpy(system_memory, data, g_width * g_height * 4);
		glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
	}

	glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);

	// swap the two PBOs for the next frame
	int tmp; 
	tmp = currentBuffer; currentBuffer = prevBuffer; prevBuffer = tmp;
}

I did benchmarks for this code… on my machine (P4 dual core 3.0GHz + GF7600GT-PCIX (fw 96.89) + WinXP SP2) at 1280x1024 the app can render and read back at ~60fps, with overall CPU usage around 60%. When I benchmark at smaller resolutions, CPU usage increases a bit but FPS gets roughly a 100% boost. At 640x480 it goes up to ~140fps.

Hi Andras! I ran your program but it crashes and Windows closes it… did I miss something?
Hmm, that’s strange, I’ve tested it on multiple computers and it ran fine. It’s also a really simple program… I’ve included the source and project files; could you build it in debug and see where it crashes? Thanks.

EDIT: I made the test app even simpler, and added some error checking, in case the PBO is not supported.

Originally posted by yooyo:
I did benchmarks for this code… on my machine (P4 dual core 3.0GHz + GF7600GT-PCIX (fw 96.89) + WinXP SP2) at 1280x1024 the app can render and read back at ~60fps, with overall CPU usage around 60%. When I benchmark at smaller resolutions, CPU usage increases a bit but FPS gets roughly a 100% boost. At 640x480 it goes up to ~140fps.
I think 60% is way too high for such a simple app! And it stays high even if I read back a single pixel. Insert this line into main to see how it goes up to 100% on a single core:
SetThreadAffinityMask(GetCurrentThread(), 0x00000001);

BTW: I didn’t know you were double buffering the PBOs, that makes sense. In my original code, I had a ring of PBOs.
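
Just to illustrate what I mean by a ring (a rough sketch, not the actual code from my app; names like NUM_PBOS and g_ring are placeholders, and it assumes the ARB_pixel_buffer_object entry points and <string.h> are available):

#define NUM_PBOS 4

static GLuint g_ring[NUM_PBOS];
static int    g_frame = 0;

static void init_ring(int width, int height)
{
	glGenBuffersARB(NUM_PBOS, g_ring);
	for (int i = 0; i < NUM_PBOS; ++i)
	{
		glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, g_ring[i]);
		glBufferDataARB(GL_PIXEL_PACK_BUFFER_ARB, width * height * 4,
		                NULL, GL_STREAM_READ_ARB);
	}
	glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
}

static void readback_ring(int width, int height, void* sysmem)
{
	int cur = g_frame % NUM_PBOS;        // PBO that receives this frame
	int old = (g_frame + 1) % NUM_PBOS;  // oldest PBO in the ring

	// start the (ideally asynchronous) readback into the current PBO
	glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, g_ring[cur]);
	glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, 0);

	// map the oldest PBO; its transfer started NUM_PBOS-1 frames ago,
	// so by now the map should not have to stall
	if (g_frame >= NUM_PBOS - 1)
	{
		glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, g_ring[old]);
		void* data = glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
		if (data)
		{
			memcpy(sysmem, data, (size_t)width * height * 4);
			glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
		}
	}

	glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
	++g_frame;
}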

http://developer.nvidia.com/object/fast_texture_transfers.html

http://developer.nvidia.com/object/fast_texture_transfers.html
Yooyo, I’m already doing this. When I said I use a ring of PBOs, I meant the exact same technique that’s at the very end of this document (page 5), titled “Map Different Frames to Different PBOs”.

Also, that is the theory, and I understand how it should work. But in practice it doesn’t! The document says that reading into a PBO is asynchronous, but in practice it is not! You can see it in the test program: even when I read into a PBO, the call does not return instantly; instead it stalls the CPU!
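
To make the stall visible, something like this around the call is enough (just a sketch of the kind of measurement I mean, not the exact test code; Win32 timing, and it assumes the PBO and g_width/g_height are set up as above):

#include <windows.h>
#include <stdio.h>

// Times a single glReadPixels into a bound PBO. If the transfer is truly
// asynchronous the call should return in well under a millisecond; if the
// driver falls back to a synchronous path, it takes roughly the full
// transfer time.
static void time_readback(GLuint pbo, int width, int height)
{
	LARGE_INTEGER freq, t0, t1;
	QueryPerformanceFrequency(&freq);

	glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo);

	QueryPerformanceCounter(&t0);
	glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, 0);
	QueryPerformanceCounter(&t1);

	glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);

	printf("glReadPixels returned after %.3f ms\n",
	       (t1.QuadPart - t0.QuadPart) * 1000.0 / (double)freq.QuadPart);
}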

Interesting benchmarks from GPUBench:

http://graphics.stanford.edu/projects/gpubench/results/

We see that a Radeon X1900XTX can do glReadPixels at a rate of about 200MB/sec… while an NVIDIA 7900GTX or 8800GTX will sustain more than 800MB/sec. That’s a 4:1 performance ratio!

I would like to hear from ATI as to why they perform so badly.

Ozo.

I would like to hear from ATI as to why they perform so badly.
And I would like to hear from nVidia as to why reading into PBOs block! :slight_smile:

Originally posted by andras:
And I would like to hear from nVidia as to why reading into PBOs block! :slight_smile:
Andras, please try requesting 8 AlphaBits in your pixelformat.

Originally posted by jeffb:
Andras, please try requesting 8 AlphaBits in your pixelformat.

Hah, that did the trick! It is much faster now! CPU usage is at 1% while reading back every frame!

Thanks a lot! I owe you a beer! :wink:
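
For anyone else who hits this: the fix is only in the pixel format request. A minimal sketch of the Win32 side (hdc is the window’s device context, error checking omitted):

#include <windows.h>
#include <string.h>

// Request 8 alpha bits in the pixel format; without cAlphaBits = 8 the
// PBO readback fell off the fast path on my setup.
static void setup_pixelformat_with_alpha(HDC hdc)
{
	PIXELFORMATDESCRIPTOR pfd;
	memset(&pfd, 0, sizeof(pfd));
	pfd.nSize        = sizeof(pfd);
	pfd.nVersion     = 1;
	pfd.dwFlags      = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
	pfd.iPixelType   = PFD_TYPE_RGBA;
	pfd.cColorBits   = 32;
	pfd.cAlphaBits   = 8;   // the missing piece: without this, readback stalled
	pfd.cDepthBits   = 24;
	pfd.cStencilBits = 8;
	pfd.iLayerType   = PFD_MAIN_PLANE;

	int format = ChoosePixelFormat(hdc, &pfd);
	SetPixelFormat(hdc, format, &pfd);
}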

Wow… I never checked your pixelformat (headbang). Now… here are some rules:
pixelformat (colorbits:alphabits) and glReadPixels (format, type):
32:0 - (GL_BGR, GL_UNSIGNED_INT_8_8_8_8_REV)
32:8 - (GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV)

I’ll check FBO readback performance later. Generally, the backbuffer or FBO should have a pixel size of 32 bits * N.
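
For the FBO case, 32 bits per pixel just means using something like GL_RGBA8 for the color attachment. A rough sketch (assuming EXT_framebuffer_object is available; names are placeholders):

// Create an FBO whose color attachment uses a 32-bit-per-pixel format
// (GL_RGBA8), matching the BGRA readback path described above.
static GLuint create_rgba8_fbo(int width, int height, GLuint* colorTex)
{
	GLuint fbo;

	glGenTextures(1, colorTex);
	glBindTexture(GL_TEXTURE_2D, *colorTex);
	glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
	             GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, NULL);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

	glGenFramebuffersEXT(1, &fbo);
	glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
	glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
	                          GL_TEXTURE_2D, *colorTex, 0);
	glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);

	return fbo;
}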

Oh, wait! This works nicely with the color buffer, but how do I read back the depth efficiently?

EDIT: You also have to use 32-bit BGRA; reading back BGR with GL_UNSIGNED_INT_8_8_8_8_REV seems to throw an OpenGL error (GL_INVALID_OPERATION). It works with other types, but slowly.

I think I’ve just found it! This seems to work:
glReadPixels(0, 0, 1, 1, GL_DEPTH_STENCIL_EXT, GL_UNSIGNED_INT_24_8_EXT, 0);
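
In case it helps anyone: GL_UNSIGNED_INT_24_8_EXT packs the depth value in the upper 24 bits and the stencil value in the lower 8, so after mapping the PBO it can be split out like this (rough sketch with placeholder names, using the same map-a-frame-later idea as for the color PBOs):

// Read packed depth+stencil into a PBO, then unpack after mapping.
static void readback_depth(GLuint depthPbo, int width, int height, float* depthOut)
{
	glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, depthPbo);
	glReadPixels(0, 0, width, height,
	             GL_DEPTH_STENCIL_EXT, GL_UNSIGNED_INT_24_8_EXT, 0);

	// Map later (ideally a frame later, like the color PBOs) to avoid a stall.
	GLuint* packed = (GLuint*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
	if (packed)
	{
		for (int i = 0; i < width * height; ++i)
		{
			GLuint d24 = packed[i] >> 8;          // upper 24 bits: depth
			/* GLubyte s = packed[i] & 0xFF; */   // lower 8 bits: stencil
			depthOut[i] = d24 / 16777215.0f;      // normalize 24-bit depth to [0,1]
		}
		glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
	}
	glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
}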