glGenerateMipmapEXT: very poor performance?

jesusgumbau · July 22, 2009, 7:15am

I need to compute the average of the contents of a 32x32 RGB32F render target texture. I have tried two methods:

MethodCPU: use glReadPixels over the FBO and do the computations on the CPU. Cost: glReadPixels + 1024-iterations loop.
MethodGPU: use glGenerateMipmapEXT to compute the mipmap on the GPU and fetch the 1x1 mipmap for the value which will contain the average of all texels (works fine).
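
For reference, the two methods compute the same quantity. The sketch below is a CPU-only illustration in C, not the poster's code. `average_rgb` is the plain loop of MethodCPU, and `mip_reduce_to_1x1` emulates the mip chain under the assumption that each level is a 2x2 box filter of the previous one (a driver may filter differently). The function names and the tightly packed RGB float layout are assumptions made for the example.

```c
#include <assert.h>

enum { CHANNELS = 3 }; /* RGB, tightly packed floats (assumed layout) */

/* MethodCPU: average every texel of a size x size image directly. */
void average_rgb(const float *pixels, int size, float out[CHANNELS])
{
    assert(size > 0);
    double sum[CHANNELS] = {0.0, 0.0, 0.0};
    for (int i = 0; i < size * size; ++i)
        for (int c = 0; c < CHANNELS; ++c)
            sum[c] += pixels[i * CHANNELS + c];
    for (int c = 0; c < CHANNELS; ++c)
        out[c] = (float)(sum[c] / (double)(size * size));
}

/* MethodGPU, emulated on the CPU: halve the image with a 2x2 box filter
 * until one texel is left. Works in place and overwrites `pixels`.
 * Assumes `size` is a power of two. */
void mip_reduce_to_1x1(float *pixels, int size, float out[CHANNELS])
{
    assert(size > 0 && (size & (size - 1)) == 0);
    while (size > 1) {
        int half = size / 2;
        for (int y = 0; y < half; ++y) {
            for (int x = 0; x < half; ++x) {
                for (int c = 0; c < CHANNELS; ++c) {
                    /* The four source texels of the finer level. */
                    float t00 = pixels[((2 * y) * size + 2 * x) * CHANNELS + c];
                    float t10 = pixels[((2 * y) * size + 2 * x + 1) * CHANNELS + c];
                    float t01 = pixels[((2 * y + 1) * size + 2 * x) * CHANNELS + c];
                    float t11 = pixels[((2 * y + 1) * size + 2 * x + 1) * CHANNELS + c];
                    /* The destination index never exceeds any source index
                     * that is still to be read, so in-place writing is safe. */
                    pixels[(y * half + x) * CHANNELS + c] =
                        0.25f * (t00 + t10 + t01 + t11);
                }
            }
        }
        size = half;
    }
    for (int c = 0; c < CHANNELS; ++c)
        out[c] = pixels[c];
}
```

For a 32x32 input the two results agree up to floating-point rounding, which is why the 1x1 level can stand in for the average.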

The problem is that MethodGPU is 2 to 3 times slower than MethodCPU. I thought my GPU-based approach would outperform the CPU-based one, but that doesn’t seem to be true.

My questions are:

1- What is the fastest way of generating mipmaps for a texture attached as a render target to an FBO?

2- What is the fastest way of computing the average value of a texture?

Greetings.

dletozeun · July 22, 2009, 7:54am

What hardware are you working on?

Yes, the GPU method should be much faster in theory, but in practice that is not always true. I tried it on quite old ATI hardware, and mipmap generation for floating-point textures is indeed very slow.

In addition, your hardware apparently supports linear filtering, since according to you the result in the last mipmap level is right. But linear filtering on FP textures is supported only on high-end hardware, especially NVIDIA's. So the success of the GPU approach is hardware dependent…

Actually, I am using the CPU method for now. I brute-force it: I copy the framebuffer color output to a small texture (e.g. 256x256), read the pixels back, and compute the average color in a second thread. This works pretty well if you don’t need high precision and don’t need the average color every frame!

jesusgumbau · July 22, 2009, 8:18am

Thanks for your answer.

I am working on a GeForce 8800GT.

Your thread-based approach seems interesting, but first I’ll try computing the average color within a pixel shader. The idea is to draw a single point on the screen and let the pixel shader do all 32x32 texture accesses in a loop.

I think it would be very fast for such a small texture, while keeping the method completely GPU-based.

What do you think?

dletozeun · July 22, 2009, 8:49am

Yes, that’s worth a try indeed. Keep us informed about this.

Alfonse_Reinheart · July 22, 2009, 3:29pm

A 32x32 texture, even floating-point, isn’t very big. It’s no surprise that it’s not very expensive to do a read-pixels on it and then manually do the operation yourself.

Now if it was a 1024x1024 texture, then you’d probably have issues.
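
To put rough numbers on that, here is a small sketch of the readback size for both cases. The helper name `readback_bytes` is made up for this example, and the figures assume tightly packed RGB with 4-byte floats, ignoring any padding or format conversion the driver may apply.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes transferred when reading back a width x height image
 * (hypothetical helper; assumes tightly packed components). */
size_t readback_bytes(size_t width, size_t height,
                      size_t channels, size_t bytes_per_channel)
{
    assert(width > 0 && height > 0 && channels > 0 && bytes_per_channel > 0);
    return width * height * channels * bytes_per_channel;
}
```

With these assumptions, `readback_bytes(32, 32, 3, 4)` gives 12288 bytes (12 KiB), while `readback_bytes(1024, 1024, 3, 4)` gives 12582912 bytes (12 MiB), a factor of 1024 more data to transfer and sum.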

jesusgumbau · July 23, 2009, 1:05am

I have realized that the real bottleneck is not processing those 32x32 elements but retrieving the results from the framebuffer. I am surprised that even trying to get a single pixel with

glReadPixels(0,0,1,1,GL_RGB,GL_FLOAT,data);

has a severe impact on performance. What would be the correct/fastest way to get a single value from the GPU?

dletozeun · July 23, 2009, 1:50am

glReadPixels alone stalls the program until the data has been completely copied to system memory. Are you reading pixels from your FBO color attachment?
If that is not already the case, you could attach a texture to your FBO and call glGetTexImage. However, for both glReadPixels and glGetTexImage, performance may vary a lot depending on the pixel format and data type you set. I advise you to experiment with different pixel format/data type combinations.

jesusgumbau · July 23, 2009, 1:57am

Using glGetTexImage produces much worse performance… it is two times slower than the glReadPixels version…

About the pixel format/data types: I need full floating point precision, so I need to use RGB with 32 bit FLOATs.

dletozeun · July 23, 2009, 2:07am

You may try to get data as UNSIGNED_BYTE then cast the pointer to GLfloat*. It would not have any impact on precision since it is just a pointer cast.
I know that on some hardware using GL_BGRA is faster. There is something about this on the wiki, but I can’t find it right now. Also prefer RGBA or BGRA textures over RGB textures.
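
A caveat on the UNSIGNED_BYTE suggestion: when a byte type is requested, OpenGL converts each component to a byte before writing it, so the bytes in memory are not the raw bits of the floats. The sketch below models that conversion in plain C, assuming the usual normalized mapping (clamp to [0, 1], scale by 255) with round-to-nearest. The function names are invented for the example, and the exact rounding a driver uses is an assumption.

```c
#include <assert.h>

/* Model of a float-to-unsigned-byte component conversion:
 * clamp to [0, 1], scale to [0, 255], round to nearest (assumed). */
unsigned char float_to_ubyte(float v)
{
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return (unsigned char)(v * 255.0f + 0.5f);
}

/* Inverse mapping, to see what survives the round trip. */
float ubyte_to_float(unsigned char b)
{
    float v = (float)b / 255.0f;
    assert(v >= 0.0f && v <= 1.0f);
    return v;
}
```

Under this model 0.25 becomes byte 64, which maps back to about 0.251, and anything above 1.0 (7.5, say) collapses to 255. So full float precision needs GL_FLOAT as the read type.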

yooyo · July 23, 2009, 6:07pm

If your rendering is complex, then after you issue all rendering commands and call glGenerateMipmapEXT, that call goes into the command queue. The problem is that you call glReadPixels right after glGenerateMipmapEXT. The driver must wait until the rendering queue has finished before it can return the 1x1 pixel.

Because you have an NV GPU, use GL_NV_timer_query to identify the bottleneck on the GPU side.