FBO and custom reverse mipmap generation

Timothy_Farrar · November 29, 2007, 10:35am

For pyramid-based image processing on the GPU, where you have a reduction pass (which works like custom mipmap generation) and an expansion pass (which works like custom mipmap generation in reverse), what is the proper way to do the expansion pass using the same texture?

The reduction (i.e. FBO custom mipmap generation) works as expected,

for (level = 1; level <= max; level++) {
    glFramebufferTexture2DEXT(…, level);                   /* write to this level */
    glTexParameteri(…, GL_TEXTURE_BASE_LEVEL, level - 1);  /* read only from level-1 */
    glTexParameteri(…, GL_TEXTURE_MAX_LEVEL, level - 1);
    … draw …
}
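As an illustration (not code from the thread), here is a minimal C sketch of how many levels such a loop has to visit. The helper name `mip_level_count` is hypothetical. It assumes the usual OpenGL rule that each level is half the size of the previous one, rounded down, until the larger dimension reaches 1 texel. The index of the last level, called max in the loops in this post, is then the count minus one.

```c
#include <assert.h>

/* Number of levels in a full mipmap chain, counting level 0.
   Hypothetical helper: halves the larger dimension (rounding down)
   until it reaches 1 texel. */
int mip_level_count(int width, int height)
{
    assert(width >= 1 && height >= 1);
    int largest = width > height ? width : height;
    int count = 1;
    while (largest > 1) {
        largest >>= 1;
        count++;
    }
    return count;
}
```

For a 2048 x 1024 texture this gives 12 levels, so the reduction loop would run for level = 1 through 11.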

However when trying to do the reverse,

for (level = max; level >= 0; level--) {
    glFramebufferTexture2DEXT(…, level);                   /* write to this level */
    glTexParameteri(…, GL_TEXTURE_BASE_LEVEL, level + 1);  /* read only from level+1 */
    glTexParameteri(…, GL_TEXTURE_MAX_LEVEL, level + 1);
    … draw …
}

FBO status is GL_FRAMEBUFFER_UNSUPPORTED_EXT.

Shouldn’t this be something supported by the hardware and thus a driver bug, or is the unsupported status correct?

For those trying to think of what this type of construct is useful for, think about multi-level bilateral up-sampling of reduced frame size computations…

Jan · November 29, 2007, 1:08pm

You lost me at “multi-level”…

Timothy_Farrar · November 29, 2007, 5:10pm

I probably should have said multi-resolution (“level” being a mipmap level).

Basically you separate a computation by resolution and do the computation at each resolution (i.e. each mipmap level). Then, starting with the smallest mipmap level, you bilateral up-sample and combine with the next larger mipmap level, repeating until you reach the largest level, which holds the result.
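The shape of that down-then-up computation can be sketched on the CPU. This is only an illustration under loud assumptions: it is 1-D rather than 2-D, the reduction is a plain average of pairs, and the expansion uses nearest-neighbour up-sampling with a 50/50 blend in place of the bilateral up-sampling described in the post. The `Pyramid` type and both function names are hypothetical.

```c
#include <assert.h>

#define PYR_MAX_LEVELS  9
#define PYR_MAX_SAMPLES 256

/* A 1-D pyramid: level 0 is the full-resolution signal and each
   further level has half as many samples. */
typedef struct {
    int   levels;                                 /* levels in use      */
    int   size[PYR_MAX_LEVELS];                   /* samples per level  */
    float data[PYR_MAX_LEVELS][PYR_MAX_SAMPLES];
} Pyramid;

/* Reduction pass: build each smaller level by averaging sample pairs.
   n must be a power of two no larger than PYR_MAX_SAMPLES. */
void pyramid_reduce(Pyramid *p, const float *input, int n)
{
    assert(n >= 1 && n <= PYR_MAX_SAMPLES && (n & (n - 1)) == 0);
    p->size[0] = n;
    for (int i = 0; i < n; i++)
        p->data[0][i] = input[i];

    int level = 0;
    while (p->size[level] > 1) {
        int m = p->size[level] / 2;
        assert(level + 1 < PYR_MAX_LEVELS);
        for (int i = 0; i < m; i++)
            p->data[level + 1][i] =
                0.5f * (p->data[level][2 * i] + p->data[level][2 * i + 1]);
        p->size[level + 1] = m;
        level++;
    }
    p->levels = level + 1;
}

/* Expansion pass: starting just above the smallest level, up-sample the
   coarser level (nearest neighbour, an assumption) and blend it into the
   finer one. The final result ends up in level 0. */
void pyramid_expand(Pyramid *p)
{
    for (int level = p->levels - 2; level >= 0; level--) {
        for (int i = 0; i < p->size[level]; i++) {
            float coarse = p->data[level + 1][i / 2];
            p->data[level][i] = 0.5f * (p->data[level][i] + coarse);
        }
    }
}
```

Any per-level computation would run between the two passes. On the GPU each inner loop corresponds to one draw call into one mipmap level.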

Useful for all sorts of image based stuff, like say a combination of both local and global contrast enhancement.

Jan · November 29, 2007, 5:27pm

Ah, now I understand much more. Sounds interesting. Sorry that I can’t help you, though; I never tried working on mipmap levels with FBOs.

On the other hand, looking at your code makes me wonder:

In the first loop you do “level-1”, so you are basically cycling through the levels like this: 0, 1, 2, 3, … , max-1

In the second loop you do “level+1”, so you are cycling through them like this: max+1, max, max-1, max-2, … , 1

At least if I am not wrong (it is half past 2 in the morning right now). Shouldn’t that be an error? Shouldn’t the second loop use “level-1”, too?

Jan.

Timothy_Farrar · November 30, 2007, 7:52am

Yep, you are right, there is a typo in my post; the loop should start at level=(max-1).

The concept is,

level -> attached to FBO (write to)
level+1 -> mapped to texture unit via level constraints (read from)

Writing to a level that is larger in size than the read level is what causes the problem, even when the texture's level range is properly restricted so that no level is both read and written.
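The corrected expansion order can be written out as a schedule of (write level, read level) pairs. The sketch below is only an illustration; `PassStep` and `expansion_schedule` are hypothetical names, and no OpenGL calls are made. It starts at max-1, as in the correction above, and every step reads exactly one level smaller than the one it writes.

```c
#include <assert.h>

/* One step of the expansion pass. */
typedef struct {
    int write_level;  /* level attached to the FBO                      */
    int read_level;   /* level selected with BASE_LEVEL and MAX_LEVEL   */
} PassStep;

/* Fill steps[] for a mipmap chain whose smallest level is max_level.
   Returns the number of steps written, which equals max_level. */
int expansion_schedule(int max_level, PassStep *steps, int capacity)
{
    assert(max_level >= 0 && capacity >= max_level);
    int n = 0;
    for (int level = max_level - 1; level >= 0; level--) {
        steps[n].write_level = level;      /* render into this level    */
        steps[n].read_level  = level + 1;  /* sample the smaller level  */
        n++;
    }
    return n;
}
```

With max_level = 3 the steps are (2, 3), (1, 2), (0, 1): no step reads and writes the same level, and no step reads past the end of the chain.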

Nicolai_de_Haan · November 30, 2007, 9:42am

Uhm, maybe I misunderstood your approach, but shouldn’t all attachments have the same size in texels (to be framebuffer complete)? When expanding, why is the texture you’re expanding from attached to the FBO?

Timothy_Farrar · December 1, 2007, 12:04am

The idea is to attach only one level at a time (one attachment); each iteration of the loop replaces the attachment with another level.

bobdobbs · December 4, 2007, 12:05am

What sort of interpolation do you want for the build back up?

Since you can set the maximal filtering to be bilinear (i.e. GL_LINEAR), you can get that for free by rendering a mipmap level to a larger window frame with a fragment shader program: set glViewport to the desired size and then use the shader to force the correct level of detail.
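As an illustration of "the desired size", the following C sketch computes the viewport dimensions that match a given mipmap level. It is not code from the thread: `LevelSize` and `level_viewport` are hypothetical names, and the halving rule with a 1-texel minimum is the standard OpenGL mipmap size rule.

```c
#include <assert.h>

typedef struct {
    int width;
    int height;
} LevelSize;

/* Dimensions of mipmap level `level` for a texture whose level 0 is
   base_width x base_height: halve per level, rounding down, but never
   go below 1 texel in either dimension. */
LevelSize level_viewport(int base_width, int base_height, int level)
{
    assert(base_width >= 1 && base_height >= 1);
    assert(level >= 0 && level < 31);
    LevelSize s;
    s.width  = base_width  >> level;
    s.height = base_height >> level;
    if (s.width  < 1) s.width  = 1;
    if (s.height < 1) s.height = 1;
    return s;
}
```

The result would be passed on as `glViewport(0, 0, s.width, s.height)` before drawing into that level. For example, level 2 of a 2048 x 1024 texture is 512 x 256.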

If you wanted to do something more than that, you could use multitexturing: bind the pyramid levels you need to quads of the right size to force the LoD to the desired levels, then use a fragment shader to pull out the desired image points and do whatever interpolation you want on them. Something along those lines, at least.

I just did something like that and it seems to be working all right. I still need to do more testing to make sure that everything is numerically correct, but visually the data looks fine.

Don_t_Disturb · December 4, 2007, 4:03am

I’m experimenting with custom mipmap generation myself right now - I’ve found that compared to reading and rendering from the same texture, it’s about twice as fast to render into a separate renderbuffer then use glCopyTexSubImage2D to copy that render into the texture.
That’s on a GeForce 8800 GTS with driver version 158.19.

bobdobbs · December 4, 2007, 7:28am

Really? I was under the impression the glCopyTexSubImage2D command would be slower, since you have to do all the same setup as for the FBO and then a data copy… is the bind to the FBO for mipmap sublevels really that bad? Er… need coffee… let me ask that better…

What is the bottleneck in rendering directly to the same texture’s mipmap levels (i.e. bind the level to the FBO and then bind the texture to read from) that makes doing the data copy faster?

Cheers.

Don_t_Disturb · December 5, 2007, 3:01am

Yeah, it seems to be the texture binding that slows things down. I did get a tiny improvement (i.e. 102fps vs 100fps) in my test app by rendering mipmap level 2 (which was 512 * 256) directly into the texture and using glCopyTexSubImage2D for subsequent levels, so I expect the ideal solution will be a combination of both.

Timothy_Farrar · December 6, 2007, 7:37am

Thanks for sharing that info!

Interesting, are you rendering to just one “flat” renderbuffer in different areas for each of the mipmaps, then for each mipmap, using glCopyTexSubImage2D() to copy from the same single renderbuffer? (situation being that you don’t have to read each layer to generate the next layer)

Or are you writing to the same renderbuffer each time you build a mipmap layer and doing a glCopyTexSubImage2D() to the texture before processing the next layer? (situation being that you need to read from each layer to do the next layer)

One thing I tried before is to ping pong between 2 textures on the up pass. Meaning,

draw to Tex A layer N
draw to Tex B layer N-1, read from Tex A layer N
draw to Tex A layer N-2, read from Tex B layer N-1
…
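The alternation above can be spelled out as a schedule. This C sketch is only an illustration of the ordering, with hypothetical names (`PingPongStep`, `ping_pong_schedule`) and no OpenGL calls. It assumes the first step only writes, as in the list above.

```c
#include <assert.h>

/* One step of the ping-pong up pass. */
typedef struct {
    char write_tex;    /* 'A' or 'B': texture rendered to                */
    int  write_level;  /* layer rendered to                              */
    char read_tex;     /* texture sampled, or 0 if the step reads none   */
    int  read_level;   /* layer sampled, or -1 if the step reads none    */
} PingPongStep;

/* Fill steps[] for an up pass from layer n_level down to layer 0.
   Returns the number of steps written (n_level + 1). */
int ping_pong_schedule(int n_level, PingPongStep *steps, int capacity)
{
    assert(n_level >= 0 && capacity >= n_level + 1);
    int count = 0;
    for (int level = n_level; level >= 0; level--) {
        int even = (count % 2 == 0);
        steps[count].write_tex   = even ? 'A' : 'B';
        steps[count].write_level = level;
        if (count == 0) {
            /* First step: nothing smaller exists to read from. */
            steps[count].read_tex   = 0;
            steps[count].read_level = -1;
        } else {
            /* Read the layer written by the previous step,
               which lives in the other texture. */
            steps[count].read_tex   = even ? 'B' : 'A';
            steps[count].read_level = level + 1;
        }
        count++;
    }
    return count;
}
```

Because the read and write textures always differ, no step samples the texture it is rendering into.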

It is an interesting workaround, but it easily causes a reproducible hard lockup on Linux with the NVIDIA drivers, probably a race condition in the driver. That hints at why the renderbuffer method could be faster: either the texture binding, the change of texture min/max levels, or the FBO attaching is causing the driver to wait on something.

Also, given that custom mipmap generation has a bunch of logical join points (i.e. all parallel processing on the GPU must finish before starting the next mipmap layer, which depends on the results of a previous layer), I wonder if you would get a good speed-up by sending some other work to the pipeline between generation of mipmap layers?

Don_t_Disturb · December 6, 2007, 8:32am

That one.

Good point about parallelising, I might give that a try (don’t hold your breath though!)