97.44 multisample framebuffer_blit still inverted

Is this bug being worked on? It occurs on an 8800GTX and a 1500M.

Yes. It will be fixed in a future driver release. Thank you for taking the time to produce a simple repro app; it helped us greatly.

This issue is caused by a problem that could sometimes happen when doing a downsample blit directly into the window. To work around the issue on the current drivers, you could modify your application to perform the downsample blit into a second single-sample FBO, and then do a 1:1 blit from the single-sample FBO to the window.

Thanks for acknowledging this. Is a 1:1 inverted blit slower than a regular 1:1 blit?

Here’s what I’ve been using as a workaround, which looks exactly like what you suggested:

GLuint drawFramebuffer = 0;
glBindFramebufferEXT( GL_DRAW_FRAMEBUFFER_EXT, drawFramebuffer );
CHECKGL;

glBindFramebufferEXT( GL_READ_FRAMEBUFFER_EXT, rt.fbo->_handle );
CHECKGL;

GL_CHECK_FRAMEBUFFER_STATUS( GL_FRAMEBUFFER_EXT );
CHECKGL;
GL_CHECK_FRAMEBUFFER_STATUS( GL_DRAW_FRAMEBUFFER_EXT );
CHECKGL;
GL_CHECK_FRAMEBUFFER_STATUS( GL_READ_FRAMEBUFFER_EXT );
CHECKGL;


const int srcWidth  = appWindow.width;
const int srcHeight = appWindow.height;


const bool flip = GetBool("r_postProcessFlipFBBlit") ? true : false;
const float scale = GetFloat("r_postProcessScaleFBBlit");

// Truncate the scaled size to whole pixels; the explicit cast avoids warning C4244.
const int dstWidth  = static_cast<int>( appWindow.width  * scale );
const int dstHeight = static_cast<int>( appWindow.height * scale );

const GLenum filtering = GetBool("r_postProcessFilterFBBlit") ? GL_LINEAR : GL_NEAREST;


// Blit from rt.fbo straight into the window.
glBlitFramebufferEXT( 0, 0, srcWidth, srcHeight,
					  0, 0, dstWidth, dstHeight,
					  GL_COLOR_BUFFER_BIT, filtering );





if( flip ){
	// Copy the window contents into g_rb1, flipping them vertically on the way.
	glBindFramebufferEXT( GL_READ_FRAMEBUFFER_EXT, 0 );
	CHECKGL;
	glBindFramebufferEXT( GL_DRAW_FRAMEBUFFER_EXT, g_rb1.fbo->_handle );
	CHECKGL;

	glBlitFramebufferEXT( 0, srcHeight, srcWidth, 0, // reverse Y
						  0, 0, dstWidth, dstHeight,
						  GL_COLOR_BUFFER_BIT, filtering );

	// Copy the flipped image from g_rb1 back to the window.
	glBindFramebufferEXT( GL_READ_FRAMEBUFFER_EXT, g_rb1.fbo->_handle );
	CHECKGL;
	glBindFramebufferEXT( GL_DRAW_FRAMEBUFFER_EXT, 0 );
	CHECKGL;

	glBlitFramebufferEXT( 0, 0, srcWidth, srcHeight,
						  0, 0, dstWidth, dstHeight,
						  GL_COLOR_BUFFER_BIT, filtering );

	CHECKGL;
}
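
For readers unfamiliar with the "reverse Y" trick in the snippet above: swapping the two source Y coordinates makes the blit read the source rows in the opposite order. Here is a minimal sketch of that mapping for a nearest-filtered blit, written as plain C++ so it runs without a GL context. BlitRect and sourceRowFor are hypothetical names used only for illustration; they are not part of OpenGL, and the arithmetic is a simplified model of the coordinate mapping, not a description of what any particular driver does.

```cpp
#include <cmath>

// Hypothetical helper type; not part of OpenGL.
// Mirrors the (x0, y0, x1, y1) argument order of glBlitFramebufferEXT.
struct BlitRect { int x0, y0, x1, y1; };

// Source row sampled for a given destination row under GL_NEAREST
// (simplified model): the destination pixel centre is mapped linearly
// from the destination Y range onto the source Y range.
// Assumes dst.y0 < dst.y1 and that dstRow lies inside the destination rect.
int sourceRowFor(const BlitRect& src, const BlitRect& dst, int dstRow) {
    const double t = (dstRow + 0.5 - dst.y0) / double(dst.y1 - dst.y0);
    const double srcY = src.y0 + t * (src.y1 - src.y0);
    return static_cast<int>(std::floor(srcY));
}
```

With a source rectangle of (0, srcHeight, srcWidth, 0) and a destination of (0, 0, srcWidth, srcHeight), destination row 0 reads from the last source row, which is the vertical flip the snippet relies on. With the Y coordinates in their normal order, each row maps to itself.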

Originally posted by CatAtWork:
"Is a 1:1 inverted blit slower than a regular 1:1 blit?"

As far as I know, a 1:1 inverted blit should always run at the same speed as a 1:1 non-inverted blit.

Originally posted by CatAtWork:
"Here’s what I’ve been using as a workaround, which looks exactly like what you suggested"

Almost. In spirit these have the same semantics. In practice it looks like your snippet does one more blit than I had in mind.

    glBindFramebufferEXT( GL_READ_FRAMEBUFFER_EXT, rt.fbo->_handle );

    if( flip ) {
        glBindFramebufferEXT( GL_DRAW_FRAMEBUFFER_EXT, g_rb1.fbo->_handle );
    } else {
        glBindFramebufferEXT( GL_DRAW_FRAMEBUFFER_EXT, 0 );
    }

    glBlitFramebufferEXT( 0, 0, srcWidth, srcHeight,
                          0, 0, dstWidth, dstHeight,
                          GL_COLOR_BUFFER_BIT, filtering );

    if( flip ){
        glBindFramebufferEXT( GL_READ_FRAMEBUFFER_EXT, g_rb1.fbo->_handle );
        glBindFramebufferEXT( GL_DRAW_FRAMEBUFFER_EXT, 0 );
        glBlitFramebufferEXT( 0, 0, srcWidth, srcHeight,
                              0, 0, dstWidth, dstHeight,
                              GL_COLOR_BUFFER_BIT, filtering );
    }
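
To make the difference between the two approaches easier to compare, here is a rough sketch that models each path as a list of blit steps instead of issuing GL calls. It assumes that rt.fbo is the multisample FBO and g_rb1 is the single-sample one, which the thread suggests but does not state outright. Target, BlitStep and the three functions are hypothetical names for illustration only.

```cpp
#include <vector>

// Hypothetical stand-ins for the framebuffers discussed in the thread;
// these are labels, not GL handles.
enum class Target { MultisampleFbo, SingleSampleFbo, Window };

struct BlitStep {
    Target read;   // what would be bound to GL_READ_FRAMEBUFFER_EXT
    Target draw;   // what would be bound to GL_DRAW_FRAMEBUFFER_EXT
    bool   flipY;  // true when the source Y coordinates are swapped
};

// Three-blit path: resolve into the window, copy back flipped,
// then copy to the window again.
std::vector<BlitStep> threeBlitPath() {
    return {
        { Target::MultisampleFbo,  Target::Window,          false },
        { Target::Window,          Target::SingleSampleFbo, true  },
        { Target::SingleSampleFbo, Target::Window,          false },
    };
}

// Two-blit path: resolve into the single-sample FBO,
// then do a 1:1 blit to the window.
std::vector<BlitStep> twoBlitPath() {
    return {
        { Target::MultisampleFbo,  Target::SingleSampleFbo, false },
        { Target::SingleSampleFbo, Target::Window,          false },
    };
}

// True if any step downsamples directly into the window,
// which is the case the driver issue was reported for.
bool resolvesDirectlyToWindow(const std::vector<BlitStep>& path) {
    for (const BlitStep& step : path) {
        if (step.read == Target::MultisampleFbo && step.draw == Target::Window) {
            return true;
        }
    }
    return false;
}
```

In this model the two-blit path never downsamples directly into the window, so it sidesteps the problematic case without needing a compensating flip, while the three-blit path hits it and then corrects the result.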

Oh, I see, the inversion only happens when blitting from a multisample FBO to the window, not from multisample to single-sample.

I’ve added the two-blit path, but it’s significantly slower! Looking into it now.

I’m not sure why the two-blit approach is slower than the three-blit one, but here’s another repro app.
http://www.effloresce.com/cat/opengl.org/fbo_blit_perf-20061213.zip
It’s 12 MB, because I didn’t have a whole lot of time to prune.

I would maximize the window to something large, ideally 1600 to 1900 pixels wide.

My only thought is that, after the window is maximized, the allocation order of the framebuffers is not optimal. They’re created when r_postProcessEnable 1 is executed, not when the GL context is created.

r_postProcessMultisamples X (I used 8 and 16)
r_postProcessFlipFBBlit 1 (enables a flip in the 3-blit path)
r_postProcessEnable 1

r_postProcessAllowFBBlit 2 for the 3-blit path that I posted

r_postProcessAllowFBBlit 3 for your path, Jeff

r_timeGL 1 for EXT_timer_query-based FPS

ocean_useShader 1 to perform some heavy per-pixel work

image_anisotropic 8 or 16 to get rid of the texture2DProj artifacts at a distance