Summed Area Tables in GLSL

Hi guys. I’ve been playing around a lot with shadows these days, trying to get it right for a client. I’m looking at Summed Area Variance Shadow Maps and following GPU Gems 3. I’ve found this paper:

http://www.shaderwrangler.com/publications/sat/SAT_EG2005.pdf

It tells you how to create summed area tables for use in various effects, using the GPU. The algorithm seems straightforward, but I can’t seem to get it right:


-(void) generateSummedTables:(QCOpenGLContext *)context withTex:(GLuint) tex {
	
	CGLContextObj cgl_ctx = [context CGLContextObj];
	// Horizontal Scan

	
	int nm = ceil(log2(mFBOSize));
	glUseProgramObjectARB([mSummedTableShader programObject]);
	glUniform1iARB([mSummedTableShader getUniformLocation:"texWidth"],mFBOSize);
	
	GLuint atex = tex;
	[mSummedFBO bindNoDraw];
	
	glMatrixMode(GL_PROJECTION);
	glPushMatrix();
	glLoadIdentity();
	glOrtho(-1, 1, -1, 1, 0.0, 10.0);
	glMatrixMode(GL_MODELVIEW);
	glPushMatrix();
	glLoadIdentity();
	glColor3f(1.0,1.0,1.0f);

	unsigned int ni = 1;
	BOOL usingA = TRUE;
	for(int i=0; i < nm; i++){

		//glClear(GL_COLOR_BUFFER_BIT);

		if (usingA) {
			glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);
		} else {
			glDrawBuffer(GL_COLOR_ATTACHMENT1_EXT);
		}

		
		// Start off with an input texture A
		glUniform1iARB([mSummedTableShader getUniformLocation:"Ni"],ni);
		glUniform1iARB([mSummedTableShader getUniformLocation:"texture"],atex);
		
		glBindTexture(GL_TEXTURE_2D, atex);
		
		glBegin(GL_QUADS);
		glTexCoord2f(0.0, 0.0);	glVertex3f(-1.0, -1.0, 0.0);
		glTexCoord2f(1.0, 0.0);	glVertex3f(1.0, -1.0, 0.0);
		glTexCoord2f(1.0, 1.0);	glVertex3f(1.0, 1.0, 0.0);
		glTexCoord2f(0.0, 1.0);	glVertex3f(-1.0, 1.0, 0.0);
		glEnd();
		
		ni = ni << 1; // Move up
		
	
		if (usingA) {
			atex = [mSummedFBO getTextureAtTarget:0];
		} else {
			atex = [mSummedFBO getTextureAtTarget:1];
		}
		
		usingA = !usingA;
	
	}
	
	// Vertical Scan
	
	usingA = FALSE; // Sure about that? 
	ni = 1;
	atex = [mSummedFBO getTextureAtTarget:1];
	glUseProgramObjectARB([mSummedTableVShader programObject]);
	glUniform1iARB([mSummedTableVShader getUniformLocation:"texWidth"],mFBOSize);
	
	for(int i=0; i < nm; i++){
		
		//glClear(GL_COLOR_BUFFER_BIT);
		
		if (usingA) {
			glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);
		} else {
			glDrawBuffer(GL_COLOR_ATTACHMENT1_EXT);
		}
		
		
		// Start off with an input texture A
		glUniform1iARB([mSummedTableVShader getUniformLocation:"Ni"],ni);
		glUniform1iARB([mSummedTableVShader getUniformLocation:"texture"],atex);
		
		glBindTexture(GL_TEXTURE_2D, atex);
		
		glBegin(GL_QUADS);
		glTexCoord2f(0.0, 0.0);	glVertex3f(-1.0, -1.0, 0.0);
		glTexCoord2f(1.0, 0.0);	glVertex3f(1.0, -1.0, 0.0);
		glTexCoord2f(1.0, 1.0);	glVertex3f(1.0, 1.0, 0.0);
		glTexCoord2f(0.0, 1.0);	glVertex3f(-1.0, 1.0, 0.0);
		glEnd();
		
		
		ni = ni << 1; // Move up
		
		
		if (usingA) {
			atex = [mSummedFBO getTextureAtTarget:0];
		} else {
			atex = [mSummedFBO getTextureAtTarget:1];
		}
		
		usingA = !usingA;
	}
	
	glPopMatrix();
	glMatrixMode(GL_PROJECTION);
	glPopMatrix();
	
	[mSummedFBO unbindFBO];

	glUseProgramObjectARB(NULL);
}


AND THE FRAGMENT SHADERS FOR GENERATING THE TABLES (Vertical and Horizontal)

uniform int texWidth; // texture width in texels (the textures are square)
uniform int Ni;	// offset in texels for this pass: 2^i, precomputed on the CPU

// TODO - this is practically identical to the horizontal pass, so we could just use a direction uniform instead (see the sketch after these shaders)

uniform sampler2D texture;

void main (void) {
	// vertical Pass 
	vec2 s = gl_TexCoord[0].st;
	vec2 sd = s;
	sd.y = sd.y + ( 1.0/ float(texWidth) * float(Ni) );
	vec4 c = texture2D(texture, s) + texture2D(texture, sd);
	
	gl_FragColor = c;
	
	// Now we SWAP textures

}



uniform int texWidth; // texture width in texels (the textures are square)
uniform int Ni;	// offset in texels for this pass: 2^i, precomputed on the CPU

uniform sampler2D texture;

void main (void) {
	// Horizontal Pass
	vec2 s = gl_TexCoord[0].st;
	vec2 sd = s;
	sd.x = sd.x + (1.0/ float(texWidth) * float(Ni));
	vec4 c = texture2D(texture, s) + texture2D(texture, sd);
	
	gl_FragColor = c;
	
	// Now we SWAP textures

}
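
As the TODO says, the two passes differ only in the axis the offset is applied along, so they could probably be collapsed into one shader driven by a direction uniform. A rough sketch (untested, and the "direction" uniform name is just my own invention):

uniform int texWidth;	// texture width in texels (the textures are square)
uniform int Ni;	// offset in texels for this pass: 2^i
uniform vec2 direction;	// (1.0, 0.0) for the horizontal pass, (0.0, 1.0) for the vertical pass
uniform sampler2D texture;

void main (void) {
	vec2 s = gl_TexCoord[0].st;
	// Step Ni texels along the chosen axis; with clamp-to-border (0)
	// any read past the edge contributes nothing to the sum.
	vec2 sd = s + direction * (float(Ni) / float(texWidth));
	gl_FragColor = texture2D(texture, s) + texture2D(texture, sd);
}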

So what I have is a ping-pong FBO with two textures, both 512 x 512: power-of-two, square textures. They are GL_RGB32F_ARB textures with GL_FLOAT data.

I go through summing horizontally and then vertically. Each texture has GL_CLAMP_TO_BORDER set with a border colour of 0, to make sure that any overruns don’t affect the summing result.

How would I check this is correct? I suppose the easiest way is to write a shader that converts the table back to the original image?

OK, so I found one error. I was passing the texture ID to the shader’s sampler uniform and NOT the texture unit. Classic schoolboy error.

I implemented a shader that reverses the summed area table to see if what I was getting was correct. The results are rather odd! :S

I’m guessing there has been some precision loss or something going on here. My shader simply takes the sum at that point and the texels immediately to the left, above and above-left, and performs the standard summed area lookup but omits the divide. This should return the original pixel, and it certainly seems to in most cases.
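
For reference, the reverse shader is essentially this (a minimal sketch; the sampler name is mine, and the offsets need to point back along the direction the passes accumulated in):

uniform int texWidth;	// SAT size in texels
uniform sampler2D satTexture;

void main (void) {
	float texel = 1.0 / float(texWidth);
	vec2 s = gl_TexCoord[0].st;
	// Standard summed area lookup over a 1x1 texel box, divide omitted:
	// original(x,y) = sat(x,y) - sat(x-1,y) - sat(x,y-1) + sat(x-1,y-1)
	vec4 c = texture2D(satTexture, s)
	       - texture2D(satTexture, s - vec2(texel, 0.0))
	       - texture2D(satTexture, s - vec2(0.0, texel))
	       + texture2D(satTexture, s - vec2(texel, texel));
	gl_FragColor = c;
}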

Sorry guys, fixed it now. There was a step missing from my loop, and I also needed to swap the A and B textures one last time.

Hey OniDaito,
I tried to implement this algorithm on iOS in OpenGL ES 2.0 and I’m stuck with a problem of precision and data transfer. As I understand it, the fragment shader output is clamped to the range [0.0 … 1.0] at 8 bits per component (I think this is a limitation by the hardware manufacturer). To fit the larger integral values into the framebuffer/texture, I might try to compute the summed area for only one color channel. Is it possible to write to multiple framebuffers at the same time (with different results, so every channel would use a separate buffer)?
Has anybody had similar problems, or an idea how to solve this?

Is it possible to write to multiple framebuffers at the same time

It is on desktop GL - it’s called multiple render targets (MRT).
Deferred Rendering often uses this technique to output to two or more buffers at the same time from a single fragment shader.
There should be no reason (other than performance) why OpenGL ES 2.0 can’t support this - but I’m not an ES expert, I’m afraid.
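
The shader side of it looks roughly like this on the desktop (a sketch in the same GLSL 1.20 style as the code above; the glDrawBuffers setup on the CPU side is separate, and whether an ES 2.0 extension exposes this is another question):

// Writes a different result to each of two colour attachments
// selected beforehand with glDrawBuffers().
uniform sampler2D texture;

void main (void) {
	vec4 c = texture2D(texture, gl_TexCoord[0].st);
	gl_FragData[0] = c;	// goes to colour attachment 0
	gl_FragData[1] = c.bgra;	// a different result, to colour attachment 1
}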