Compute shader: instance culling

hi,

i’m trying to figure out how i can efficiently determine with a compute shader if a mesh is on-screen. first idea was to check if any bbox corner vertex is on-screen, and if not, then the box is invisible.

unfortunately, thats not correct. if i zoom near to the mesh / bbox so that the corners are out of screen, the mesh is culled away.

note that these bboxes are NOT axis-aligned.

#version 460 core

layout (local_size_x = 1024,  local_size_y = 1, local_size_z = 1) in;


/* argument type buffered in GL_DRAW_INDIRECT_BUFFER */
struct DrawElementsIndirectCommand
{
	uint Count;
	uint InstanceCount;
	uint FirstIndex;
	uint BaseVertex;
	uint BaseInstance;
};


layout (location = 0) uniform mat4 Projection = mat4(1);

layout (std430, binding = 1) buffer MV_BBox_Block {
	mat4 mv_bbox[]; // the instance buffer containing modelview matrices
};

layout (std430, binding = 2) buffer CMD_Block {
	DrawElementsIndirectCommand cmd[]; // the indirect buffer containing draw call parameters
};


const vec3 BBox[8] = {
	vec3(-1, -1, -1),
	vec3(+1, -1, -1),
	vec3(+1, +1, -1),
	vec3(-1, +1, -1),
	vec3(-1, -1, +1),
	vec3(+1, -1, +1),
	vec3(+1, +1, +1),
	vec3(-1, +1, +1)
};


void main()
{
	uint index = gl_GlobalInvocationID.x;
	if (index >= mv_bbox.length())
		return;
	if (index >= cmd.length())
		return;

	// set instance count to 0
	cmd[index].InstanceCount = 0;

	for (int i = 0; i < 8; i++)
	{
		vec4 screen_position = Projection * mv_bbox[index] * vec4(BBox[i], 1);
		screen_position.x = screen_position.x / screen_position.w;
		screen_position.y = screen_position.y / screen_position.w;
		screen_position.z = screen_position.z / screen_position.w;

		if (screen_position.x < -1 || +1 < screen_position.x)
			continue;
		if (screen_position.y < -1 || +1 < screen_position.y)
			continue;
		if (screen_position.z < -1 || +1 < screen_position.z)
			continue;

		// vertex in on screen: set instance count to 1
		cmd[index].InstanceCount = 1;
		return;
	}
}

thank for any advice!

If all vertices are outside the same edge (top, bottom, left, right), then it’s outside the view frustum. But the converse isn’t true; if that condition doesn’t hold, it may intersect the view frustum or it may not.

You can also try testing whether the edges of the view frustum (lines through the corners of the viewport) intersect the bounding volume.

If you need a definite answer, one approach is to clip the bounding volume to each edge in turn. If anything remains, that portion is inside the view frustum, otherwise there’s no intersection.

A less computationally-intensive approach is to choose a plane; if all vertices of the bounding volume are on one side and all vertices of the view frustum are on the other, then there’s no intersection. Again, the converse isn’t true; if the condition doesn’t hold, the bounding volume may or may not intersect the view frustum. The difficulty lies in choosing an appropriate plane. One approach is to choose a plane containing the closest edge of the view frustum, oriented perpendicular to a line between the object’s centre and the line through the centre of the viewport. That plane is guaranteed not to intersect the view frustum, so it’s a matter of determining whether the bounding volume is on the other side. If some of the bounding volume’s vertices intersect that plane, choose another plane formed from the edge of the view frustum and whichever vertex was farthest beyond the plane.

Ultimately it’s a question of how much time you spend on culling versus how much time you spend drawing meshes which aren’t actually visible.

Consider using a sphere as a bounding primitive, and cull test with it instead (or at least first).

One reason for this is that it’s typically much cheaper. 1 dot product + 1 comparison per clip plane. Very straightforward. For the typical ortho or perspective frustum, 6 plane tests and you’re done. You can even early-out if you find the sphere is completely outside one clip plane. Less of an advantage on the GPU if your instance list is clustered spatially.

This will trivially reject most 100% out-of-frustum objects (or trivially accept many 100% in-frustum objects).

For those cases where the sphere straddles the frustum, you can just treat the instances as in frustum or fall back on a more expensive culling technique such as OBB or AABB to try harder to reject them. But bench this to make sure it’s even worth doing.

hmm, sounds interesting … how would you do that?

i would try it like this:

visible = false;

for (each 6 view frustum planes)
{
	intersection_circle = plane & boundingsphere
	if (intersection_circle)
	{
		if (intersection_circle.center in intervall [-1, +1] x [-1, +1]) // xy, xy or yz, depends on plane
		{
			visible = true;
			return;
		}
		
		corner = find nearest view frustum corner
		if (distance(intersection_circle.center & corner) < intersection_circle.radus)
		{
			visible = true;
			return;
		}
	}
}

EDIT:

another thing i thought about is: is the bounding sphere kind of “deformed” when transforming with the typical “Projection * View * Model” matrix? can i just transform the bsphere center, and treat the frustum an unit cube [-1, +1] x [-1, +1] x [-1, +1], and do the checks with the raw bsphere radius?

You need to perform the test in eye space, where the view frustum is a … frustum (truncated pyramid).

Testing whether a sphere is outside a plane boils down to testing whether the distance of the sphere’s centre from the plane is greater than the sphere’s radius. The distance is just the dot product of the (normalised) plane normal with the centre position (the planes bounding the view frustum are guaranteed to pass through the eye-space origin, so there’s no offset).

The (unnormalised) normals for the left, right, top and bottom planes are (-A,0,1), (A,0,1), (-B,0,1), (B,0,1) where A and B are the [0][0] and [1][1] elements (respectively) of the projection matrix.

I’m assuming you don’t need to test against the near and far planes, as the near and far distances are usually set so as not to exclude anything. Anything behind the viewer will be excluded by either the left or right plane and also by either the top or bottom plane.

Well, you can also do it in world space, but then you need to transform the planes by the inverse view matrix. Well, actually by the transpose of the view matrix; normals need to be transformed by the inverse-transpose of the inverse view matrix, so the transpose of the view matrix. The w component should be zero (as they pass through the eye-space origin).

But you can’t reasonably do it in clip space or NDC because the sphere is no longer a sphere there.

GClements has already given you the jist of it. Here’s a URL with some simple source code if you need it:

Be sure to check out the main page, for other testing using other primitive types and techniques:

thank you all. i thought about it and i’ve read the sites linked above. i think another way is to use bounding spheres transformed into view space. then, convert the cartesian coordinates (x, y, z) into spherical coordinates (r, phi, theta). then:

– cull everything behind the near plane and beyond the far plans
– check the arc distance (if bigger than fov [in both directions] + radius, then cull)

everything else will be drawn. (for example spheres that “contain” the frustum)

finally i’ve got it !! :wink:

#version 460 core

layout (local_size_x = 1024,  local_size_y = 1, local_size_z = 1) in;


/* argument type buffered in GL_DRAW_INDIRECT_BUFFER */
struct DrawElementsIndirectCommand
{
	uint Count;
	uint InstanceCount;
	uint FirstIndex;
	uint BaseVertex;
	uint BaseInstance;
};


layout (std430, binding = 1) buffer MV_BSphere_Block {
	vec4 bspheres[];
};

layout (std430, binding = 2) buffer CMD_Block {
	DrawElementsIndirectCommand cmd[]; // the indirect buffer containing draw call parameters
};


layout (location = 0) uniform float FieldOfView = 0.0f;
layout (location = 1) uniform float AspectRatio = 0.0f;
layout (location = 2) uniform float ZNear = 0.0f;
layout (location = 3) uniform float ZFar = 0.0f;


void main()
{
	uint index = gl_GlobalInvocationID.x;
	if (index >= bspheres.length())
		return;
	if (index >= cmd.length())
		return;

	// invisible by default
	cmd[index].InstanceCount = 0;

	// bsphere position (in view space)
	vec3 position = bspheres[index].xyz;
	float radius = bspheres[index].w;

	// behind ZNear
	if ((position.z - radius ) > ZNear)
		return;

	// beyond ZFar
	if ((position.z + radius ) < ZFar)
		return;

	// bsphere is within the z-range of frustum ...

	float x0 = -position.z * tan(FieldOfView * AspectRatio / 2);
	float y0 = -position.z * tan(FieldOfView / 2);

	// upper/lower plane
	float dy = abs(position.y) - y0;
	float d1 = dy / cos(FieldOfView / 2);
	if (d1 > radius)
		return;

	// bsphere is within the z-range and y-range of frustum ...

	// left/right plane
	float dx = abs(position.x) - x0;
	float d2 = dx / cos(FieldOfView * AspectRatio / 2);
	if (d2 > radius)
		return;

	// visible
	cmd[index].InstanceCount = 1;
}

and the cpp part:

	glUseProgram(Program_Model_InstanceCulling);

	glUniform1f(0, camera.FieldOfView);
	glUniform1f(1, aspectratio);
	glUniform1f(2, -camera.ZNear);
	glUniform1f(3, -camera.ZFar);

	glDispatchCompute(1 + drawcount / 1024, 1, 1);

	//glUseProgram(0);

	glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

the frustum points to the negative z direction, the bounding spheres are transformed into view-space. there are “drawcount” elements in each SSBO.

thanks again for the links above, visualizing the problem makes it mch easier than thinking a hole in your brain ^^

Great!

Be aware that glMemoryBarrier() syncs on the consumer of the data, not the producer. So if you want to feed this to a gl*Draw*Indirect() call, then you probably want GL_COMMAND_BARRIER_BIT here.

ok, thank you, i’ve just leaned another thing :wink: i thought i was the other way around …

assuming that the negative Z-direction is the “viewing direction”, the top/bottom plane normals shouldnt have an X-component, the left/right plane normals shouldnt have an Y-component …

i assume you mean then: :thinking:
(-A,0,1), (A,0,1), (0,-B,1), (0,B,1)

Yes, that’s correct.

(And now I have to pad this out to 30 characters).