VBOs and culling?

sammie381 · August 6, 2009, 5:18am

I would like to use VBOs and culling together, however I don’t know the best way to do it. I was thinking about using
frustum culling because the camera’s view could change every frame. And this is not a terrain (just general polys),
so I can’t store my data in tiles. I have to cull it per triangle.

Or if someone could point me to a coding example that uses VBOs and culling, that would be ok.

zeoverlord · August 6, 2009, 9:00am

well the thing is that your graphics card does that automatically.
I would suggest that you do something called macro culling, it’s basically the same except you do it on objects and not triangles.

the way you do it is pretty simple: essentially you take the bounding box vertices and pass them through the projection+modelview matrix, then remove any object whose box corners all lie beyond the same side of the clip volume, i.e. all below -w or all above +w on one axis (outside -1 to 1 after the divide by w).
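A minimal sketch of the bounding-box test described above, in plain Python so it can be run without a GL context. It assumes OpenGL's clip-space convention (a point is inside when -w <= x, y, z <= w) and a row-major 4x4 matrix; `perspective`, `to_clip` and `box_is_culled` are illustrative names, not part of any library. The test is conservative: a box that straddles a frustum corner can be reported as visible even though it is not.

```python
import math
from itertools import product

def perspective(fovy_deg, aspect, near, far):
    """Row-major 4x4 perspective matrix (standard OpenGL form, camera looks down -z)."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def to_clip(mvp, point):
    """Transform (x, y, z, 1) by the combined projection * modelview matrix."""
    x, y, z = point
    return [row[0] * x + row[1] * y + row[2] * z + row[3] for row in mvp]

def box_is_culled(mvp, box_min, box_max):
    """True if all 8 corners of the box lie beyond the same clip plane."""
    corners = product(*zip(box_min, box_max))   # the 8 corners of the box
    clip = [to_clip(mvp, c) for c in corners]
    for axis in range(3):
        if all(c[axis] < -c[3] for c in clip):  # all below -w on this axis
            return True
        if all(c[axis] > c[3] for c in clip):   # all above +w on this axis
            return True
    return False

# Modelview is assumed to be the identity here, so mvp is just the projection.
mvp = perspective(90.0, 1.0, 1.0, 100.0)
in_front = box_is_culled(mvp, (-1, -1, -6), (1, 1, -4))   # box in front of the camera
behind = box_is_culled(mvp, (-1, -1, 4), (1, 1, 6))       # box behind the camera
```

Only the objects for which `box_is_culled` returns False would be drawn; their VBOs stay untouched either way.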

sammie381 · August 6, 2009, 12:35pm

Thanks, I know OpenGL does its own culling, but my intention is to cut down on the number of triangles I send to the GPU,
since memory is limited. The difference could be huge.

I suppose I have to create some sort of space partitioning structure of bounding boxes.
The scene is static, only the camera is moving. My only concern is that the VBO will be constantly updated every frame? I suppose that’s ok?

Ilian_Dinev · August 6, 2009, 1:09pm

You won’t be updating the static VBOs every frame!!!
Never do back-face culling on the cpu!

Here’s how modern graphics works: you have a VBO for each mesh (1-6000 tris). You upload those meshes on startup (OpenGL will send them to VRAM). Then, on every frame, you tell the gpu: “use this texture, this shader, and draw this whole mesh”. Notice “whole mesh”. The gpu will transform+cull the mesh so fast, that your cpu-based culling would have computed only 1%-10% of the triangles in the same time, not to mention the PCIe bandwidth you’d waste to send the geometry would be making the game even slower.
Modern gpus can compute and cull 300-700 million triangles/s. On a 3GHz cpu, you can do less than 5% of that.

Of course, it won’t be nice to tell the gpu to draw the whole scene. You’d want to keep some of that processing power invested in the fragment-shaders. Thus, segment your scene into medium-sized meshes, of 5000-60000 triangles, and do frustum-culling on them. Or even better, group objects in octree-like fashion, where you can cull whole collections of such medium-sized meshes with just one frustum-culling test.

sammie381 · August 6, 2009, 1:24pm

Thanks, but as I mentioned, memory is limited, which forces me to do my own culling. And if I add culling, then I don’t see how I can take
advantage of static VBOs. I will need to update them every frame. And the number of triangles is over a million.

dletozeun · August 6, 2009, 4:28pm

What you need is not backface culling, as others said it is already performed by the hardware so don’t waste your time doing it yourself. What you need is frustum culling and it is what zeoverlord said. I think you will easily find some papers on “frustum culling”, “occlusion culling” or “contribution culling”.

sammie381 · August 6, 2009, 5:26pm

Read my first post, I never mentioned it.

Ilian_Dinev · August 6, 2009, 5:39pm

If VRAM is the problem, then fret not - it’s OpenGL’s task to dynamically stream those meshes on demand. SysRAM always keeps a copy of each static VBO.
If you don’t trust the GL implementation to nicely manage the vtx-data, you can use a streaming-VBO to do fire-and-forget streaming.
Doing the backface culling on cpu is generally silly, even if you’re short on VRAM and RAM, and even if the gpu is rather old. (unless you work at InsomniacGames/NaughtyDog)

You can always partition static geometry nicely, to make those triangle-groups of 1k-60k tris, and frustum-cull whole groups.
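One possible way to build such triangle groups offline is sketched below. It is an assumption-laden illustration, not the method anyone in this thread uses: it splits the triangle list at the median along the longest axis of its bounding box (a k-d-tree-style split rather than an octree) until every group has at most `max_tris` triangles. Each group would then be uploaded once as its own static VBO and frustum-culled by its bounding box. The names `bounds`, `centroid` and `partition` are made up for this example.

```python
def bounds(tris):
    """Axis-aligned bounding box (lo, hi) of a list of triangles."""
    lo = tuple(min(v[i] for t in tris for v in t) for i in range(3))
    hi = tuple(max(v[i] for t in tris for v in t) for i in range(3))
    return lo, hi

def centroid(tri):
    """Average of the three vertices of one triangle."""
    return tuple(sum(v[i] for v in tri) / 3.0 for i in range(3))

def partition(tris, max_tris):
    """Split triangles into spatially compact groups of at most max_tris.

    Returns a list of (bounding_box, triangles) pairs.
    """
    if max_tris < 1:
        raise ValueError("max_tris must be at least 1")
    if len(tris) <= max_tris:
        return [(bounds(tris), tris)]
    lo, hi = bounds(tris)
    # Split along the axis where the group is widest.
    axis = max(range(3), key=lambda i: hi[i] - lo[i])
    ordered = sorted(tris, key=lambda t: centroid(t)[axis])
    mid = len(ordered) // 2
    return partition(ordered[:mid], max_tris) + partition(ordered[mid:], max_tris)

# 100 triangles laid out along the x axis, grouped into chunks of at most 16.
tris = [((i, 0, 0), (i + 0.5, 1, 0), (i + 1, 0, 0)) for i in range(100)]
groups = partition(tris, 16)
```

For a real scene `max_tris` would be in the thousands, as suggested above; the small number here only keeps the example readable.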

A cpu simply cannot process 1+ million tris every frame at 60fps, so forget backface-culling and let the GPU pull data via DMA.

P.S. we’re mentioning per-triangle backface-culling on cpu as being bad enough, because frustum-culling each and every triangle on cpu is even worse (I thought it’s obvious).

_NK47 · August 7, 2009, 2:15am

if memory is an issue you can use bounding spheres consisting of 4 floats (position, radius) rather than OBBs to perform frustum culling. while they are processed faster, they generally fit meshes less tightly. an additional thing is a scene graph that mirrors the relationship between objects, where a parent node holds one bounding volume enclosing all of its child nodes. if the parent node isn’t seen you can cull the whole subtree at once.
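A rough sketch of the bounding-sphere test and the parent/child culling described above. It assumes the frustum is given as planes (a, b, c, d) with unit-length normals pointing inward, so a point is inside a plane when a*x + b*y + c*z + d >= 0. `Node`, `sphere_outside` and `visible_leaves` are hypothetical names, and a simple box stands in for a real view frustum to keep the numbers easy to check.

```python
class Node:
    """Scene-graph node whose sphere encloses the node and all its children."""
    def __init__(self, name, center, radius, children=()):
        self.name = name
        self.center = center
        self.radius = radius
        self.children = list(children)

def sphere_outside(planes, center, radius):
    """True if the sphere lies entirely behind at least one plane."""
    cx, cy, cz = center
    return any(a * cx + b * cy + c * cz + d < -radius for a, b, c, d in planes)

def visible_leaves(node, planes, out=None):
    """Collect names of leaf nodes whose spheres may intersect the frustum."""
    if out is None:
        out = []
    if sphere_outside(planes, node.center, node.radius):
        return out                      # parent not seen: skip the whole subtree
    if not node.children:
        out.append(node.name)
    for child in node.children:
        visible_leaves(child, planes, out)
    return out

# Stand-in "frustum": the box from -10 to 10 on every axis.
planes = [(1, 0, 0, 10), (-1, 0, 0, 10),
          (0, 1, 0, 10), (0, -1, 0, 10),
          (0, 0, 1, 10), (0, 0, -1, 10)]

scene = Node("root", (0, 0, 0), 250, [
    Node("A", (0, 0, 0), 1),
    Node("B", (50, 0, 0), 5),
    Node("G", (200, 0, 0), 20, [
        Node("C", (195, 0, 0), 2),
        Node("D", (205, 0, 0), 2),
    ]),
])
visible = visible_leaves(scene, planes)
```

Here the group "G" is rejected with a single test, so "C" and "D" are never examined.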

Note that every graphics programmer actually deals with 2 processors at a time, namely the CPU and the GPU. try to avoid doing things on the CPU that the GPU does a whole lot faster.

sammie381 · August 7, 2009, 12:27pm

Thanks, I think I’m gonna have many problems with this one. I can’t load the whole scene in system memory or GPU memory.

Really need a lightweight loader, renderer, everything.

zeoverlord · August 9, 2009, 3:19am

in that case you have to look into dynamic streaming of your data and various LOD techniques; maybe you could also check whether some of the data can be reused.

The VRAM can hold a pretty impressive amount of data so if you’re running out of space it must be huge.

sammie381 · August 10, 2009, 1:29pm

Well, I think most of my memory problems come from the fact that I have to do some pre-processing before I can load the data
into GPU memory. The data on file is not in a GPU-friendly format. It’s just hard to avoid unnecessary memory allocations
when you’re trying to re-arrange and sort the data into something more manageable.

For example, with 1GB of GPU memory, I think I can get about 10 million triangles into memory no problem.
With 3 vertices/normals/uvs per triangle, that’s like 30 million verts/normals/uvs. 12 bytes per vert/normal,
8 bytes per uv, comes out like 30*12 + 30*12 + 30*8 = 960 million bytes.
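The estimate above can be checked with a few lines of Python. This only restates the figures from the post (3 unshared vertices per triangle, 12-byte positions and normals, 8-byte uvs); the function name `unindexed_bytes` is made up for the example.

```python
BYTES_POSITION = 12   # 3 floats
BYTES_NORMAL = 12     # 3 floats
BYTES_UV = 8          # 2 floats

def unindexed_bytes(triangles):
    """Vertex data size when every triangle stores its own 3 vertices."""
    vertices = triangles * 3
    return vertices * (BYTES_POSITION + BYTES_NORMAL + BYTES_UV)

total = unindexed_bytes(10_000_000)   # 960,000,000 bytes
fits_in_1gib = total < 2 ** 30        # leaves roughly 114 million bytes of headroom
```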