Quadtree performance question

Hello,

I am working on an 3d engine to display rather large terrains (at least 1000 x 2000 km), and the map ist segmented in 100 x 100 segments. To optimize performance and to reduce the number of segments that have to be tested for visibility, i use a quad tree where every segments becomes a “final” node of the tree.

But strangely, performance with using the quad tree is WORSE than before (where every single segment was tested for visibility)., With the drawing itself disabled, the tree version “renders” about 430 fps, while the other version without tree reaches about 560.

However, with drawing enabled, the version without tree is slightly better (26 vs 24 fps), but it does not seem to make a real difference at all. as does turning on/off compilier optimization, so my guess is that the REALLY time-consuming thing is the drawing (the map consists of more than 5 millions of triangles), making the clipping algorithm rahter seem less important.

But I still wonder why the qtree version is slower, althoug less than half as many times the visibility testing function gets called… is it possible that the recursion to descend the tree is far more expensive than the visibility test, althoug that has a sqrt() in it?

Regards,
Jan

how many segments do your terrain have and
how many polys are included in one segment?

rapso->greets();

100 x 100 segments = 10000 segments

5 mio triangles = 500 tri/seg

Jan->greet();

I suggest using a lower level of detail for blocks that are further away from the camera. Perhaps even merge blocks that are further away into larger blocks, a la Geo-Mipmapping.

Hierarchical culling (not just quadtrees) is only effective if the additional checks end up saving more checks lower in the hierarchy than they add at the top of the hierarchy. I’d recommend putting counters in for the number of checks done each way. If the numbers don’t make sense, make sure that you are doing everything right… ie, that you are correctly culling hierarchical branches, etc.

LOD is generally a great thing but at the moment there ist no time to implement it (or rather, for creating data with several LOD levels). Maybe later…

And i have added counters for the number of segments/tree nodes teste for culling each way, and it’s always 10.000 (wonder g) for the static version, and about 3000-6000 (depending on the viewpoint) with the tree version.

I found out that the tree finally gets faster than the other version when the visible area gets rather small (slightly faster with a visible range of 100 km). It seems that the recursive function calls are more expensive that just simply checking some more (or lot more) segments.

Anybody experiences with this? Maybe it is a good idea to turn the code recursion into data recursion (urgh)? What IS the best wasy to manage large terrain data?

Regards,
Jan

With those numbers of vertices, you will very likely be geometry bound. Saying “there is no time to implement LOD” and then spending time trying to get view culling to work seems like the wrong priority to me.

If you can get LOD to work right, it’ll probably be quicker to implement and run faster than getting view culling to work right, because with LOD, not culling something far away puts almost no extra load on the geometry engine. (Note: it’s better if you do both :slight_smile:

There are many copy-and-paste algorithms available on the net to take existing data and run vertex reduction on them.

The problems with LOD are:

  • I am only working on the 3D engine, not on the data, and the people who would prepare the LOD level data are VERY busy at the moment (this does not hava anything to do with OpenGL, I know…)

  • I am concerned about the number of display lists and the amount of data that have to be in the graphics card memory… with 100 x 100 segments, there are 10.000 display lists, and with an additional LOD level, there obviously are 20.000, and the lower LOD level data is of course going to be smaller, but not to be really SMALL, and with too much display lists and corresponding data, apps rahter tend to be slower than faster, compared to rendering “directly” without display lists. vertex array make things even wors, at least this is my experience.

Jan

About that second point, it’s not as bad as you think. If you do the lod in a typical way where a lower lod level stores four times as much terrain as a higher one, you end up with 33% more display lists and memory consumption total. Just like mipmapping, you know. Well that doesn’t account for hole filling or smooth transistions, which can make it worse. Still, using LOD, especially for terrains, is very likely to make your app a lot faster. Whether it’s the time to add it in your project, that I obviously don’t know.

-Ilkka

BTW, dont use display lists for this… (use them rarely)

The terrain people give you 3D data, right?

It’s up to the ENGINE (or possibly the tool chain) to implement LOD. I don’t see how the terrain-data people would need to be involved at all.

A typical API to terrain rendering would look something like:

Start traversing from camera, in direction of view vector. Traverse in a “fixed” terrain cell grid.
For each terrain cell you encounter, check whether there is a pre-prepared display list with an appropriate LOD. If it is, draw it.
Else, call into the terrain engine and ask for data for this grid cell, and generate a display list at this point (possibly deleting another display list that hasn’t been used for a long time, and is high detail far away).

Note that your generation of display list from terrain data could auto-generate the appropriate LOD data through, for example, filtering, or feature-preserving mesh reduction.

Look at http://www.vterrain.org/ for more information.

well, a lot of stuff .

  • a lod level means 100 % more display lists (one for each segment). there is no on-the-fly-segmentation or something like that, as there is no on-the-fly-display-list-compiling. there are 100 x 100 static segments, that’s all. considering the time schedule, it is impossible to prepare the data in a way that allows on-the-fly-creating of lod’d display lists. I have GREAT g ideas about this but there is no time to implement them, and i’m quite sure that noone will pay, also .

  • why not/rarely use display lists? rendering without them is dramatically slower, much too slow for the app, and vertex arrays are even worse (in my experience, and NIVIDIA says the same ).

  • the people who give me the 3D data are working on the “tool chain”, guess that was a misunderstanding. Creating LOD levels on startup of the app is certainly much too slow to be accepted from the client, so it has to be pre-prepared. With LOD there will be 100 x 100 segments, each in two flavours (high and low LOD), and that’s it. And we will add this later .

  • is it possible to genereate grid cell display list data on the fly wihtout significant performance drawbacks? Can’t imagine (what maybe means that i should rather have posted this whole thread in the beginner’s forum).

Regards
Jan

We generate display lists for all our static geometry on the fly, as it gets paged into memory (and delete them when it gets purged out). Unfortunately, on later nVIDIA drivers, we’re having problems with hard-but-not-impossible-to-reproduce crashes inside glDeleteLists() so we have to turn it off for certain driver versions.

However, we treat terrain more like animated meshes, and run it through AGP memory, using NV_vertex_array_range (and soon ARB_vertex_buffer_object).

If you create one or a few display lists each frame, it shouldn’t impact your frame rate much at all. Because you can’t compile display lists to disk, you have to get a height field or a geometry mesh from disk at some point. That would be the ideal time to generate LOD, I think – if you compile a display list, the CPU has to touch the entire block anyway, so it’ll be in L2 cache. The extra time hit would be very small.

Anyway, if you get sufficient frame rate without LOD, then no need to work on it. I was just saying that, to get the frame rate up, if it’s NOT sufficient, LOD would be the obvious solution.

the frame rate is in fact sufficient, but not “great”. about 30-100 fps on my machine (athlon xp 2000+/gf4 4200). If creating LOD on startup will not cost a lot of time, it is surely the best way to do it then… but i can’t see that using vertex arrays will be relly fast for rendering large amounts of triangles? I tried it and it is MUCH slower than using display lists. I once had thousens of small “grass” quads rendered (the engine supports that as well, althoug it is not really used yet), and reworked it all to use a vertex array instead of immediate rendering, and it got really slower… same with display lists (so my guess, simply to many display lists in the graphics board memory).

Regards
Jan

In general, vertex arrays should be faster, although you’ll need to use glDrawRangeElements() (nVIDIA claims they re-pack the array and do the equivalent of DrawArrays if you use glDrawElements() without using VertexArrayRange()).

Also, on nVIDIA hardware, establishing and using a VertexArrayRange() makes things MUCH faster, in general.

I recommend you run a profiler such as VTune to figure out where the time is being spent. If you find you spend all your time in some copy-loop in the driver or OpenGL implementation, it’s likely that you’re not tickling the driver in the expected/correct way, and you should find out how the driver wants its data.

If you find that you’re spending all your time in your own program, then you know where to fix it.

If you find that you spend all your time in HAL somewhere, you’re probably fill or transfer/transform bound. Try turning on glCullFace(GL_FRONT_AND_BACK) to determine whether it’s fill or transfer/transform.

What do you mean with vertex array being faster, faster than immediate OpenGL callls oder faster than display lists?

I think I will try them, not for the ground but for other parts, trees and grass, where a large amount of quads have to be rendered… i tried using display lists there, but the app got rather slower (so my guess is that there were really too much display lists). As with using alpha test, they do not have to be depth-sorted, so I think that once sorting them into an array might really be faster. Strange that I did not have this idea yet .

Jan

I have had some strange results with DisplayLists vs Vertex Arryas while rendering a Quadtree-culled terrain heightmap.
Display Lists yield faster ~200fps when rendering a few tiles. But when rendering a large portion of the map they drop to ~40fps.
Vertex Arrays are slower with a few tiles ~130fps. But they render the full map at ~60fps.

hmm. so i guess i can be happy that the app has to render only a small part of the terrain for each frame .

How fine was the segmentation for your tree? each triangle a node? or a static segmentation that gets pushed into the tree structure with each segment becoming a node?

Each leaf contained a 9x9 vertex mesh which had 116 triangles. I haven’t implemented any LOD yet - and the triangles were sent as GL_TRIANGLES in a single glDrawElements call - no strips or fans.

interesting approach… how large ist your terrain at all, and did you find it hard to implement the quad tree? do you think it’s useful?