unique ROAM VBO issues and a clincher

Originally posted by michagl:

i worry though about the visual effects which might occur from not adhering to vsync… honestly i just mostly use it as a cheap built-in performance limiter, and have never really thought of what visual artifacts might occur from being out of sync with the monitor. care to explain, anyone?

The visual problem with running without vsync is that you end up showing parts of 2 frames (or more at particularly high FPS) during one monitor frame. You end up with the top part of the screen showing the old frame, and the new frame showing on the bottom (or stripes if you render more than 2 frames). It’s more noticeable when moving quickly in high-contrast scenes, or if you manage to get a consistent enough rendering rate that the transition is stable relative to the screen.

As far as your algorithm, I don’t see enough detail to really tell what you are doing either (or I might be just getting confused by your terminology, not sure…).

Questions I had from what I did get, though:
You said ‘equilateral sierpinski tessellation’, but the screenshot you posted looked more like right triangles than equilateral triangles. If you meant right triangles, what benefits do you get over quads (aside from the 8-bit indices)? I looked at a similar tessellation at one point, but the way I was doing it, it looked to be just a messier version of my quadtree code.

When you generate triangle strips, is this for an entire chunk of 64 tris? Or is it some sort of partially tessellated subset of those 64? Or something else entirely?

Do you support features like caves and arches?

When you talk about the 2 tiers in your mesh, is tier 2 all the geometry that is actually rendered, or is there geometry in tier 1 also?

Originally posted by knackered:
You’re actually saying nothing about your own algorithm. You’re just pompously rambling on to yourself about essentially nothing, self editing your own text using some kind of third persona. Who are you really talking to?
i will pay no mind to this bit…


You’ve read Hoppe’s paper? Good. You found nothing interesting in it, because it concerned itself specifically with planar regular grids? That’s because it’s an optimisation technique for planar regular grids; it wasn’t intended to be another progressive mesh technique. It’s intended for extremely large planar datasets, and it’s the most optimal technique I’ve come across so far for those datasets.

i don’t doubt it… it just isn’t very robust. i assumed he was doing a small tract of land in washington, because i recalled his was a planar algorithm, and the idea of mapping the entire continental US to a plane is just ridiculous on the face of it to me.

Your technique is for general meshes then?
yes, ‘non-planar geometry’ as stated… i prefer ‘anisotropic manifolds’, but i was trying to keep the terminology simple. if you would read, and think a little bit, then you could save yourself the moaning and groaning. if i spare you this much, i hope you know something considerable about VBO internals and will be willing to trade me.

Yet you say you think it’s up to nvidia to provide you with geometry optimisation tools…
if they want to sell hardware, yes… i certainly would provide tools were i in their situation.

…which suggests you precompute your tessellations offline, which suggests it’s limited to relatively small datasets… either that or there’s a hell of a lot of cpu work going on at runtime.
yes, the tessellations are computed offline, as i’ve said plenty of times. and no, the datasets are infinite in scale… the tessellations are combined to form every possible configuration. there are around 200k 8x8 tessellations, which saturate the array of states assumable by what i think is conventionally called a ‘right triangle subdivision’… split a triangle at the middle of its base, to its apex, then the two formed triangles assume the split line as their bases…

EDIT: this is wrong… the resulting triangles assume the midpoint of the base as their apex.

repeat until there are 8 edges on either side of the triangle… there are around 200k possible states resulting from this process. a triangle strip is computed offline for each state, which can be easily retrieved at run-time, to provide an automatically ‘theoretically’ perfect ‘piece-wise’ stripping.
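for concreteness, a minimal sketch of the combinatorics (not my actual code, and unconstrained counting only… the ~200k figure reflects the extra legality rules on which split patterns are allowed):

    #include <cstdint>
    #include <cstdio>

    // Count states reachable by recursive right-triangle bisection: each
    // triangle is either left alone or split into two children, which may
    // themselves split, up to a maximum depth.  An 8x8 patch (64 leaf
    // triangles) is depth 6.  Unconstrained, this grows as
    // T(d) = 1 + T(d-1)^2, which is exactly why the jump to 16x16 is hopeless.
    std::uint64_t countStates(int depth) {
        if (depth == 0) return 1;                        // a lone leaf has one state
        std::uint64_t child = countStates(depth - 1);
        return 1 + child * child;                        // unsplit, or children vary independently
    }

    int main() {
        for (int d = 0; d <= 6; ++d)
            std::printf("depth %d: %llu states\n", d, (unsigned long long)countStates(d));
        return 0;
    }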

I have actually read what you’ve said now, and I’m none the wiser, all I see is gerbils playing chess inside “equilateral sierpinski’s”.
then i’m probably wasting my time. i’m reminded of the parable that says something ironic about humans having two ears but only one mouth…

For me, try to explain your algorithm in no more than, say, 10 sentences using words of no more than, say, two syllables (avoiding all gerbil analogies, no matter how tempting).
Or, if you prefer, dismiss me with a patronising wave of your keyboard.

you typed a lot, so i will take you at your word. as for myself, i specialize in irreducibly complex algorithms… there is a whole world of algorithms which cannot be explained in “10” steps or less from the top down… and they tend not to make it into academic papers, if only because academic professors were not trained to think beyond 10 steps.

but anyhow, you are extremely hostile and insensible for some reason, but i will patronize you… just keep in mind that i owe you nothing, and i have nothing to prove. it’s not every day that i get to work with a system that i can be so public about, so presenting my work is admittedly probably not a strong suit.

glossing over all caveats and technicalities (a rough code skeleton follows the list):

0) preprocess template multi-resolution connectivity, and compute a triangle strip for every possible permutation of a subdivided triangle to a given depth.

1) at run-time, dynamically and recursively subdivide projective or parameterized geometry from the perspective of the view frustum.

2) each leaf node generated by step 1 is replaced with an instance of the template mesh generated in step 0.

3) per-vertex data is computed from streamed image data according to the subdivided texture coordinates from step 1, and uploaded into video memory.

4) according to schedule and frustum activity, further subdivide each second-level mesh with respect to per-vertex weights and the frustum.

5) solve the 128-bit state for each leaf node’s mesh and use it to look up the preprocessed triangle stripping for that state.

6) render visible leaf node meshes according to per-vertex data and assigned stripping.
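and here is the promised skeleton, purely illustrative: every name is invented, and all the real work is stubbed out.

    #include <cstdint>
    #include <vector>

    struct TemplateMesh { /* fixed 8x8 connectivity from step 0, shared by all leaves */ };
    struct Strip { std::vector<std::uint8_t> indices; };   // 8-bit indices suffice for 8x8
    struct LeafNode {
        const TemplateMesh* tmpl = nullptr;   // step 2: instance of the template mesh
        std::uint64_t state[2] = {0, 0};      // step 5: the 128-bit subdivision state
    };

    void preprocessStrips() {}                              // step 0 (offline) -- stub
    std::vector<LeafNode> subdivideTier1() { return {}; }   // steps 1-2 -- stub
    void uploadVertexData(LeafNode&) {}                     // step 3: image data -> VBO -- stub
    void subdivideTier2(LeafNode&) {}                       // step 4: weights + frustum -- stub
    static Strip gEmptyStrip;
    const Strip& lookupStrip(const LeafNode&) { return gEmptyStrip; }  // step 5 -- stub
    void draw(const LeafNode&, const Strip&) {}             // step 6 -- stub

    void frame() {
        static std::vector<LeafNode> leaves = subdivideTier1();
        for (LeafNode& n : leaves) {
            subdivideTier2(n);
            uploadVertexData(n);
            draw(n, lookupStrip(n));
        }
    }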

Originally posted by 3B:
The visual problem with running without vsync is that you end up showing parts of 2 frames (or more at particularly high FPS) during one monitor frame. You end up with the top part of the screen showing the old frame, and the new frame showing on the bottom (or stripes if you render more than 2 frames). It’s more noticeable when moving quickly in high-contrast scenes, or if you manage to get a consistent enough rendering rate that the transition is stable relative to the screen.
i was afraid of that, though i’ve actually never noticed any visual artifacts with vsync disabled.

do lcd monitors and such have the same characteristics for full screen apps?


As far as your algorithm, I don’t see enough detail to really tell what you are doing either (or I might be just getting confused by your terminology, not sure…).

that is understandable… i would’ve liked to have had more space and time to explain matters… i think it is all there, at least superficially, but it might take a little bit of extra thought to piece together.


Questions I had from what I did get, though:
You said ‘equilateral sierpinski tessellation’, but the screenshot you posted looked more like right triangles than equilateral triangles.

there are both actually, if that wasn’t clear… the system is two-tier. the first subdivision layer is equilateral sierpinski, which maps well to non-planar geometry, but has a side effect of creating very noticeable furrows, especially once the geometry reaches a point where it can be considered planar. the right triangle tessellation takes the furrows out.

however, there are deeper reasons as well why sierpinski is a better fit for the top level, and right triangle better for the second level… but those explanations could become quite complex and wordy.

but just to clear things up, your eyes do not deceive you, they are both there. the sierpinski tessellation is dynamic, whereas the right triangle tessellation is static. the sierpinski is allowed to merge and recurse down to an infinite depth, but the right triangle has a fixed depth, and is not managed dynamically… which is to say, when you subdivide the second level, you simply reset it before subdividing, and merging is not desirable.

If you meant right triangles, what benefits do you get over quads (aside from the 8-bit indices)? I looked at a similar tessellation at one point, but the way I was doing it, it looked to be just a messier version of my quadtree code.

a lot of the algorithm is very commonplace, it is just all brought together in an extremely lucrative way. it takes a big-picture perspective to take it all in.

first off, as for 8-bit indices, that is just a side effect of the fixed resolution of the second-tier system. the second tier is not limited to 8x8 because of the 8-bit indices, though. it is limited because a 4x4 mesh has 68 possible states, i believe… an 8x8 mesh has around 200,000… so you begin to get the picture: the next jump is 16x16, and the number of possible states in that might never be reasonable for even the grandest supercomputer.

it’s kind of like precomputing all of the possible trees of a chess match… merging them into a single tree, and then always following the shortest route to victory based on each successive move. of course for chess the size of that tree would be astronomical, but for an 8x8 right triangle tessellation it is doable.

from this approach, it is fairly trivial to determine an optimal piece-wise stripping of an ever-changing dynamic mesh, whereas in the past ROAM systems could never top strips of over 5 triangles, and had to manically piece them together in real time with buckets.

then it just happens that the 127-faced multi-resolution meshes can be encoded quickly into a 128-bit key, which can be used to quickly retrieve the stripping of each subregion. for a 16x16 resolution, the key would be 512 bits.
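the arithmetic works because a fully subdivided 8x8 patch is a binary tree of 127 faces, so one ‘is this face split?’ bit per face fits in 128 bits. a rough sketch of the idea (names and containers are illustrative, not my actual implementation):

    #include <bitset>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // One bit per face of the 127-face multi-resolution mesh, set if that
    // face is split.  Faces are indexed heap-style (children of face i are
    // 2i+1 and 2i+2), so the whole subdivision state packs into 128 bits.
    using StateKey = std::bitset<128>;

    struct Face { bool split = false; Face* child[2] = {nullptr, nullptr}; };

    void encode(const Face* f, std::size_t index, StateKey& key) {
        if (!f || !f->split) return;
        key.set(index);
        encode(f->child[0], 2 * index + 1, key);
        encode(f->child[1], 2 * index + 2, key);
    }

    // Offline-built table: subdivision state -> precomputed strip indices.
    std::unordered_map<StateKey, std::vector<std::uint8_t>> gStripTable;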

so we are pretty much up to the level of static meshes where stripping is concerned. the only real remaining issue where static meshes might win out in terms of raw power (putting scalability aside) is the fact that a really tight solving of the mesh requires a fair number of distance calculations between the vertices and the frustum… but there are a lot of ways to go about tackling that issue, and the approaches are not even necessarily mutually exclusive.

finally, as for quads… the system is designed for arbitrary geometry, so a simple answer is that fitting a quad into geometry is not always necessarily going to be possible, or even optimal. it is planned in the future, once the sister system responsible for streaming in transformed triangle regions of maps is fully operational, to pair up the first-tier triangles whenever possible, because it is more lucrative to manage texture images in a rectangular fashion. cases where a pairing is not possible, rarer than not, would mean one half of the rectangular texture buffer going unused.


When you generate triangle strips, is this for an entire chunk of 64 tris? Or is it some sort of partially tessellated subset of those 64? Or something else entirely?

the maximum tessellation is 64 triangles currently, though in the future that figure will likely be inflated to 64x3, though this inflation will not change the number of possible tessellations, as the new triangles will be completely self-contained. the strips can be of any number of triangles between 1 and 64, depending on the subdivision state of that mesh, which is a function of matters such as topological turbulence (mountains) and the relationship to the view frustum, as well as possibly other features. the right triangle tessellation follows a general rule, which i’m assuming you are familiar with.


Do you support features like caves and arches?

well, right now the system is working with projective-type geometry… think raytracing… or ‘geometry images’, such as you can find in Hoppe’s material. fully parameterized geometry such as nurbs models would work well. but the specs of the system are designed with the goal of being able to take opengl-type polygonal geometry… so a future api might look a lot like opengl. the system which is responsible for streaming multi-resolution ‘texture’ data is designed with an opengl api in mind.


When you talk about the 2 tiers in your mesh, is tier 2 all the geometry that is actually rendered, or is there geometry in tier 1 also?

yes, that is correct, only tier 2 is rendered, technically speaking. if you look at the image i’ve referenced, the thick lines are the first tier… if you examine them, you are likely to notice a sierpinski pattern… tier two is the highlighted node, with edges highlighted with thinner lines.

it’s tricky to visualize right now perhaps, but i’m planning to essentially add a third tier. to see it, imagine placing vertices in the center of each triangle in the highlighted second-tier mesh. then each triangle is split at its vertices, forming three triangles with their apexes being the central vertex… these are traditionally called ‘voronoi’ polygons or something, or just fans really in opengl terminology. anyhow, that results in slightly acute triangles, which is really not a good thing visually, i imagine. so the final step, before preprocessing the strips offline, is to reverse the edges, which yields triangles with much better coverage properties. the end result is that each distance test yields three triangles rather than one… and the overall resolution goes up considerably without sacrificing the great subdivision qualities of the 8x8 mesh. all this comes at zero cost, save for rendering the extra triangles, which is really a gain though, because it means not needing to render so many meshes.
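geometrically, the 1-to-3 split looks something like this (a sketch with made-up types; the edge-reversal pass afterwards is a connectivity operation and is omitted):

    struct Vec3 { float x, y, z; };
    struct Tri  { Vec3 v[3]; };

    static Vec3 centroid(const Vec3& a, const Vec3& b, const Vec3& c) {
        return { (a.x + b.x + c.x) / 3.0f,
                 (a.y + b.y + c.y) / 3.0f,
                 (a.z + b.z + c.z) / 3.0f };
    }

    // Split one triangle (a,b,c) into the three 'fan' triangles around its
    // centroid, each with the new central vertex m as its apex.
    void fanSplit(const Vec3& a, const Vec3& b, const Vec3& c, Tri out[3]) {
        Vec3 m = centroid(a, b, c);
        out[0] = { { a, b, m } };
        out[1] = { { b, c, m } };
        out[2] = { { c, a, m } };
    }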

so i hope this cleared something up… another one of my goals perhaps by posting here is to build up a significant record to claim ‘prior art’ in case anyone tries to pull a patent out from underneath me.

sincerely,

michael

OK, I think I more or less understand now…

Can you change the topology of the mesh as you tessellate it in tier 1, or do things like holes need to be in the coarsest level? For example: could a plane with small holes be simplified to a plane with no holes?

Are you sure you need to worry about LOD within the tier 2 patches? From what I’ve seen, the current trend is towards doing LOD as coarsely as possible, since the GPU can handle polygons faster than the CPU can handle the LOD (or even the drawing itself).

With 64 tris per batch, you will have a good bit of CPU overhead just sending the draw commands to the card… On my system (2.6G P4, 6800GT), I seem to remember being able to send ~1M batches per second last time I tested, which would give a max of 64Mtri/sec. With larger batches, the GPU can easily* handle twice as many triangles.

*: The main limitation I run into seems to be triangle setup, with a large dependency on how many texcoords the vertex program outputs (ranging from 100M visible tris with no texcoords, to 20M with 4 values in all 8 texcoords). Back-facing and culled tris are significantly faster (limited by vertex program length), so for normal views with ~half of the scene backfacing, numbers are a good bit higher.
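Spelling out the batch arithmetic above (the figures are my measurements on that system, not universal constants):

    // Back-of-envelope batch math: at ~1M draw calls/sec CPU-side and 64
    // tris per batch, the pipeline is CPU-bound at 64M tris/sec.
    constexpr double batchesPerSec = 1.0e6;   // measured draw-call limit (2.6G P4 + 6800GT)
    constexpr double trisPerBatch  = 64.0;    // one tier-2 patch
    constexpr double cpuBoundRate  = batchesPerSec * trisPerBatch;   // = 64M tris/sec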

Oh, I get it now. No, batch size too small, batch count too high. I wasn’t imagining the CPU cycles; you were just not considering that the driver eats CPU cycles.
Back to the drawing board, michagl.

Originally posted by 3B:
OK, I think I more or less understand now…

Can you change the topology of the mesh as you tessellate it in tier 1, or do things like holes need to be in the coarsest level? For example: could a plane with small holes be simplified to a plane with no holes?
that is a complicated question at this stage of development… i’m sure there are many non-exclusive strategies which could rectify such a situation. like i’ve said, it is not a cure-all system, but the specs are always chosen to leave as much room for robustness in the future as possible. presently, as i’ve stated, the system is limited to projective-type geometry… presently the best model with a hole it could do is something like a torus. the basic problem is there needs to be some way to derive the curvature of the ever finer model.

it isn’t a ‘precomputed progressive’ mesh type system as i understand them (basically a mesh with a tree-type structure); it isn’t that. and it doesn’t commit to a single LOD across the board. so it is best used in cases where the scope of the geometry is large enough that only a portion of it might be desirable to detail. so if the entire geometry can fit inside the view frustum, it is probably best at that point to switch to a different LOD system.

if you are thinking in a video game mindset, try to imagine a vr world where the detail never wanes, no matter how close the camera comes to surface geometry.

the basic philosophy is you don’t want your triangles to be smaller than a pixel, and you want your transitions to be seamless.

Are you sure you need to worry about LOD within the tier 2 patches? From what I’ve seen, the current trend is towards doing LOD as coarsely as possible, since the GPU can handle polygons faster than the CPU can handle the LOD (or even the drawing itself).
yeah sure, if you don’t want to take surface topology into consideration, then i’m sure you could use a limited set of tileable ‘batches’ at any resolution you please. i will probably program that in as an option in the future.

With 64 tris per batch, you will have a good bit of CPU overhead just sending the draw commands to the card… On my system (2.6G P4, 6800GT), I seem to remember being able to send ~1M batches per second last time I tested, which would give a max of 64Mtri/sec. With larger batches, the GPU can easily* handle twice as many triangles.
well, for what it’s worth, the final batch size for the approach outlined here will probably be a max of 192, which would be three times as many triangles rather than twice as many. to be honest though, i really don’t program for hardware… hardware changes with trends and perceived needs and paradigms; i try to think in terms of as few computational steps as possible. and for what it’s worth, i imagine this triangle ‘fluffing’ process could probably be applied recursively, indefinitely, meaning that a single lod-tested vertex could drive as many triangles as is desirable… accommodating this would just mean large offline preprocessing times, and would probably only affect run-time positively, as long as the numbers are tailored to the operating hardware.


*: The main limitation I run into seems to be triangle setup, with a large dependency on how many texcoords the vertex program outputs (ranging from 100M visible tris with no texcoords, to 20M with 4 values in all 8 texcoords). Back-facing and culled tris are significantly faster (limited by vertex program length), so for normal views with ~half of the scene backfacing, numbers are a good bit higher.

well, there is a system i’m developing which i call MAPS, or the “map system”, or maybe “massive array processing system”. it basically offers an opengl interface for stream reading and writing to massive on-disk images. at the point that it is fairly functional, i will shift the texturing mode of the ‘genesis’ system… which is the system we’ve been discussing. currently vertices are given texture coordinates which reference a shared map (or maps), but once i complete the MAPS triangle drawing system, each node will get its own personal texture map, at which point, at least under many conformal cases, the texture and tangent coordinates would be implicit. another way to relieve triangle setup would simply be to embed the triangle vertices directly in a float-aligned map file. setting the vertices up from there would simply be a matter of streaming them off disk.

i’m not painting any particular approach as a panacea, this is just a starting point.

more than anything though, i would like to gather any inside insight i can get with respect to the VBO api and other concerns here, mostly optimizing graphics memory management and cpu/gpu parallelism.

it’s also worth saying that i don’t intend to send anything close to a million batches a second to the gpu. i also am not designing the system with the intention of maximizing the gpu across the board. it is meant to be a simulation environment, meaning that it shouldn’t require much more than 10% of the total frame time. that is to say that most objects within a visual scene probably would not require such considerations… unless of course the camera is intensely focused, to the exclusion of all else, upon a roam mesh, in which case some sort of ‘performance throttling’ might be desirable.

Originally posted by michagl:
Can you change the topology of the mesh as you tessellate it in tier 1, or do things like holes need to be in the coarsest level?

that is a complicated question at this stage of development…

Yeah, that’s why I was curious if you had a solution for it :slight_smile:

it isn’t a ‘precomputed progressive’ mesh type system as i understand them (basically a mesh with a tree-type structure); it isn’t that. and it doesn’t commit to a single LOD across the board.
I’ve seen at least one paper that described variable LOD progressive meshes; I seem to recall it being just sort of a hand-waving ‘and we could add this’ bit at the end though…


if you are thinking in a video game mindset, try to imagine a vr world where the detail never wanes, no matter how close the camera comes to surface geometry.

That was the situation I was thinking about… with things like tunnels and arches, that would need to exist in the mesh even if the entire feature was < 1 pixel.

yeah sure, if you don’t want to take surface topology into consideration, then i’m sure you could use a limited set of tileable ‘batches’ at any resolution you please. i will probably program that in as an option in the future.

well, for what it’s worth, the final batch size for the approach outlined here will probably be a max of 192, which would be three times as many triangles rather than twice as many. to be honest though, i really don’t program for hardware… hardware changes with trends and perceived needs and paradigms; i try to think in terms of as few computational steps as possible. and for what it’s worth, i imagine this triangle ‘fluffing’ process could probably be applied recursively, indefinitely, meaning that a single lod-tested vertex could drive as many triangles as is desirable…
The point is, with modern hardware, you don’t need to take local surface topology into consideration… basically, just take the x3 idea further, and draw say a 32x32 patch instead of a 3-triangle fan, at which point you have the equivalent of doing no LOD in tier 2, and you can stop tessellating tier 1 sooner :slight_smile:
It isn’t so much programming for specific hardware, as it is programming for the trend of GPU power increasing faster than CPU power or monitor resolution…

Not saying it’s definitely better than what you are doing, just curious whether you have tested to see if you really need the extra work to determine triangle-level LOD (and the extra GPU ram for the 200k strip index sets). Numbers I would be curious about: how much performance do you lose if you always draw all 64 tris in tier 2? If you stop tier 1 a level higher and draw 16x16 tris in tier 2? Or 2 levels higher in tier 1 and 32x32 tris?

The main limitation I run into seems to be triangle setup,

setting the vertices up from there would simply be a matter of streaming them off disk.

Sorry, wasn’t clear… I meant triangle setup as in the step between vertex program and fragment program on the GPU, not the streaming-from-disk part.

more than anything though, i would like to gather any inside insight i can get with respect to the VBO api and other concerns here, mostly optimizing graphics memory management and cpu/gpu parallelism.
The biggest answer here is, send bigger batches to the GPU :slight_smile:

Aside from that, if your LOD scheme can handle data not being loaded immediately, you might try streaming data in another thread: map the VBO in the main thread (or whichever does all the GL calls), pass the pointer to a loader thread, then signal the 1st thread when loading is done to unmap the buffer and start using the data.
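A minimal sketch of that pattern, assuming GL 1.5-era VBO entry points via GLEW; note the map/unmap calls stay on the GL thread, and only the copy into the mapped pointer happens on the loader thread:

    #include <GL/glew.h>
    #include <cstddef>
    #include <cstring>
    #include <thread>

    // Map on the GL thread, fill from a loader thread, unmap on the GL
    // thread.  Real code would signal completion instead of joining here;
    // the join is only to keep the sketch self-contained.
    void streamIntoVBO(GLuint vbo, const void* src, std::size_t bytes) {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        void* dst = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);    // GL thread
        std::thread loader([=] { std::memcpy(dst, src, bytes); }); // loader thread
        loader.join();                          // in real code: signal, don't block
        glUnmapBuffer(GL_ARRAY_BUFFER);                             // GL thread again
    }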

it’s also worth saying that i don’t intend to send anything close to a million batches a second to the gpu. i also am not designing the system with the intention of maximizing the gpu across the board. it is meant to be a simulation environment, meaning that it shouldn’t require much more than 10% of the total frame time.
Even more reason to dump extra calculations. If you are minimizing CPU load and aren’t pushing the GPU, it seems kind of silly to do extra CPU work to cut down on the GPU load :slight_smile:

ok, there is a lot in this last post to respond to, so i’m not going to attempt to untangle all of the quotes. i have two windows open, and i’m just going to dictate between the two, and not miss any points.

as for holes in meshes: the way i imagine it, this system is really designed to kick in once the camera gets close enough to a mesh that normal geometry, like holes, could already be built into the base mesh. it’s really a system for fully parameterized meshes, like a nurbs model for instance. the idea is to never have visible jagged edges in your geometry.

however, it can be used in a projective fashion to build very complex geometry out of a very simple base mesh: a model of earth out of a dipyramid, for instance. a simple plane with holes would probably be best modeled with a raytracing CSG-type system. that is, you would have your plane, and then maybe subtractive cylinders embedded in it.

you’re right that it is impossible to derive detail that just isn’t there… that is just a fact of reality. if you wanted to smooth a polygonal model, for instance, you would basically just be left with the task of parameterizing every triangle, by fitting curves to the mesh, essentially turning it into a parameterized patch (nurbs) type model.

smoothing is not everything though; the system also does surface displacement very well. modeling the shaft of a roman column, for instance, would be quite simple. you could model the bark of a tree by tessellating the base mesh and displacing it, but progressively smoothing the curvature of the tree would not be possible unless the tree was fully parameterized (nurbs), or smoothing displacements were actually built into a displacement map – which is really not unreasonable, as i believe Doom3 actually does something like this to produce low-poly-count normal-mapped models from high-poly-count models.

the point is, you just can’t get detail where it isn’t, unless you just want to make a best guess. the system also does a lot better with highly regular meshes… your triangles would need to be as close to equilateral as possible for ideal results.

at the end of the day, it is really just a better real-time tessellator… however you derive your coordinates is another matter altogether.

for instance, i’ve seen popular modeling systems like maya trying to tessellate nurbs models in real time, and the results are just laughable. i don’t know if there is dedicated hardware or not for this – i do know opengl has a nurbs-type api, however.

for what it’s worth though, i haven’t yet implemented nurbs-type support, though i believe i am planning on it; it is a fairly low priority, well after getting everything else solid.
what it does really well at the moment is building planetary worlds… it could build them on a sphere, inside a cylinder, in a torus, and just about anything like that. it could also be hacked pretty easily to do convex geometry where a base mesh is available, or do geometry images like in hoppe’s papers. you can also build up some interesting models from typical raytracing primitives… one interface is simply a projective function which takes a point and a time (t), and returns the proper projection of that point.

it’s also worth noting that the system handles scale very well. the first tier is done with double precision; the second tier gets by with single precision. that’s another major win, because pretty much all of the data points are in the 2nd tier. the second-tier geometry is all normalized and transformed into a local space for optimal precision and easy computations.

I’ve seen at least one paper that described variable LOD progressive meshes; I seem to recall it being just sort of a hand-waving ‘and we could add this’ bit at the end though…
i’m not sure about the nature of the attempt there, but this system does achieve a variable LOD mesh very well, very elegantly, and extremely efficiently, i would argue.


That was the situation I was thinking about… with things like tunnels and arches, that would need to exist in the mesh even if the entire feature was < 1 pixel.

yeah, there is no way to get around that fact… it is generally a good idea for continuity to have mipmaps as well, to avoid non-representative sampling. basically, the data has to be available somewhere. to aid in this though, the MAPS system i’m developing facilitates streaming very robustly, and can locally compress and even encrypt on-disk data… local compression would definitely be a good idea if the data is on optical disk.

in the future though, simulations will just have to share local databases. that is to say, at some point a horse simulation will be so realistic, and the database so large, that there would be no rational reason for every game to have its own local disk space for its own proprietary horse database, when in the end the result is all the same. the same can basically be said about detailed databases, like maybe the leaf characteristics of every documented tree. corporations will either have to get together on this, or people will build their own cooperative non-proprietary centralized databases, and entertainment corporations won’t be able to keep up… of course, complete corporate consolidation (megacorp) is probably the more realistic alternative for the commercial sector.


here is the thing about knocking out tessellation in tier 2: it sounds tempting, but if you do it, you will get star-shaped seams in the mesh. personally i prefer a nice smooth spherical tessellation. if you just saturate every tier 1 triangular region, then you are going to get zones where triangles of higher density stick out. i can provide this as an option, but it isn’t going to be the center of my thesis. the problem with this kind of approach is that you are always artificially inflating the detail, hoping to oversaturate the scars in the mesh. and as well, trust me, it is much more noticeable when a whole line of triangles all of a sudden change their resolution, rather than doing it gradually one at a time; the eye just picks it out much more easily. and finally, you have to decide from where you are going to gauge the lod test. do you do it at the center of the triangle? or the nearest vertex? and when you start breaking up massive blocks of triangles based on the nearest vertex, you tend to get greater inconsistencies along the borders… that is, your subdivision pattern macroscopically might not be a nice octagon, but maybe something more resembling a buzzsaw.

finally, as far as tessellating for topological turbulence: it is really easy enough to do once you are already testing against the frustum. you just scale the frustum test by the topological variance weight of the tested vertex. if there is no variance, then the lod test comes to zero, and no subdivision is done. also, adding this kind of noise to the tessellation tends to create a less regular tessellation pattern, especially in heavily topologically variable regions; the eye notices when, say, a shock wave of tessellation is emanating from it. and you really don’t want to have to do ‘vertex morphing’. a general rule that is good for avoiding vertex morphing is to never update the mesh unless the camera is in motion; it’s virtually impossible to notice slight swimming while the camera itself is swimming. you can get away with lower counts this way.
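the test itself is tiny; here is a sketch with made-up names and constants (the real weights are whatever you tune them to):

    #include <cmath>

    struct Vec3 { float x, y, z; };

    static float dist(const Vec3& a, const Vec3& b) {
        float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return std::sqrt(dx * dx + dy * dy + dz * dz);
    }

    // Distance term scaled by the precomputed topological-variance weight
    // of the vertex: flat regions have weight ~0, so they never split no
    // matter how close the camera gets.
    bool shouldSplit(const Vec3& vertex, const Vec3& eye, float varianceWeight,
                     float kLinear = 1.0f, float kQuadratic = 0.1f,
                     float threshold = 1.0f) {
        float d = dist(vertex, eye);
        return varianceWeight / (kLinear * d + kQuadratic * d * d) > threshold;
    }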

as far as pushing the ‘fluffing’ system: first of all, just in case it isn’t clear, the 3-triangle fans would not be rendered as fans, but integrated into the triangle strip for each given permutation. as well, this approach would basically scale as 3+9+27+81…, so that at level two each lod-tested face would yield 12 faces, at level 3 each face would yield 39 faces, and at level 4, 120 faces.

so for a fully tessellated tier 2 mesh, the triangle counts would be:

0:64 - 1:192 - 2:768 - 3:2496 - 4:7680

vertex counts would be like:

0:45 - 1:172 - 2:553 …

might have to do something special along the borders with vertices, but it should be doable.

basically though, if you go past level 1, then it means using 16-bit strip buffers. and i have a feeling level one would look good enough; i intend to implement it soon. deeper levels get more complex; i figure i won’t implement them for a long time, but probably will eventually. no matter what happens though, it’s all done offline.


one goal for this kind of system would be to introduce extremely ornate geometry into vr. the true underlying detail would not be revealed until the user investigates very closely, at which point all resources could be focused on the ornate geometry. it is especially well suited for geometry which cannot be fit entirely within the view frustum at close range.

The biggest answer here is, send bigger batches to the GPU :slight_smile:

Aside from that, if your LOD scheme can handle data not being loaded immediately, you might try streaming data in another thread : map the VBO in main thread (or whichever does all the GL calls), pass the pointer to a loader thread, then signal the 1st thread when loading is done to unmap the buffer and start using the data.
i’m extremely interested in the ‘mapped’ version of the VBO api. the last time i looked at the VBO specs, the mapped version seemed to me to be more for the convenience of the application programmer than for the drivers. did i take away the wrong impression? and can the mapped interface be used to bolster parallelism?

in the end though, you might just be surprised how little it takes to LOD-tessellate the static tier 2 meshes. unlike every ROAM system i’ve ever seen, the connectivity data is all completely precomputed. it is really little more than a single distance calculation, scaled by constant linear and quadratic attenuation factors, for each face actually split. after that is done, a repair algorithm quickly splits the necessary triangles to keep the mesh legal. then a border-mending operation corrects the seams between meshes. everything is extremely lightweight.
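the repair step is essentially the classic forced split from right-triangle schemes; roughly like this (illustrative only, not my exact rule):

    // A split is only legal if the neighbour across the base edge is at
    // the same level; otherwise that neighbour must be split first,
    // recursively, so no T-junctions survive.
    struct Tri {
        Tri* baseNeighbor = nullptr;          // neighbour sharing the base edge
        Tri* child[2] = { nullptr, nullptr };
        int  level = 0;
        bool isSplit() const { return child[0] != nullptr; }
    };

    void split(Tri* t);                       // the actual 1->2 bisection (not shown)

    void forcedSplit(Tri* t) {
        if (t->isSplit()) return;
        if (t->baseNeighbor && t->baseNeighbor->level < t->level)
            forcedSplit(t->baseNeighbor);     // bring the coarser neighbour up first
        split(t);                             // now splitting leaves no T-junction
    }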

an optimization with pci express might be to let the video card compute the scaled distance factors for every vertex and pass them back to system memory through its read lane. the only loss would be that the distances would be computed for every face, even if the deeper faces are never split. but for the gpu, calculating distance and a little scaling ought to be trivial, and it could even do 4 vertices at a time, and return their 4 factors in the out vector. the vertices would already be in video memory, and just the camera position would need to be uploaded as a ‘uniform’ variable. (i don’t work with shaders on a regular basis, so ‘uniform’ might be the wrong terminology.)

It’s not the weight of the lod calculation he’s referring to, it’s the presence of the lod calculation on such small batches. You’re feeding a hungry gpu at the same rate you’re calculating your lods, therefore you’re not working in parallel with the gpu. Think about it. That’s the whole point of having a co-processor.
I fail to see why you feel you needn’t concern yourself with hardware-specific details when the whole point of lod techniques is to work around hardware limitations. Otherwise why not throw the full-resolution mesh at the hardware and just complain that the vendors aren’t doing their job properly?
It’s these kinds of implementation-specific details which should dictate your approach to the high-level design. Something which you seem to keep side-stepping, while maintaining an aloof and arrogant stance. This is why I, at least, am responding to you in a somewhat aggressive manner.
Oh, and for god’s sake go to an internet cafe or something… you’re giving the impression you’re communicating via hand signals, strapped to an oil drum in Peru or something.

Originally posted by knackered:
It’s not the weight of the lod calculation he’s referring to, it’s the presence of the lod calculation on such small batches. You’re feeding a hungry gpu at the same rate you’re calculating your lods, therefore you’re not working in parallel with the gpu. Think about it. That’s the whole point of having a co-processor.
I fail to see why you feel you needn’t concern yourself with hardware-specific details when the whole point of lod techniques is to work around hardware limitations. Otherwise why not throw the full-resolution mesh at the hardware and just complain that the vendors aren’t doing their job properly?
It’s these kinds of implementation-specific details which should dictate your approach to the high-level design. Something which you seem to keep side-stepping, while maintaining an aloof and arrogant stance. This is why I, at least, am responding to you in a somewhat aggressive manner.
Oh, and for god’s sake go to an internet cafe or something… you’re giving the impression you’re communicating via hand signals, strapped to an oil drum in Peru or something.

first of all, i don’t give much credence to you because you are a ridiculously negative sort… as for your complaints about my communication, i don’t understand what you are trying to say… as for your more redeemable concerns, i will try to address them.

first of all, as for parallelism, i doubt this makes much of a difference, but the lod and uploading are done in their own pass. rendering is done in another pass, along with frustum culling. cpu operations are all very lightweight, and for what it’s worth, lod is managed in an extremely staggered fashion, meaning only a very small fraction of ‘nodes’ are updated with respect to lod each frame. it’s not as if the whole lot of them are being updated every frame, and even if they were, it is still very lightweight. all the cpu is really doing while the gpu is rendering is a few dot products against frustum planes per batch, that is, if the frustum is active… so in reality, if i really wanted to be focusing on parallelism, i should probably be looking into giving the cpu more to do.

i’m leaving the lod phase open for the gpu so i can integrate pci express reading capabilities there. right now only VBO uploads are done in that phase, and only if new nodes are created, which is generally fairly seldom.

i’m not trying to knock the socks off anyone, just looking for hardware advice. i know my hard constraints as far as algorithms are concerned well enough.

let me just make it clear that i understand your points about pushing a measly 64-triangle max stripped mesh through the gpu.

however, there really isn’t that much the cpu has to do while rendering, save for the driver’s use of the cpu, and some quick frustum tests which max out for visible nodes (which only occur if the camera is active – and optimizations could easily be made to only test nodes in the non-intersecting regions of the new and last frustum)… the mesh is recursively traversed, but this could be avoided with a queue filled on the first pass. other than that, nothing is going on.

plus, in my defence, it is only very lately that i’ve switched to the 8x8 model… i’ve been working with unstripped 16x16, which pushed the gpu fairly well. 32x32 was not possible, and the planned 8x8x3 ought to approach the 16x16 in terms of gpu load.

finally, you are also assuming that the shader model is very simple, if anything. a heavy shader model could easily fill that gap up real quick. i’m assuming you’ve seen the shaders going into games like ‘half-life 2’, i think it was called. i don’t think there would be much of a cpu/gpu gap left there, assuming that fill is not the limiting factor – i.e. triangles not too big in screen space. i intend to try to provide a wide range of options as best i can. i can’t, however, begin to even consider dismissing the technique, which i feel is one of the most promising i’ve seen in a long time. nothing to do with personal pride.

Ok, I understand. You obviously know what you’re doing.

oiii, is there no way to limit the number of posts per page in these forums?

Originally posted by knackered:
Ok, I understand. You obviously know what you’re doing.
i appreciate the apparent pause in your relentless thrashing, but i can’t help but wonder if i’m supposed to glean a hint of sarcasm from this?

nevertheless, taking you at face value, i appreciate your attention and effort to grasp my business.

i wish i could’ve made the situation more clear from page 1, but the mean attention span of bbs patrons is pretty narrow, and don’t forget that you even insisted that i be less specific.

still, i admit, i really don’t keep up with hardware developments down to the last drop. i’ve watched hardware come and go just like everyone else who has been around for more than a short spell… and for the projects i typically concern myself with, hardware is ever an afterthought which can be caught up with momentarily at any time, as needed.

so what i’m trying to say is, i’m still very interested in discussing hardware innovations, especially the mapped VBO api.

i’m also curious if anyone can say… what is the best approach to work out how many pixels per vertex for a given piece of hardware? that is, for a given number of shader cycles, there ought to be a ratio which would tell you your optimal projected triangle size, in order to synchronize the vertex and pixel shader units. i figure the pixel unit must work at a much higher turnover than the vertex unit… and i figure gpus probably support more than one of each… or at least will someday.

like i say, i’m definitely no hardware guru. i work with abstract systems… that is, i’m not trying to turn something out to the market every other quarter. that is why i’m here looking for related hardware advice.

sincerely,

michael

i finally understand the term ‘to waffle’

what is the best approach to work out how many pixels per vertex for a given piece of hardware?
throw theory and expected results out the window (if u care about actual results) and test by writing a (*)benchmarking test that iterates through the various conditions

(*)though of course a benchmark in itself ain’t a natural situation, so it mightn’t correspond to an actual app, but at least it will give a better idea than theory
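eg the bare-bones shape of such a test (draw call stubbed out; only the timing scaffold is shown):

    #include <chrono>
    #include <cstdio>

    void drawBatches(int batchSize, int batchCount);  // stub: the real glDrawElements
                                                      // calls plus a glFinish to flush

    void benchmark() {
        const int totalTris = 1 << 20;
        // iterate through the conditions u care about -- batch size here;
        // add texcoord counts, shader length, etc.
        for (int batchSize = 64; batchSize <= 4096; batchSize *= 2) {
            auto t0 = std::chrono::steady_clock::now();
            drawBatches(batchSize, totalTris / batchSize);
            auto t1 = std::chrono::steady_clock::now();
            double sec = std::chrono::duration<double>(t1 - t0).count();
            std::printf("batch %4d: %.1f Mtri/s\n", batchSize, totalTris / sec / 1e6);
        }
    }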

Originally posted by michagl:
i’m still very interested in discussing hardware innovations, especially the mapped VBO api.
There’s nothing to discuss, it does exactly what the spec says it does. It’s really simple; there are no real caveats anywhere… otherwise I’d tell you, and I’ve spent a fair length of time using it.
If the main reason you’re filling up these pages with your monologues is to get some physical record that you had the idea first, then you’re going the wrong way about accomplishing it. The idea (or rather, the specifics) are still unclear to my mtv-addled mind, certainly not clear enough to defend a patent.
Can’t you write it down, then physically post it to yourself? Or deposit it in a bank?
That’s what I did with my theory about George Bush and the pope’s ‘illness’.

hostile…

Originally posted by zed:
i finally understand the term ‘to waffle’

what is the best approach to work out how many pixels per vertex for a given piece of hardware?
throw theory and expected results out the window (if u care about actual results) and test by writing a (*)benchmarking test that iterates through the various conditions

(*)though of course a benchmark in itself ain’t a natural situation, so it mightn’t correspond to an actual app, but at least it will give a better idea than theory

waffling is like when you are talking to hitler, doing your best not to show expression or be found out that you are really not a stone-cold soulless sob… then you lose your composure. congratulations, you waffled… you’re part of the real human race.

as for a vertex/pixel shader ratio: there ought to be a way to calculate what size of triangle in screen space is optimal for best synchronizing the vertex and pixel units. basically, assuming your shaders are linear, how many pixels can the pixel unit process in the time it takes the vertex unit to process 3 vertices, or fewer given stripping? yeah, i get the point that a benchmark is a good idea… but there ought to be a way to get an average number, assuming some golden ratio of cache hits.
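in sketch form, the number i’m after would be something like this (every figure a placeholder to be measured, per zed’s advice):

    // If the pixel unit retires P pixels/sec while the vertex unit retires
    // V verts/sec, and good stripping amortises a triangle to about one
    // vertex, the balanced triangle covers roughly P/V pixels on screen.
    double balancedTriangleArea(double pixelsPerSec, double vertsPerSec,
                                double vertsPerTriangle /* ~1 with good strips */) {
        return (pixelsPerSec / vertsPerSec) * vertsPerTriangle;  // pixels per triangle
    }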

arb_vertex_buffer_object specs:

The latter technique is known as "mapping" a buffer.  When an
application maps a buffer, it is given a pointer to the memory.  When
the application finishes reading from or writing to the memory, it is
required to "unmap" the buffer before it is once again permitted to
use that buffer as a GL data source or sink.  Mapping often allows
applications to eliminate an extra data copy otherwise required to
access the buffer, thereby enhancing performance.  In addition,
requiring that applications unmap the buffer to use it as a data
source or sink ensures that certain classes of latent synchronization
bugs cannot occur.

i pored over the entire specifications a year or more ago… and i still don’t understand 100% what exactly is going on with the mapped api.

i’m assuming you map agp memory directly, so you can put your calculations directly into the buffer, which is only useful if you don’t want a copy of the memory for yourself. is the pointer available after you unmap the buffer? how would an app perform if you mapped and unmapped the buffer regularly, just to use it like system memory? does the data put in the mapped buffer ever go to video memory?

how come there can’t be an interface to give the driver a pointer to your system memory, and tell it to copy the buffer as it sees fit… maybe with a dma system bypassing the cpu entirely? doesn’t the proposed pci express architecture allow for an expanded dma system?

that’s all i have to say. i’m losing my tolerance for this crowd.

would be happy to hear from ‘3B’ again though before i trot off.

I’d hardly say you drew a crowd.
Good luck with your revolutionary new approach to lod. Be sure to credit me in your paper.

Originally posted by knackered:
I’d hardly say you drew a crowd.
Good luck with your revolutionary new approach to lod. Be sure to credit me in your paper.

credit for what? and what papers? nothing personal, i’m just not crazy about this sort of atmosphere… the ‘crowd’ allows its members to be nasty to guests. it’s all extremely counterproductive. it’s a shame people can’t communicate functionally, especially with the lack of easy access to technical docs and ever-changing hardware. i will never get over how juvenile the graphics programming scene is.

and as for my previous post, please don’t bother responding with an ‘implementation dependent’ rag.

credit for what? and what papers? nothing personal, i’m just not crazy about this sort of atmosphere… the ‘crowd’ allows its members to be nasty to guests.
Yeah, I have no idea why Knackered hasn’t been banned by the moderators. I guess it’s because he’s been here for a while. The best thing you can do is just ignore him. He likes getting a rise out of people, so it’s best not to indulge him. Just pretend that he didn’t post at all, and you should be fine.