Originally posted by GPSnoopy:
The remapping of the indices is for reorganizing the vertex arrays (and not the indices themselves). The goal is to improve the memory accesses by improving the cache coherency (but do not mistake this for the post-T&L cache, which is something completely different).
For example, a vertex array that is accessed sequentially is faster than one that is indexed randomly.
1, 2, 3, 2, 3, 4, 3, 4, 5, etc. will be faster than 1, 456456, 2547, 84125, etc.
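To make the remapping concrete, here is a minimal sketch of reordering a vertex array into first-use order and rewriting the index list to match. The function name `reorder_vertices` and the toy data are assumptions for illustration; this is not code from Tri Stripper or NvTriStrip.

```python
def reorder_vertices(vertices, indices):
    """Reorder `vertices` into first-use order and rewrite `indices` to match.

    Hypothetical helper: vertices that are never referenced are dropped.
    """
    remap = {}          # old index -> new index
    new_vertices = []
    new_indices = []
    for old in indices:
        if old not in remap:
            # First time this vertex is referenced: give it the next slot.
            remap[old] = len(new_vertices)
            new_vertices.append(vertices[old])
        new_indices.append(remap[old])
    return new_vertices, new_indices


# Toy data: the index list jumps around the original array.
vertices = ["a", "b", "c", "d", "e"]
indices = [4, 0, 2, 0, 2, 3]
new_vertices, new_indices = reorder_vertices(vertices, indices)
print(new_vertices)  # ['e', 'a', 'c', 'd']
print(new_indices)   # [0, 1, 2, 1, 2, 3]
```

After the pass, the index list walks the vertex array roughly front to back, which is the sequential access pattern described above.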
i definitely would never have guessed that. it seems odd to me that the hardware would be organized that way. can you or anyone make an architecture argument for this?
The triangle stripifier with the red lettered logo is Stripe, not Tri Stripper.
hmmm, maybe i overlooked yours then. i promise to give it a look as soon as i pass this post on. i think i looked at two, one ‘atc’ or something, and probably stripe.
The main advantage that NvTriStrip has is that it outputs better results, but the difference compared to Tri Stripper is minimal. I have had people tell me Tri Stripper was giving better results for them, but I could not benchmark this myself.
i’m actually working on an interesting project right now that might make for a good benchmark. i’m stripping about 200,000 small very similar meshes. you can find a recent visual at:
http://arcadia.angeltowns.com//share//genesis-mosaics-lores.jpg
assuming that url is correct. to me, the resulting strips don’t look to be terribly cache friendly, especially given the simplicity of the meshes. looks like there is a lot of room for improvement to be had. those strips were done with nvidia’s utility library btw.
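One rough way to put a number on "cache friendly" is to replay the index stream through a simulated vertex cache and count the misses. The sketch below assumes a simple FIFO cache of 16 entries; both the replacement policy and the size are assumptions, and real hardware varies.

```python
from collections import deque


def count_cache_misses(indices, cache_size=16):
    """Count misses for an index stream against a simulated FIFO vertex cache.

    `cache_size` is an assumed value, not a figure for any particular GPU.
    """
    cache = deque(maxlen=cache_size)  # oldest entry falls out when full
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
        # On a hit a FIFO cache does nothing: entries are not refreshed.
    return misses


stream = [0, 1, 2, 1, 2, 3, 2, 3, 4]   # three triangles from one strip
misses_large = count_cache_misses(stream, cache_size=16)
misses_tiny = count_cache_misses(stream, cache_size=1)
print(misses_large, misses_tiny)  # 5 9
```

Dividing the miss count by the number of triangles gives a figure that can be compared between the output of two strippers on the same mesh.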
NvTriStrip is not slow because of the lack of connectivity information (Tri Stripper also has to compute it), but IIRC it has some algorithms that have a complexity of O(n^2).
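For reference, connectivity can be computed in roughly linear time by hashing each edge to the triangles that use it. This is a generic sketch under that assumption, with a hypothetical `build_adjacency` helper; it is not how either NvTriStrip or Tri Stripper is actually implemented.

```python
from collections import defaultdict


def build_adjacency(triangles):
    """Return, for each triangle, the set of triangles sharing an edge with it."""
    edge_to_tris = defaultdict(list)
    for t, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            # Sort the pair so (u, v) and (v, u) map to the same edge.
            edge_to_tris[(min(u, v), max(u, v))].append(t)

    neighbours = [set() for _ in triangles]
    for tris in edge_to_tris.values():
        for t in tris:
            neighbours[t].update(o for o in tris if o != t)
    return neighbours


triangles = [(0, 1, 2), (2, 1, 3), (2, 3, 4)]
neighbours = build_adjacency(triangles)
print(neighbours)  # [{1}, {0, 2}, {1}]
```

Each triangle contributes three dictionary insertions, so for ordinary meshes this step should stay cheap next to any O(n^2) search.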
yeah, i imagined so much… just wishful thinking on my part that connectivity might be an issue. just out of curiosity, what percentage is connectivity versus cache constraints/strip fitting?
i regularly strip fairly large meshes for general projects, but this project in the screen above is basically composed of a whole lot of relatively small meshes… so in that case the quadratic performance is not a major concern. i have had to walk away from my machine in the past for about 10 minutes while nvidia’s stripper spits out a fair sized mesh… but stripping these 200k little meshes generally runs about a day. and i suspect in the future the size of the meshes will probably go up significantly for different optional resolution databases. i’m a little bit frightened about how that might scale… 2 or 3 days, maybe more… so basically i am looking for faster alternatives.
especially if and when i provide a conversion tool for end users to use on their own machine. for the users, their options would be either to download the database for their hardware and needs, or to generate it locally. the download for a database that takes about a day to generate is around 20MB. but downloading also adds traffic to the download server. and for larger databases and users with lower end modems, it might be in those users’ interest to generate the data locally. so ideally, to reach a happy medium, i would just like to be able to get the database generation tool as streamlined as possible, and just host common, smaller databases… because the range of options could be too high to satisfy everyone’s hardware and application needs.
I suggest you look at the Tri Stripper webpage for more info on the algorithms that are used; again it’s a bit outdated and my English is not so bright but it’ll give you a good overview.
i will give it a look. just out of curiosity… are you fully comfortable with the nvidia algorithm? or are the poorer quality strips just a natural result of the obvious performance boost? in the end i could just compile yours and nvidias code (and maybe others) into the utility, and it would just be up to the user to choose between time and performance when choosing options for compiling their database.
if you are interested in what i’m doing with all of this stripping… there is another very informative thread in this forum (advanced) titled something like, “unique ROAM VBO issues and a clincher”, if my memory serves me.
sincerely,
michael
PS: in defense of nvidia’s stripper. my meshes are really not very strip friendly on the face… much more fan like in organization. so that could explain to a fair degree why the strips might not look too cache friendly. what would really be nice i figure would be a stripper that can make the decision, against some user defined visual bias, to flip the inside edges of quads here and there to promote a better stripping. how much trouble do you think it would be to integrate that kind of feature into your system?
PPS: counter to nvidia’s defense, they could’ve stuck a degenerate triangle here and there to get a much better cache layout it seems. there appear to be some degenerates in the highlighted mesh in the screen referenced above, even with primitive_restart enabled… meaning nvidia’s stripper does at least have some capacity to negotiate between the appropriateness of a degenerate versus a restart.
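For anyone weighing a degenerate against a restart, here is a minimal sketch of stitching separate strips into one index list with degenerate triangles. The `join_strips` helper is hypothetical and follows the usual repeat-last, repeat-first trick; it is not taken from nvidia's library.

```python
def join_strips(strips):
    """Concatenate triangle strips, linking them with degenerate triangles."""
    out = []
    for strip in strips:
        if out:
            out.append(out[-1])    # repeat last index of the previous strip
            out.append(strip[0])   # repeat first index of the next strip
            if len(out) % 2 == 1:
                # Pad so the next strip starts on an even offset,
                # which keeps its winding order unchanged.
                out.append(strip[0])
        out.extend(strip)
    return out


def count_degenerates(strip):
    """Count triangles in a strip that reuse an index (zero area)."""
    return sum(
        1
        for i in range(len(strip) - 2)
        if len({strip[i], strip[i + 1], strip[i + 2]}) < 3
    )


joined_even = join_strips([[0, 1, 2, 3], [4, 5, 6]])
joined_odd = join_strips([[0, 1, 2], [3, 4, 5]])
print(joined_even, count_degenerates(joined_even))
# [0, 1, 2, 3, 3, 4, 4, 5, 6] 4
print(joined_odd, count_degenerates(joined_odd))
# [0, 1, 2, 2, 3, 3, 3, 4, 5] 5
```

Each join costs two or three extra indices, where a restart costs a single restart index, so which one wins depends on how the extra indices interact with the cache.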