Programmability… the double-edged sword!

And the first step we want is 30×640×480 rays per second (640×480 at 30 fps)… I think this is possible on the P10.

Um, the P10 is not a CPU. The P10 may be very programmable, but it is still a polygon scan converter, with scan conversion hardware and pixel pipelines and so forth. You can’t make it do ray tracing.

Originally posted by Korval:
Um, the P10 is not a CPU. The P10 may be very programmable, but it is still a polygon scan converter, with scan conversion hardware and pixel pipelines and so forth. You can’t make it do ray tracing.

http://tyrannen.starcraft3d.net/loprecisionraytracingonatiradeon8500.jpg

A Radeon 8500 doing ray tracing.

What can't I do?

Davepermen,
where are the cast shadows in that pic?
Also, how would ray-traced shadows do soft shadows? Moving the light around and averaging?

Okay, read the SIGGRAPH paper about it. Why is it so crap? Mostly because the whole vector math happens with 8 bits per component, and they were happy just to get a ray-traced first hit working. But they describe the algorithm for all the additional modes. And as you probably know, there is no difference between adding a shadow ray and a first-hit ray; it's just an additional ray (say, an additional pass on the ATI, I guess…). I think you can see that it is VERY low precision.

The SIGGRAPH 2002 documents have it all…
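To make the "just an additional ray" point concrete, here is a minimal CPU sketch in C++ (standard Möller-Trumbore; my own code, not lifted from the paper): the exact same intersection routine serves the first-hit ray and the shadow ray; only the ray's origin and direction differ.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
static float dot(Vec3 a, Vec3 b)  { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Moeller-Trumbore ray/triangle test: returns true and the distance t
// along the ray if (orig + t*dir) hits triangle (v0, v1, v2).
bool intersect(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float& t)
{
    const float eps = 1e-6f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p  = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return false;   // ray parallel to triangle
    float inv = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * inv;
    return t > eps;                           // hit in front of the origin
}
```

A shadow test is then just a second call with `orig` at the hit point and `dir` toward the light; on the ATI scheme that second call becomes the additional pass.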

The cache problems with rasterizing textures are handled well because with MIP mapping and screen-region-local rendering you get a lot of reuse and even sequential access. You are saying that because one class of problems has been solved, another is equally tractable!

Again you don't draw any distinction between the various problems you want to solve. As for using better-than-8-bit ALUs, that buys you some raw compute power but doesn't resolve the other issues.

This is all slightly amusing because I suspect that you’d hate a real ray tracing system because of the development paradigm it would impose on you. You’ve ignored this and many other points I made.

It seems that for very complex databases a ray tracing approach makes most sense for the first hit, provided your memory system can store the entire scene; you do indeed get some cache coherency there, but the paradigm is an inherently retained-mode system of rendering, and those two things are a bad combination. Anything resembling a state change has to happen at the fragment level, depending on results. This isn't of course why you extol the virtues of ray tracing; all that other hard stuff you mention is a different class of problem altogether, with completely different issues. This is why it is important to draw some distinction between these problems.

It would also help if you were prepared to think about how you'd want to program a system like this, because it wouldn't be about implementing ray/plane intersections and bounds tests; most of the efficiency would come from minimizing the tests through some traversal of an abstract data representation. Would this be imposed by the hardware, or would you write it?

A central feature of a programmable system like this would be the ability to write a shader which requested an incident color (radiosity) at the fragment level from an arbitrary vector traced into the database. So you have a rasterizer requesting (and possibly blocking on) ray/database intersections. Would this be cached in a really deep framebuffer and post-processed with alpha in a subsequent fragment pass, or done there and then, blocking on the results? One doesn't seem too programmable to me, or even as recursive as it needs to be; the other seems to have nasty implications for stalling the fragment processing. Discuss.

Remember that at the fragment level you're going to need your big database traversal, with abstract structure, to optimize and minimize the number of ray/primitive tests.
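To be concrete about what I mean by that traversal, here is a rough CPU sketch (a hypothetical BVH; whether hardware or the programmer supplies this structure is exactly my question): the ray only descends into boxes it actually enters, so most primitive tests are never performed at all.

```cpp
#include <vector>
#include <algorithm>
#include <utility>

struct Vec3 { float x, y, z; };

struct Node {
    Vec3 bmin, bmax;            // axis-aligned bounding box
    int  left = -1, right = -1; // child indices, -1 marks a leaf
    std::vector<int> tris;      // triangle indices (leaves only)
};

// One axis of the slab test; narrows the [t0, t1] interval.
static bool axis(float o, float id, float mn, float mx, float& t0, float& t1)
{
    float tn = (mn - o) * id, tf = (mx - o) * id;
    if (tn > tf) std::swap(tn, tf);
    t0 = std::max(t0, tn); t1 = std::min(t1, tf);
    return t0 <= t1;
}

// Does the ray (origin o, inverse direction id) hit the box in (0, tMax)?
bool hitBox(Vec3 o, Vec3 id, Vec3 mn, Vec3 mx, float tMax)
{
    float t0 = 0.0f, t1 = tMax;
    return axis(o.x, id.x, mn.x, mx.x, t0, t1)
        && axis(o.y, id.y, mn.y, mx.y, t0, t1)
        && axis(o.z, id.z, mn.z, mx.z, t0, t1);
}

// Only subtrees whose boxes the ray enters are visited; everything
// else is culled, which is where the efficiency lives.
void traverse(const std::vector<Node>& nodes, int idx,
              Vec3 o, Vec3 invDir, float tMax, std::vector<int>& candidates)
{
    const Node& n = nodes[idx];
    if (!hitBox(o, invDir, n.bmin, n.bmax, tMax)) return;
    if (n.left < 0) {           // leaf: emit the few remaining primitive tests
        candidates.insert(candidates.end(), n.tris.begin(), n.tris.end());
        return;
    }
    traverse(nodes, n.left,  o, invDir, tMax, candidates);
    traverse(nodes, n.right, o, invDir, tMax, candidates);
}
```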


P.S. No Dave, it's a very simple teapot; there's no scene or shading complexity, and the work to perform a trick like this is just going through the motions. I might use some arithmetic overloading of various shading operations to perform a ray/triangle intersect (and edge test) in hardware, but if the rays don't selectively traverse the database there's no point. A key issue is how this was implemented; showing every triangle to every fragment doesn't win you anything, so the devil is very much in the details here. To show a picture and say "see, it can be done" shows you either don't understand this or are deliberately throwing in a non sequitur.

I can see we're going to have a spate of "hardware ray tracing" on new hardware as it arrives. I can hardly wait, sigh.


Originally posted by dorbie:
The cache problems with rasterizing textures are handled well because with MIP mapping and screen-region-local rendering you get a lot of reuse and even sequential access. You are saying that because one class of problems has been solved, another is equally tractable!

Dependent texture reads are BY NO MEANS a lot of reuse and prefetch. Learn why GPUs can work that fast anyway (because while waiting on one fragment they can continue with the next, probably? because of parallelism of tasks? uh…)


Again you don't draw any distinction between the various problems you want to solve. As for using better-than-8-bit ALUs, that buys you some raw compute power but doesn't resolve the other issues.

Well… having floating point for pixel shading will solve a lot of problems for rasterizers as well. For ray tracing it means the separation between pixel and vertex shaders drops away, so they can be used interchangeably, with no rasterizer really needed in between… That does not solve everything, but it saves a lot of tweaking.

What it solves is easy to see: since you have to store all the geometry at 8 bits per component, you currently only have a 256×256×256 grid of positions to trace against… not that wonderful…
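To see what 8 bits per component does to geometry, here is a tiny illustration (my own, not from the paper): quantizing a coordinate to 8 bits snaps it to one of only 256 positions per axis, a 256×256×256 lattice.

```cpp
#include <cstdio>
#include <cmath>

// Quantize a coordinate in [lo, hi] to 8 bits and back, as an
// 8-bit-per-component geometry texture would. Only 256 distinct
// values per axis survive the round trip.
float roundTrip(float x, float lo, float hi)
{
    float t = (x - lo) / (hi - lo);                   // normalize to [0,1]
    unsigned char q = (unsigned char)std::lround(t * 255.0f);
    return lo + (q / 255.0f) * (hi - lo);             // decode
}

int main()
{
    // In a 100-unit scene, neighbouring representable positions are
    // 100/255 = 0.39 units apart -- hopeless for precise intersections.
    printf("%f\n", roundTrip(12.3456f, 0.0f, 100.0f)); // prints 12.156863
    return 0;
}
```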

This is all slightly amusing because I suspect that you’d hate a real ray tracing system because of the development paradigm it would impose on you. You’ve ignored this and many other points I made.
Hm? It's very funny to see that you don't really know how to get ray tracing working on GPUs, and that's all I'm talking about. On normal hardware (CPUs) it's an easy task to get it working, and even though mine are not very fast (no assembler, no SSE, no 3DNow!, no MMX or anything, which results in quite low speed on a P3 500), they work quite well.

On the CPU you get a lot of problems you don't have on a GPU, because of stalling and memory accesses, etc. These are much less of a problem on GPUs, because they are designed to manage them themselves. That's not a feature of a rasterizer; that's a feature of the hardware…

It seems that for very complex databases a ray tracing approach makes most sense for the first hit, provided your memory system can store the entire scene; you do indeed get some cache coherency there, but the paradigm is an inherently retained-mode system of rendering, and those two things are a bad combination. Anything resembling a state change has to happen at the fragment level, depending on results. This isn't of course why you extol the virtues of ray tracing; all that other hard stuff you mention is a different class of problem altogether, with completely different issues. This is why it is important to draw some distinction between these problems.

Well… on the state changing: I can manage to have 256, 512, or 1024 (using these numbers to give you a hint: HINT) textures per pixel on GPUs from the GF3 on, just choosing which one I need. State changing? Well… the objects may use different material parameters, but the material structure will be the same (at least at the start), so it's sort of a fixed-function pipeline. I don't need more; I don't even need more on programmable rasterizers. I just use the programmability to set up a nicer base material…

It would also help if you were prepared to think about how you'd want to program a system like this, because it wouldn't be about implementing ray/plane intersections and bounds tests; most of the efficiency would come from minimizing the tests through some traversal of an abstract data representation. Would this be imposed by the hardware, or would you write it?

The thread is about programmability of the next hardware generation… guess what? I will program this next-gen hardware to do the job for me… more or less. How? UTFG… use our lovely Google and finally read the document, which gives nice inspiration. The title is "Ray Tracing on Programmable Graphics Hardware" and the filename (should be enough for Google) is "rtongfx.pdf".

A central feature of a programmable system like this would be the ability to write a shader which requested an incident color (radiosity) at the fragment level from an arbitrary vector traced into the database. So you have a rasterizer requesting (and possible blocking on) ray database intersections. Would this be cached in a really deep framebuffer and post processed with alpha in a subsequent fragment pass or done there and then blocking on the results? One doesn’t seem too programmable to me, or even as recursive as it needs to be, the other seems to have nasty implications for stalling the fragment processing. Discuss.

Hm… I'm a little irritated by the text (in this edit box here I can't read it that well…);
I'll comment on that later…
But I know one thing: I'm talking about a ray tracer engine, not a ray tracer API. I don't need programmability at the end; I need a ray tracer…

On the next-gen hardware it will be quite difficult to arrange everything to get it working, that's right… It will be the first time it is possible, and that's cool enough to push for it. If hardware designers realize that pushing ray tracers could mean money (at least for workstations, for example…), they will design hardware that is more ray-tracer-friendly, meaning the rasterizer part can be enabled/disabled… (what a word…)

We'll see…

I don't see major problems in most of the problems you raise. I know they ARE there, but they are not major problems… they are just trotted out every time to bitch about ray tracing, claiming it will never be possible. But it is possible, quite well even, and this quite-well-running stuff is only on a P4 at 2 GHz, not on hardware designed for ray tracing…

Dave, the issue is not whether it can be done but whether it's worth doing. You can insult me by implying I don't know how to do this, but in reality some of the implementation decisions are arbitrary, and if you don't care about relative merits, then what's the basis for evaluation? Ray tracing == good?

A lot of this is in line with my expectations, especially the multipass for reflections and storing vectors to the framebuffer, and the need for a higher-level traversal as the key component. I just don't see how you can take this paper as a basis to refute all of what I've said.

The most interesting part for me was the ray traversal of a voxel texture to do a dependent fetch of a geometry texture; I hadn't anticipated that, largely because the problems with it seem obvious. Maybe they can keep it on chip and make it merely slow for modest models.

It’s not clear how this relates to the ATI paper, I doubt the interesting part is in any way related.

This is not a discussion we’ll agree on I think since you’re advocating ray tracing rather than evaluating it, and have cited features that are beyond the merely impractical to bolster the case.


Yep. Why this whole issue?
Because ray tracing is good.
I want a simple, correct, accurate representation of the real world. The only approach that can get as close as we want, given enough horsepower, IS ray tracing, the ONLY one. Rasterizing CAN'T do any of the more complex stuff without hacks; ray tracing can. (And if you look at today's complex stuff, it all works with rays until the results get merged into some textures so the rest can be done on rasterizers, but all the work before that has to be done with ray tracing…)

We can't stay forever with the funky rasterizers; they don't solve anything more than plotting triangles on screen. What they can do with that is awesome, and I love Shrek, but it does not mean we can do everything with it in general. We have to do special hacks. That's why we need a different shader for every stupid feature. We can't just have all the features available all the time.

Ray tracers are an easy and general solution. You don't even need linear algebra to use them (I mean matrices); you can stick with quaternions, vectors, and calculus for the more complex stuff.

Today, no, a ray tracer can't do what a rasterizer can at the same speed. But just saying, hey, rasterizers are fully boosted now and ray tracers are not, does that mean we have to drop ray tracers forever? Can't we push them anyway, getting them faster and faster until they CAN overtake rasterizers? And next-gen hardware is the first that makes it possible to power ray tracing up with the help of GPUs, which is quite handy.

Just open your eyes; those fakes won't help you for long. Funny small water ripples on reflections are nice, but if you want huge stormy waves as in "The Perfect Storm", you have huge problems…

If you want to do volumetric fog (and a lot of people I've seen tried and failed), you have to hack around with your scene graph quite a lot…

Global effects need global knowledge of the scene. Reflections need it (real ones as well as fakes), fog needs it, transparent objects need it (that's where we z-sort), and so on…

Ray tracers mean you don't have the problem of switching between planar reflections and cubemap rendering for different reflective objects.
Ray tracers mean you don't have to z-sort anything at any point in the rendering.
Ray tracers mean you can have everything in real time.

IF we get the hardware vendors to give some support. But as long as we all sit on GeForces and love NVIDIA, we can't take any step further (we didn't, even with the funky shaders they gave us… which apparently are NOT really programmable, texture shaders I mean… and register combiners, I have enough of them on my GF2 to do whatever I want).

Vertex programs are great, but in the end all I use them for is pushing through some interpolated point-to-point vectors so I can do all the shading per pixel, since per-vertex shading is a hack we have to drop as well (why should we push thousands of triangles when one represents the flat wall?).

And rasterizers are possibly easier simply for guys who don't like to sort their data, but I prefer having my stuff structured and clean, and then ray tracing is VERY EASY compared to rasterizing…

I feel this is a lost cause, but Dave (I assume that's your first name), at one stage everybody in their graphics programming life believes that ray tracing is the holy grail. After you discover a few of its limitations you normally change your mind, e.g. ray tracing doesn't tend to do GI very well (unless you pump a sh*t load of rays into the scene, which is not going to be practical even if cards can do 1000x the tests).

To contradict myself (I mentioned this before), I think OpenGL 2.0 isn't going to have display lists (personally I don't use them in OpenGL 1.x), but I can see a use for them in the future if the card (in virtual memory or whatever) could store the whole scene. Think of the benefits: GI, reflections, etc. Then again, there's the problem of how the card is going to store that 1000 km² scene with hugely detailed trees, etc. All the stuff I'm doing is moving towards this; it's like imploding, the grand unified theory of CG. I definitely feel we're in a golden age of computer graphics.

Ray tracing doesn't mean you can have everything in real time. Yes, it's simple; it's one solution to many problems, the Swiss army knife with one blade, but for many problems it is brute force and slow. This is why people generally don't differentiate between problems when they advocate ray tracing. I think it's important to do that.

As for sitting on GF and loving NVIDIA, what does that have to do with anything? They are going FP if this paper is anything to go by, but they aren't doing it just so you can write a ray tracer in their fragment processor.

zed, if you look at the paper Dave cited they assume in future you will be able to store triangle geometry in a 3 component floating point texture and index to it using a dependent texture read (after a voxel ray traverse to a triangle list, both in texture), so your texture would effectively become your on card geometry cache. Display lists would be redundant. That’s the essence of their ray database intersect. I assume you’d invoke the ray trace of the scene by drawing a single quad over the frustum with an interpolated 3 component ray vector which would be used by the fragment geometry engine.
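In CPU terms the arrangement looks something like this (my paraphrase of the paper; all names are made up): the geometry, the triangle lists, and the voxel heads are flat arrays standing in for textures, and each lookup is a dependent read keyed off the previous result.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

// "Textures" as flat arrays, the way the paper stores scene data on-card.
struct SceneTextures {
    std::vector<Vec3> triVerts;   // geometry texture: 3 texels per triangle
    std::vector<int>  triLists;   // triangle-list texture: indices, -1 ends a list
    std::vector<int>  voxelHead;  // voxel texture: offset into triLists, -1 if empty
};

// One step of the dependent-read chain: voxel texel -> list texel ->
// three geometry texels. On hardware every '[]' here is a texture fetch.
void fetchVoxelTriangles(const SceneTextures& tex, int voxelIndex,
                         std::vector<Vec3>& out)
{
    int cursor = tex.voxelHead[voxelIndex];          // fetch #1: voxel texture
    if (cursor < 0) return;                          // empty voxel
    for (int tri; (tri = tex.triLists[cursor]) >= 0; ++cursor) { // fetch #2
        out.push_back(tex.triVerts[3 * tri + 0]);    // fetches #3..5: geometry
        out.push_back(tex.triVerts[3 * tri + 1]);
        out.push_back(tex.triVerts[3 * tri + 2]);
    }
}
```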

A key issue, it seems, is the model size: you're fetching texture during a ray traversal of a regular grid, so you must partition your model. You have a lot of texture fetches striding through memory for each fragment, but you have to trade this number against the triangle intersect tests in the voxel. That's an unhappy tradeoff, but there may be other structures that would work better for bigger, more detailed models. They claim they're compute limited, not bandwidth limited, but how does strided fetch latency affect real performance? Very badly, I'd guess, unless you can keep the voxels on chip, which means keeping them small, which means more triangle tests. Maybe if you had a 1-bit texture and only did the fetch of the triangle pointer from the bigger off-chip voxel list it would help, but they store a full pointer at every voxel in the paper. Maybe if you had three voxel textures, selected by ray orientation, to improve fetch coherency…? Maybe a vector-based prefetch…?
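For reference, the strided fetching I'm worried about comes from a grid walk like this (a textbook 3D-DDA sketch, not the paper's exact code): every voxel the ray pierces costs at least one fetch, and successive voxels can land far apart in memory.

```cpp
#include <cmath>
#include <cstdio>

// Walk the voxels pierced by a ray through an N^3 grid over [0,1]^3
// (Amanatides & Woo style). Assumes the origin is inside the grid and
// the direction has no zero components, to keep the sketch short.
void walkGrid(float ox, float oy, float oz,   // ray origin
              float dx, float dy, float dz,   // ray direction
              int N)
{
    float cell = 1.0f / N;
    int ix = (int)(ox / cell), iy = (int)(oy / cell), iz = (int)(oz / cell);
    int stepX = dx > 0 ? 1 : -1, stepY = dy > 0 ? 1 : -1, stepZ = dz > 0 ? 1 : -1;
    // Distance along the ray to the next voxel boundary on each axis.
    float nextX = ((ix + (stepX > 0)) * cell - ox) / dx;
    float nextY = ((iy + (stepY > 0)) * cell - oy) / dy;
    float nextZ = ((iz + (stepZ > 0)) * cell - oz) / dz;
    // Distance along the ray between successive boundaries per axis.
    float deltaX = cell / std::fabs(dx);
    float deltaY = cell / std::fabs(dy);
    float deltaZ = cell / std::fabs(dz);

    while (ix >= 0 && ix < N && iy >= 0 && iy < N && iz >= 0 && iz < N) {
        printf("visit voxel (%d,%d,%d)\n", ix, iy, iz);  // the fetch
        // Advance to whichever boundary the ray crosses first.
        if (nextX <= nextY && nextX <= nextZ) { ix += stepX; nextX += deltaX; }
        else if (nextY <= nextZ)              { iy += stepY; nextY += deltaY; }
        else                                  { iz += stepZ; nextZ += deltaZ; }
    }
}
```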

Anyway it’s fascinating, and I’m not trying to pick holes in a great piece of work.


Well… I see a lot of people around who thought the GF3 was the holy grail; now they think the GF4 is the holy grail, and so they continue buying every one of them without looking left or right. Currently the best programmable hardware is the one shown by ATI, and the next ones are following soon. But everyone sticks with texture shaders and register combiners, which are both very crappy to program (don't ask me; I know RCs, I've studied them for over two years now and I know how powerful they are, but they are still crappy to use).

I don't see the point of your stuff. Yeah, rasterizers are pushed to the limits and very fast that way, but hey, that shouldn't make you close your eyes to the rest.

I think most of you are blinded by the truth that rasterizers are not very good at the stuff they actually do today. That's why we start pushing extensions like render-to-texture, rendering several views of the whole scene for every place we need a reflection or refraction, and soon for lighting as well. This leads to rendering our scene up to several tens or hundreds of times (depending on what you want to do).

At this point ray tracers catch up, because they need much less switching of memory targets and so on. In fact, a ray tracer can be coded so that the tracer itself needs only ten floats for itself, plus the screen buffer to draw into…

Always remember that hitting a scene with N triangles means O(log N) per ray in a well-organized ray-traced scene; in a triangle soup it means O(N), for sure. But hey, really, today's hardware is fast enough to push a big triangle soup through for you, and you don't have to care… AS LONG as you don't want to use the advanced features of the hardware. If you want env-bump-mapped surfaces everywhere and all that, you have to start organizing your scene again, culling like crazy and LODding as well…

Well, with a framebuffer that can store a whole ray, you could even build a polygon-pusher pipeline like before: simply draw your triangles a different way. You have to draw every triangle today anyway, so why not intersect each one with a bunch of rays instead of plotting a bunch of pixels? No big difference. And for the additional reflections and refractions you push them all through again: one pass for the first visibility check, one pass for shadowing, one pass for the first reflection, giving three passes, with all your triangles drawn three times. That is not that much work, really. It does not need good scene graphs, it does not need intelligence; all it needs is a different pixel-shading setup…
Well yes, it's bull****, but it runs quite fast… it just needs hardware with a huge fill rate, so a GF4 would even be enough…
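Spelled out, the brute-force scheme is roughly this (a CPU sketch with the intersection test stubbed out, purely to show the pass structure; on hardware each loop is "draw all triangles once"):

```cpp
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 orig, dir; };
struct Hit  { bool valid; Vec3 point, normal; };

// Stub: on the GPU version this is "draw every triangle and let the
// fragment program intersect it against the per-pixel ray".
Hit intersectScene(const Ray&) { return {false, {}, {}}; }

void renderFrame(const std::vector<Ray>& primary, Vec3 lightPos,
                 std::vector<float>& image)
{
    std::vector<Hit> hits(primary.size());
    // Pass 1: first visibility -- one ray per pixel vs. all triangles.
    for (size_t i = 0; i < primary.size(); ++i) {
        hits[i] = intersectScene(primary[i]);
        // (shade hits[i] into image[i] here)
    }
    // Pass 2: shadowing -- same machinery, rays aimed at the light.
    for (size_t i = 0; i < primary.size(); ++i)
        if (hits[i].valid) {
            Vec3 p = hits[i].point;
            Ray shadow{p, {lightPos.x - p.x, lightPos.y - p.y, lightPos.z - p.z}};
            if (intersectScene(shadow).valid) image[i] *= 0.3f; // in shadow
        }
    // Pass 3: first reflection -- identical structure with reflected
    // rays; all triangles drawn a third time (omitted for brevity).
}
```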

Wazzup? Where is your motivation to start helping with the _R_evolution instead of following the _evolution of the others?

John Carmack pushes hardware vendors to build in features for him; all these new features are due to some inspiration from him (and some others). That means if we get some ray tracing architecture working with some crappy demo, then hardware vendors can get inspired as well and build in new extensions to do the same thing faster. They did it with shadow maps, they did it with depth_clamp, they did it with texture shaders, so why not some ray tracing helpers? (If you go to the NVIDIA page and look at a lot of old demos, they do stuff like 16-bit shadow mapping, environment-mapped bump mapping and much more, damn slow, but they do it… before the hardware came. This is a big HINT for everyone needing a feature on the next GPUs.)

Dave, in reality there have been people in the industry advocating floating point framebuffers since Carmack was still writing his Doom engine and advocating perspective correction in software engines (I'm not saying I was one of them; I wasn't).

Carmack has been the most visible and recent of them. A few years ago it wasn't clear that game cards could ever justify extended-range-and-precision framebuffers. Anyone who wrote even a simple emboss bumpmap algorithm instantly saw the need for more precision and for signed arithmetic that doesn't clamp at 0 or 1 (I mean implemented it, as opposed to lifting it verbatim from someone's example code). Even with this revelation you'd typically have been late to the party.
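The emboss case makes it easy to see (my own illustration): the bump term is a signed height difference, so unsigned 8-bit hardware forces a scale-and-bias that throws away half the precision and clamps the rest.

```cpp
#include <cstdio>

// Emboss bump term: a signed height difference between two texture taps.
// In 8-bit unsigned hardware you must halve the scale and bias by 128,
// and anything outside [0,255] clamps -- half the precision is gone.
unsigned char embossUnsigned8(unsigned char h0, unsigned char h1)
{
    int diff = (int)h0 - (int)h1;          // true signed range: -255..255
    int biased = diff / 2 + 128;           // squeeze into 0..255
    if (biased < 0) biased = 0;            // clamp (lossy)
    if (biased > 255) biased = 255;
    return (unsigned char)biased;
}

int main()
{
    // Heights 10 and 200: the true difference is -190, but only 33
    // (= -190/2 + 128) survives; 1-unit height steps vanish entirely.
    printf("%d\n", embossUnsigned8(10, 200));  // prints 33
    return 0;
}
```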

I think Carmack and his advocacy deserve a lot of credit for giving hardware developers some faith that if they built it, developers would exploit it. But it didn't hurt that the hardware developers who left SGI had just spec'd out or worked on a piece of hardware with floating point framebuffer and fragment arithmetic support. It also didn't hurt that Carmack saw a presentation from Mark Peercy of SGI showing that with FP framebuffers and "dependent texture" you could implement RenderMan with multipass on hardware. Carmack had been advocating extended range and precision before then, but I think he became bullish on real FP right around that time.


I'm with you…

When you look at the NVIDIA papers explaining the brand-new features, you see references to documents from 1960 and such. Well… our "problems" are old, and the solutions, where they exist, were found long ago. That's not the point, that's just plain fact… (There are even new topics coming, but they are so complicated my brain crashes on them just from reading the title.)

All I want to say is, it's quite easy to change the future of hardware if you show off something cool. Show it slow, show it in software, but show it. I'm now trying to get a simple GF3 or something to show off a simple ray tracer. All I need is lots of passes, so I hope to get a GF4 Ti 4600 to bench it on that hardware…
But it will be a funny hack, nothing more. As long as there is no floating-point calculation everywhere, you can't represent a scene in a good way (mostly because of the 0-1 range, HATE IT HATE IT HATE IT).

I think I'll buy a new PC now, with a nice fast AMD CPU and a GeForce4, though I hate both… They are mainstream, and I want to show what mainstream can already do, so I can't code for an ATI Radeon 8500 currently; it's not mainstream hardware. I hope the Radeon 10000 will be better, go ATI! Then I'll buy such a card and work on this… and so on. (Oh, and I think AMDs are damn cool x86 CPUs, but… well… x86 is **** IMHO. It works, but its design (mostly the FPU) is somehow stupid. Good that I have a compiler that does it for me; that's why I don't move to SSE2 on the P4.)

And, well… one day we'll see global illumination in real time… and no one will say they don't want it. It will make games look more real than Episode II, and it will actually help solve a whole lot of problems. Say, for the home enthusiast who wants to place his Dolby Digital surround sound the best way in his house: he can calculate how the sound flows. Architects can do all the design in real time and see how it looks the whole time; the remaining power can be used to calculate whether the building will be stable…
Movies can be manipulated in real time, news reports on TV won't tell the truth anymore, Big Brother is watching you… the future is bright, go for it!

One last niggle Dave, you said:
“Well, no, the lines should not be straight if I capture my scene with a camera with lenses. No: if I take a cam, go out, and film a straight line, then I see a curve on my TV at home.”

You have a monitor or other display. It's flat and has a rectangular display region. It forms a projection with your eye looking at the monitor. If any straight line drawn on the monitor is rendered as curved, then it is geometrically wrong; it has indisputably been rendered with the wrong projection. Lenses of all sorts, even wide-angle lenses, are designed to produce straight lines and avoid what is called barrel or pincushion distortion; only some radical fisheye lenses have anything else as a design criterion. Zoom lenses tend to have very slight barrelling when wide and pincushioning at full zoom because of design limits; this is considered an undesirable artifact. If you insist on reproducing it, that can easily be done with render-to-texture and conventional graphics.
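If you did want the barrelled look, the standard trick is a radial remap of the rendered texture in a post pass; a minimal sketch (k is a made-up distortion coefficient):

```cpp
#include <cmath>

// Map an output pixel's texture coordinate (u,v in [0,1]) back to the
// source coordinate in the rendered texture, applying barrel distortion
// r' = r * (1 + k*r^2) about the image centre. k > 0 barrels, k < 0
// pincushions; k = 0 is the correct, straight-line projection.
void distortUV(float u, float v, float k, float& su, float& sv)
{
    float x = u - 0.5f, y = v - 0.5f;      // centre the coordinates
    float r2 = x * x + y * y;
    float scale = 1.0f + k * r2;
    su = 0.5f + x * scale;
    sv = 0.5f + y * scale;
}
```

On current hardware you'd bake this into a distortion mesh or a dependent texture read; either way it's a post-process over a normally (correctly) rendered frame.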


Most CGI movies nowadays are not ray traced!!! Most of them (all the Pixar ones, Final Fantasy…) are rendered with the PRMan renderer, which does not support ray tracing yet. I think the PDI renderer (Antz, Shrek) does not support ray tracing either. Ray tracing anything that does not involve uber-realistic reflections and refractions is MUCH slower and gives you the same visual quality.

Originally posted by dorbie:
zed, if you look at the paper Dave cited they assume in future you will be able to store triangle geometry in a 3 component floating point texture and index to it using a dependent texture read (after a voxel ray traverse to a triangle list, both in texture), so your texture would effectively become your on card geometry cache. Display lists would be redundant.

OK, I understand. A nice method, which I'd honestly never have thought of before.
Though by display lists I did mean something completely different from the current method.

That's the deal: vertex arrays will be the same as textures, and quite soon (both store floating-point values; it's just a matter of the API giving you a generic buffer to use instead of textures and vertex arrays and index arrays and all that stuff…).

I hope that comes soon; it gives you the possibility to render into a vertex array, which means updating physics on the GPU, for example.

Dave, the paper describes two implementations; the useful one assumes branching and looping in the fragment instructions. The other is multipass, with branching done via stencil for all processing; that would be REALLY slow, not to mention brute force, with all fragments doing the full work even after an early hit.

I don’t think you’ll get the interesting programmability in the fragments in next generation hardware. It’s clear these guys have inside knowledge of future hardware and have let the cat out of the bag in their paper with the distinctions they draw.

The key question is: is their branching, looping fragment processor really on the drawing board, or is it just on their wish list? I'm sure they're lobbying NVIDIA, who seem to have been very helpful so far, or did NVIDIA have it planned anyway?
