GPU idea - Would this work?

Sik_the_hedgehog · July 2, 2008, 9:37pm

Just an idea that came to my mind. First, a bit of backstory: I was once making a design for a console. No, I don’t work in that industry; it was just an idea that will probably never become anything. Anyway, the thing is, I was so lame that instead of a proper rasterizer I was going to use a normal processor as the GPU. DSP, very long instruction words, dedicated video instructions, whatever was needed, but a normal processor after all. I swear, it was pure laziness. I didn’t know that was going to be the direction engineers would be heading in the future; I didn’t even know shaders existed.

Anyway, there’s a problem with that: fixed functionality is always faster than a programmed equivalent, simply because you can hardwire it into the circuit and not waste any extra cycles. So I was thinking: how would a hybrid GPU work?

Basically, the idea is simple: there’s a processor, plus some extra hardware to help with rasterizing and such, maybe exposed through the dedicated instructions. So while the processor could do all the rendering by itself, it could also use that fixed-function hardware, even if only for part of the rendering. Think of the hardware as ultrafast versions of some common code.

Example: you want to render a normal triangle, but with custom shading/blending/whatever. The processor would do all the calculations for every pixel, but it would use a hardwired circuit that steps it through all the pixels with a single instruction, rather than also having to calculate which pixel comes next. This would speed things up a lot.
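A minimal Python sketch of that split, purely illustrative: `covered_pixels` plays the role of the hypothetical hardwired "next pixel" instruction (implemented here as a bounding-box walk with edge functions, which is one common way to decide coverage, not necessarily what such hardware would do), and the `shade` callback is the programmable part. All names are made up for the example.

```python
import math
from typing import Callable, Iterator, List, Tuple

Point = Tuple[float, float]
Color = Tuple[int, int, int]

def edge(a: Point, b: Point, px: float, py: float) -> float:
    """Twice the signed area of (a, b, p); positive when p is left of a->b."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

def covered_pixels(v0: Point, v1: Point, v2: Point) -> Iterator[Tuple[int, int]]:
    """Stand-in for the hypothetical fixed-function stepper: walks the
    bounding box and yields pixels whose centers are inside (edges included)."""
    area = edge(v0, v1, v2[0], v2[1])
    if area == 0:
        return  # degenerate triangle: nothing to draw
    sign = 1.0 if area > 0 else -1.0  # accept either winding
    xs, ys = (v0[0], v1[0], v2[0]), (v0[1], v1[1], v2[1])
    for y in range(math.floor(min(ys)), math.ceil(max(ys))):
        for x in range(math.floor(min(xs)), math.ceil(max(xs))):
            cx, cy = x + 0.5, y + 0.5  # sample at the pixel center
            if (sign * edge(v1, v2, cx, cy) >= 0 and
                    sign * edge(v2, v0, cx, cy) >= 0 and
                    sign * edge(v0, v1, cx, cy) >= 0):
                yield x, y

def draw_triangle(fb: List[List[Color]], v0: Point, v1: Point, v2: Point,
                  shade: Callable[[int, int], Color]) -> None:
    """Programmable part: the 'shader' only decides each pixel's color;
    it never works out which pixel comes next."""
    for x, y in covered_pixels(v0, v1, v2):
        if 0 <= y < len(fb) and 0 <= x < len(fb[y]):
            fb[y][x] = shade(x, y)
```

For example, `draw_triangle(fb, (0, 0), (8, 0), (0, 8), lambda x, y: (x * 32, y * 32, 128))` fills a gradient triangle into an 8x8 buffer; the shader function never deals with pixel ordering at all.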

Mmmh, well, I guess it isn’t that simple to explain. Anyway, I wanted to know your opinion. The good thing about using a processor is that you can pass any kind of data to it, so (for example) you could let it handle models and their animation directly, rather than doing that on the CPU and then sending the polygons over manually. Having hardwired helpers makes writing drivers easier too, because some common functionality is already there, but you aren’t forced to use it if you don’t want to.

What do you think?

Zengar · July 2, 2008, 11:23pm

You’re missing an important point: fixed functionality is more expensive. It is easier and cheaper (in transistor count) to build a bunch of simple programmable units than to duplicate everything in fixed function. Also, it isn’t really clear that fixed-function hardware would actually be faster. Which functions exactly do you want dedicated hardware for?

zeoverlord · July 2, 2008, 11:45pm

And more to the point, it doesn’t have to be: the stage that takes (and has to take) the longest is fragment shading, and that stage works pretty much like a small, slightly limited CPU.
And that’s really what you need: lots and lots of processors. The latest GPUs use over 200 of them.

Sik_the_hedgehog · July 3, 2008, 4:23am

Um, ouch.

Still, some things really don’t make sense to program. For example, why write code to rasterize the pixels of a triangle when it’s such a common operation, and one that’s very unlikely to change (even for curved surfaces, the change is minimal)? It’s even useful for raytracing-like algorithms (the scanline algorithm is similar to raytracing). It would make sense to have a dedicated instruction for that.
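To show how fixed and repetitive that loop is, here is a Python sketch of filling one triangle scanline by scanline: intersect each row’s center line with the triangle’s edges and fill the span in between. `scanline_spans` is a hypothetical helper, and the half-open tie rules (a pixel centered exactly on the right edge is left out) are one common convention chosen for this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def scanline_spans(v0: Point, v1: Point, v2: Point) -> List[Tuple[int, int, int]]:
    """Return (y, x_start, x_end_exclusive) spans of pixels whose centers fall
    inside the triangle, sampling each row at y + 0.5."""
    verts = [v0, v1, v2]
    ys = [v[1] for v in verts]
    spans = []
    for y in range(math.floor(min(ys)), math.ceil(max(ys))):
        sy = y + 0.5
        xs = []
        for i in range(3):
            (ax, ay), (bx, by) = verts[i], verts[(i + 1) % 3]
            if ay == by:
                continue  # horizontal edge: no single crossing point
            # Half-open test so a vertex shared by two edges is counted once.
            if min(ay, by) <= sy < max(ay, by):
                t = (sy - ay) / (by - ay)
                xs.append(ax + t * (bx - ax))
        if len(xs) == 2:
            left, right = min(xs), max(xs)
            # Pixel x is covered when its center x + 0.5 lies in [left, right).
            x0 = math.ceil(left - 0.5)
            x1 = math.ceil(right - 0.5)
            if x1 > x0:
                spans.append((y, x0, x1))
    return spans
```

Each span is a run of consecutive pixels, which is exactly the part a dedicated instruction could hand to the programmable side one pixel at a time.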

Um, forget it. I probably said something stupid.

PS: In the eighties, engineers thought microcoding was the way to go because it meant simpler circuits. Now they’re desperate to make their processors as RISC as possible. I guess the same applies here?

Also, about raytracing: the depth buffer should have died when blending appeared, since it simply causes too many problems with blending. Look at the approach the Dreamcast used for a good example of a replacement.
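A tiny single-channel illustration (Python, made-up colors and alphas) of the problem: the usual "over" blend is order-dependent, so a depth buffer with depth writes either throws away translucent surfaces behind already-drawn ones or leaves them blended in the wrong order. Sorting back to front, which is what a per-tile sort makes cheap, gives the right answer.

```python
from typing import List, Tuple

Layer = Tuple[float, float, float]  # (color, alpha, depth); smaller depth = closer

def over(dst: float, color: float, alpha: float) -> float:
    """Standard 'over' blend: alpha * src + (1 - alpha) * dst."""
    return alpha * color + (1.0 - alpha) * dst

def composite(background: float, layers: List[Layer]) -> float:
    """Blend layers in submission order, no depth test."""
    out = background
    for color, alpha, _depth in layers:
        out = over(out, color, alpha)
    return out

def composite_sorted(background: float, layers: List[Layer]) -> float:
    """Sort back to front first, then blend."""
    return composite(background, sorted(layers, key=lambda l: l[2], reverse=True))

def zbuffered(background: float, layers: List[Layer]) -> float:
    """Submission order with depth test and depth writes enabled:
    anything behind an already-drawn translucent layer is discarded."""
    out, zbuf = background, float("inf")
    for color, alpha, depth in layers:
        if depth < zbuf:
            out = over(out, color, alpha)
            zbuf = depth
    return out

near: Layer = (0.0, 0.5, 1.0)  # black, 50% opaque, closer
far: Layer = (1.0, 0.5, 2.0)   # white, 50% opaque, farther
```

With a black background (`0.0`) and `near` submitted first, `composite_sorted(0.0, [near, far])` gives 0.25 (the correct result), `composite(0.0, [near, far])` gives 0.5, and `zbuffered(0.0, [near, far])` gives 0.0 because the far layer is rejected outright.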

zeoverlord · July 3, 2008, 7:07am

Yeah, if only it were even somewhat advanced, but it’s not. Remember, most games did this in software up until Quake 3, and those games ran on crappy processors. You could get a decent fillrate even on a 486 in some games if you did a bit of culling and not-so-advanced shading (read: no shading whatsoever, just colors and the occasional texture). I mean, I used to play the original Quake on my 486DX (though the framerates were not great).
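Some back-of-the-envelope arithmetic on that fillrate claim. The numbers plugged in are assumptions for illustration (a 33 MHz 486DX, 320x200, 15 fps), not measurements of any actual game:

```python
def cycles_per_pixel(clock_hz: float, width: int, height: int,
                     fps: float, overdraw: float = 1.0) -> float:
    """CPU cycles available per written pixel if the CPU did nothing but fill."""
    return clock_hz / (width * height * fps * overdraw)

# Assumed figures: 33 MHz 486DX, 320x200, 15 fps
print(cycles_per_pixel(33e6, 320, 200, 15))       # 34.375
print(cycles_per_pixel(33e6, 320, 200, 15, 2.0))  # 17.1875 with 2x overdraw
```

A few dozen cycles per pixel, or half that with 2x overdraw, is roughly enough for a flat or simply textured fill and leaves very little for per-pixel shading, which fits the point about culling and skipping shading.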

About the depth buffer: the method the Dreamcast uses works better the fewer polygons there are. Basically, it culls all polygons except those that fall within a specific tile, then sorts them and renders them. This works well when you can count your polygons in the thousands, not in the millions as we have now.
Also, it basically stalls the render pipeline, as you have to submit all polygons before rendering can begin, though blending would work better.
So it works in theory, but it is a very special case of rendering, and we are moving away from special-purpose designs and toward general-purpose ones.
The z-buffer is useful in many ways beyond depth testing, and if you use a deferred rendering method, the overdraw problem is solved. Deferred rendering also works really well with raytracing.
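A rough Python sketch of the tile-based approach described above: first bin each triangle into the screen tiles its bounding box touches, then, per tile, sort that tile’s triangles by depth and find the nearest one covering each pixel before shading anything, so every pixel is shaded once. The tile size, the triangle format (screen-space vertices plus one constant depth per triangle) and all function names are simplifying assumptions; real hardware like the PowerVR2 is far more involved.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vert = Tuple[float, float]
Tri = Tuple[Vert, Vert, Vert, float]  # three screen-space vertices + one depth (assumed flat)

TILE = 4  # tile edge in pixels (illustrative)

def inside(tri: Tri, px: float, py: float) -> bool:
    """Edge-function coverage test, accepting either winding."""
    (ax, ay), (bx, by), (cx, cy), _ = tri
    def e(x0, y0, x1, y1):
        return (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
    w0, w1, w2 = e(bx, by, cx, cy), e(cx, cy, ax, ay), e(ax, ay, bx, by)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)

def bin_triangles(tris: List[Tri], width: int, height: int) -> Dict[Tuple[int, int], List[int]]:
    """Pass 1: record which triangles touch each tile (by bounding box)."""
    bins: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    last_tx, last_ty = (width - 1) // TILE, (height - 1) // TILE
    for i, (a, b, c, _) in enumerate(tris):
        xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
        tx0 = max(0, math.floor(min(xs)) // TILE)
        tx1 = min(last_tx, math.floor(max(xs)) // TILE)
        ty0 = max(0, math.floor(min(ys)) // TILE)
        ty1 = min(last_ty, math.floor(max(ys)) // TILE)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(i)
    return bins

def render(tris: List[Tri], width: int, height: int, shade):
    """Pass 2, per tile: sort front to back, resolve visibility per pixel,
    and only then call the (expensive) shader once for the winner."""
    fb = [[None] * width for _ in range(height)]
    shaded = 0
    for (tx, ty), ids in bin_triangles(tris, width, height).items():
        ids = sorted(ids, key=lambda i: tris[i][3])  # smaller depth = closer
        for y in range(ty * TILE, min((ty + 1) * TILE, height)):
            for x in range(tx * TILE, min((tx + 1) * TILE, width)):
                hit = next((i for i in ids if inside(tris[i], x + 0.5, y + 0.5)), None)
                if hit is not None:
                    fb[y][x] = shade(hit, x, y)
                    shaded += 1
    return fb, shaded
```

With two identical, fully overlapping triangles, an immediate-mode z-buffer renderer drawing the far one first would shade the covered pixels twice; here `shaded` equals the number of covered pixels, since hidden fragments never reach the shader.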

If there’s any special-purpose processing circuitry I can tolerate today, it would be a ray lookup processor: one that just tests a large number of rays against a large number of polygons.
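The basic operation such a unit would repeat is a ray/triangle intersection test. Here’s a Python sketch using the well-known Möller–Trumbore test plus a brute-force "many rays x many triangles" loop; a real design would presumably add an acceleration structure (a BVH, for example) instead of testing every pair. Function names are illustrative.

```python
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def intersect(orig: Vec, d: Vec, v0: Vec, v1: Vec, v2: Vec,
              eps: float = 1e-9) -> Optional[float]:
    """Möller–Trumbore: return distance t along the ray, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray parallel to the triangle's plane
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(d, q) * inv  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the origin

def closest_hits(rays: List[Tuple[Vec, Vec]], tris: List[Tuple[Vec, Vec, Vec]]) -> List[Optional[int]]:
    """Brute force: for every ray, the index of the nearest triangle hit (or None)."""
    result = []
    for o, d in rays:
        best, best_t = None, float("inf")
        for i, (v0, v1, v2) in enumerate(tris):
            t = intersect(o, d, v0, v1, v2)
            if t is not None and t < best_t:
                best, best_t = i, t
        result.append(best)
    return result
```

The inner loop has no data-dependent control flow beyond a few comparisons, which is why it maps well onto wide, dedicated hardware.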

Sik_the_hedgehog · July 3, 2008, 8:35am

zeoverlord:

Yeah, if only it were even somewhat advanced, but it’s not. Remember, most games did this in software up until Quake 3, and those games ran on crappy processors. You could get a decent fillrate even on a 486 in some games if you did a bit of culling and not-so-advanced shading (read: no shading whatsoever, just colors and the occasional texture). I mean, I used to play the original Quake on my 486DX (though the framerates were not great).

Do you really want to write such uninteresting, overly common code? (Assuming you aren’t doing any strange effects with it.)
I’ve seen a Commodore 64 demo that can do 3D rendering at a decent speed (about 15 FPS). Filled polygons, yep. And that computer has a 1 MHz processor, and the demo was meant for real hardware only. I guess a 486 can do a lot better.
Rendering on a separate processor will always speed things up anyway (provided you don’t have to bus-request it all the time).

zeoverlord:

About the depth buffer: the method the Dreamcast uses works better the fewer polygons there are. Basically, it culls all polygons except those that fall within a specific tile, then sorts them and renders them. This works well when you can count your polygons in the thousands, not in the millions as we have now.
Also, it basically stalls the render pipeline, as you have to submit all polygons before rendering can begin, though blending would work better.
So it works in theory, but it is a very special case of rendering, and we are moving away from special-purpose designs and toward general-purpose ones.
The z-buffer is useful in many ways beyond depth testing, and if you use a deferred rendering method, the overdraw problem is solved. Deferred rendering also works really well with raytracing.

That method could probably do a lot better these days. Remember, the PowerVR2 already existed back in 1997 (when the Dreamcast design was being developed), and its polygon rate was only reached years later by Z-buffer-based hardware.
Ignore the fact that it uses tile rendering. The point is that you can avoid a lot of processing for data that will never make it into the final frame. This is very useful once you take shaders into account, since they can get very slow, and the card can’t know in advance what a shader will do and have hardware designed around it.
If you use polygon normals, you can get native mirroring support. And you know that would help a lot.
Like the one I’ve just mentioned, a lot of effects are straightforward with raycasting-like algorithms, while without them we have to resort to ugly hacks that are sometimes a cause of slowdown (especially techniques that require re-rendering the entire scene).
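For the mirroring case, a ray-casting renderer with per-polygon normals only needs the standard reflection formula r = d - 2(d·n)n to spawn the mirrored ray, instead of re-rendering the scene from a mirrored camera. A Python sketch, assuming `n` is unit length and the hit distance `t` comes from some intersection routine (not shown); the small offset `eps` is a common trick against re-hitting the same surface, and its value is arbitrary.

```python
from typing import Tuple

Vec = Tuple[float, float, float]

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def reflect(d: Vec, n: Vec) -> Vec:
    """Mirror direction d about unit normal n: r = d - 2 (d . n) n."""
    k = 2.0 * dot(d, n)
    return (d[0] - k * n[0], d[1] - k * n[1], d[2] - k * n[2])

def mirror_ray(origin: Vec, d: Vec, t: float, n: Vec,
               eps: float = 1e-4) -> Tuple[Vec, Vec]:
    """Secondary ray for a mirror hit at distance t along (origin, d).
    The start point is nudged off the surface, on the side the ray came from,
    so the new ray doesn't immediately re-hit the same polygon."""
    hit = (origin[0] + t * d[0], origin[1] + t * d[1], origin[2] + t * d[2])
    side = 1.0 if dot(d, n) < 0 else -1.0
    start = (hit[0] + side * eps * n[0],
             hit[1] + side * eps * n[1],
             hit[2] + side * eps * n[2])
    return start, reflect(d, n)
```

For instance, a ray from (0, 1, 0) heading along (1, -1, 0) hits the plane y = 0 at t = 1 and continues along (1, 1, 0) from just above (1, 0, 0).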

Keep talking, this seems interesting.

Oh, by the way, a little detail I forgot to mention: it doesn’t have to be a PC graphics card. It could just as well be a simple GPU with VRAM hardwired on the motherboard, where the CPU could also access the VRAM directly through bus requests, without any stupid port delays and such. I don’t think many of you are happy that PCI or AGP speed can limit your data transfer rate for no reason.

zeoverlord · July 3, 2008, 10:33am

My guess as to why TBDR (tile-based deferred rendering, the method the PowerVR2 uses) isn’t used is that it’s too limiting. Sure, it saves a lot of processing, but so does deferred rendering.
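A toy version of deferred shading, to show where it saves work: a geometry pass writes only depth and surface attributes into a G-buffer (with an ordinary depth test), and the expensive lighting runs afterwards, once per covered pixel, regardless of overdraw. The fragment layout (a single color channel and a precomputed n·l value instead of stored normals), the single light and the function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment layout: (x, y, depth, albedo, n_dot_l), one color channel
Fragment = Tuple[int, int, float, float, float]
GBuffer = Dict[Tuple[int, int], Tuple[float, float, float]]

def geometry_pass(fragments: List[Fragment]) -> GBuffer:
    """Depth-tested write of surface attributes only; no lighting yet."""
    gbuf: GBuffer = {}
    for x, y, depth, albedo, n_dot_l in fragments:
        if (x, y) not in gbuf or depth < gbuf[(x, y)][0]:
            gbuf[(x, y)] = (depth, albedo, n_dot_l)
    return gbuf

def lighting_pass(gbuf: GBuffer, light: float) -> Tuple[Dict[Tuple[int, int], float], int]:
    """The expensive part runs once per covered pixel, however much overdraw there was."""
    out: Dict[Tuple[int, int], float] = {}
    calls = 0
    for key, (_depth, albedo, n_dot_l) in gbuf.items():
        out[key] = albedo * light * max(0.0, n_dot_l)  # simple Lambert term
        calls += 1
    return out, calls

frags = [(0, 0, 3.0, 0.2, 1.0), (0, 0, 1.0, 0.8, 0.5),
         (0, 0, 2.0, 0.5, 1.0), (1, 0, 1.0, 1.0, 1.0)]
image, calls = lighting_pass(geometry_pass(frags), light=2.0)
```

Here four fragments arrive but lighting runs only twice, once per pixel. The catch (which comes up later in the thread) is that the G-buffer stores one surface per pixel, which makes transparency and antialiasing harder.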

It would be, wouldn’t it.
In fact, I think I might just start working on a hybrid renderer using CUDA or something.
My understanding is that Nvidia, at least, is working on it, as they recently acquired RayScale, a company that specializes in hybrid raster/raytrace renderers.

Sik_the_hedgehog · July 3, 2008, 10:59am

Remember that the PowerVR2 is old, so not many techniques were possible back then. I’m pretty sure it wouldn’t have taken long to implement new stuff. Besides, it was limited in speed and memory, and I guess that was the main reason for using the tile system rather than processing the entire buffer directly. I’d like to see what would happen if it were done these days.

And I’ve seen so many raytracing experiments lately, all over the place (not just random suggestions in this forum). It seems that raytracing, sooner or later, will be the way to go.

zeoverlord · July 3, 2008, 1:49pm

It’s only a matter of when GPUs get powerful enough to make raytracing a viable real-time rendering method. I’ll give it two years until we start seeing some practical stuff.

Sik_the_hedgehog · July 3, 2008, 9:41pm

Yeah, OK. Meanwhile, the scanline algorithm seems to be the best option. It’s sort of fake raytracing, because the polygons are already projected and such rather than being traversed in a real 3D space, but it’s fast enough. And if it’s done on the GPU, you’re removing a lot of workload from the CPU, and proper depth checking becomes feasible, making the result pretty much indistinguishable from the depth buffer method.
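A sketch of doing "proper depth checking" inside a scanline renderer: only a one-row depth buffer is kept (the scanline z-buffer variant), and each polygon’s span on the current row is depth-tested against it. To keep it short, spans are passed in directly as (x_start, x_end, z_start, z_end, id) for one row; computing them from projected polygons is the same edge-intersection step as a single-triangle scanline fill. Names and span format are assumptions.

```python
from typing import List, Optional, Tuple

# One polygon's span on the current row (assumed precomputed by edge stepping):
# x0, x1 (exclusive), depth at x0, depth at x1, polygon id
Span = Tuple[int, int, float, float, int]

def resolve_row(width: int, spans: List[Span]) -> List[Optional[int]]:
    """Scanline z-buffer: a one-row depth buffer decides, for each x on this
    scanline, which polygon is nearest."""
    zrow = [float("inf")] * width
    ids: List[Optional[int]] = [None] * width
    for x0, x1, z0, z1, pid in spans:
        length = x1 - x0
        for x in range(max(0, x0), min(width, x1)):
            # Depth is assumed to be post-projection depth, which varies
            # linearly in screen space, so a plain lerp is fine here.
            t = (x - x0) / length
            z = z0 + t * (z1 - z0)
            if z < zrow[x]:  # depth test against this row only
                zrow[x] = z
                ids[x] = pid
    return ids
```

For example, `resolve_row(8, [(0, 6, 5.0, 5.0, 0), (2, 8, 1.0, 9.0, 1)])` returns `[0, 0, 1, 1, 1, 0, 1, 1]`: the sloped polygon wins until its depth reaches 5.0 at x = 5, where the tie keeps the first one.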

Mmmmh, I wonder why there are so many algorithms called “scanline” x_X One for rendering individual polygons, one for rendering entire worlds, one for faking the visible scanlines of old monitors, etc. DX

Ilian_Dinev · July 4, 2008, 8:39am

The fixed-function circuits that matter still exist in GPUs: attribute fetch, triangle setup, early-Z cull, texture fetch/filtering, ROPs. Shaders are vital, imho. The only bugbear, imho, is that you have to use shaders even if your vertex transformation is just a vector-matrix multiplication (for setting up early-Z with the non-skinned scene geometry).
Bonuses from there on would be more instructions that merge two or more simpler existing ones (like a matrix-vector multiplication), dedicated circuits for often-used opcodes that are currently emulated, and dedicated circuits for custom convolution filters (though none of these bonuses may really be necessary).
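What the "multiplication of a matrix with a vector" looks like when it has to be built from simpler instructions: four 4-component dot products (one per matrix row, which is how ARB vertex program assembly typically writes it, with DP4), or equivalently one multiply plus three multiply-adds over the matrix columns (MUL/MAD). A merged instruction would do this in one step. Python sketch, row-major 4x4 matrix assumed.

```python
from typing import List, Sequence

def dp4(a: Sequence[float], b: Sequence[float]) -> float:
    """4-component dot product, the usual building block."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def mat_vec_dp4(m: List[List[float]], v: Sequence[float]) -> List[float]:
    """Row-major M * v as four dot products (one per row)."""
    return [dp4(row, v) for row in m]

def mat_vec_mad(m: List[List[float]], v: Sequence[float]) -> List[float]:
    """Same product as a running sum of columns scaled by v's components:
    one multiply followed by three multiply-adds."""
    out = [m[r][0] * v[0] for r in range(4)]
    for c in range(1, 4):
        out = [out[r] + m[r][c] * v[c] for r in range(4)]
    return out

translate = [[1, 0, 0, 2],
             [0, 1, 0, 3],
             [0, 0, 1, 4],
             [0, 0, 0, 1]]
```

Both forms give `[3, 4, 5, 1]` for `translate` applied to `(1, 1, 1, 1)`; the choice between them mostly depends on whether the matrix is stored by rows or by columns.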

ZbuffeR · July 4, 2008, 3:11pm

To me, the only good side of raytracing is the ability to handle some simple cases of transparent materials quite nicely, with reflection/refraction/absorption, especially when curved and with nearby objects to be reflected. For everything else, deferred shading is way better, and it fits massively vectorized computation well.
Try doing smooth shadows or smooth reflections with raytracing!
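Why smooth shadows are costly for a ray tracer: a hard shadow needs one shadow ray per shaded point, while a soft shadow from an area light is usually estimated by sending many rays to random points on the light and taking the visible fraction, so the cost grows with the sample count. A 2-D Python toy, with a made-up segment light and a single occluding segment as assumptions:

```python
import random
from typing import Tuple

P2 = Tuple[float, float]

def orient(o: P2, u: P2, w: P2) -> float:
    """Twice the signed area of triangle (o, u, w)."""
    return (u[0] - o[0]) * (w[1] - o[1]) - (u[1] - o[1]) * (w[0] - o[0])

def segments_cross(p: P2, q: P2, a: P2, b: P2) -> bool:
    """True if segment p-q properly crosses segment a-b (touching doesn't count)."""
    return (orient(a, b, p) * orient(a, b, q) < 0 and
            orient(p, q, a) * orient(p, q, b) < 0)

def soft_shadow(point: P2, light_a: P2, light_b: P2,
                occluder: Tuple[P2, P2], samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the fraction of a segment light visible from
    `point`: one shadow ray per sample, aimed at a random point on the light."""
    visible = 0
    for _ in range(samples):
        t = rng.random()
        target = (light_a[0] + t * (light_b[0] - light_a[0]),
                  light_a[1] + t * (light_b[1] - light_a[1]))
        if not segments_cross(point, target, occluder[0], occluder[1]):
            visible += 1
    return visible / samples
```

With the point at the origin, a light from (-1, 4) to (1, 4) and an occluder from (0, 2) to (2, 2), half the light is hidden, so `soft_shadow(..., 10000, random.Random(1))` lands close to 0.5. With one sample the answer can only be 0 or 1 (a hard shadow); a noise-free penumbra needs many rays per point.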

And both deferred shading and raytracing have trouble handling antialiasing well, though.

Speaking of “future rendering”, I would love to see hardware-accelerated unbiased rendering. Waiting dozens of hours for a single image is tough, but the results are promising:
http://www.luxrender.net/gallery/main.php?g2_itemId=327

zeoverlord · July 6, 2008, 8:11am

Well, not yet anyway.

Well, technically speaking, if you have ray tracing hardware that is general enough, you could run an unbiased renderer on it (as most of them are really ray tracing in reverse). It just won’t be real time, but as the hardware gets faster, at some point the unbiased renderer will become a viable option.
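The "unbiased" part means the renderer’s random estimate has the correct expected value, so more samples only reduce the noise rather than shifting the answer. A tiny Python illustration, not a renderer: estimating the integral of cos(theta) over the hemisphere (exactly pi) by shooting uniformly distributed directions, the same kind of estimator a path tracer evaluates per bounce. The integrand and sample counts are choices made for this sketch.

```python
import math
import random

def estimate_cosine_integral(samples: int, rng: random.Random) -> float:
    """Unbiased Monte Carlo estimate of the integral of cos(theta) over the
    unit hemisphere around n = (0, 0, 1); the exact value is pi."""
    pdf = 1.0 / (2.0 * math.pi)  # uniform pdf over the hemisphere's solid angle
    total = 0.0
    for _ in range(samples):
        # Uniform hemisphere direction: z uniform in [0, 1), azimuth uniform.
        z = rng.random()
        phi = 2.0 * math.pi * rng.random()
        r = math.sqrt(max(0.0, 1.0 - z * z))
        direction = (r * math.cos(phi), r * math.sin(phi), z)
        # A renderer would trace a ray along `direction`; here the "radiance"
        # is just cos(theta) = dot(direction, n).
        cos_theta = direction[2]
        total += cos_theta / pdf  # sample value divided by its pdf
    return total / samples

rng = random.Random(42)
for n in (10, 1000, 100000):
    print(n, estimate_cosine_integral(n, rng))
```

Each run prints estimates scattered around pi, with the spread shrinking roughly as 1/sqrt(n); that slow convergence is exactly why unbiased images take hours today.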