Virtual multitexture?

I was just thinking…
Most modern 3d games (and most likely all future games) draw the same polygon multiple times with different textures and different blend modes…
Yet, most cards only support 2 (or 3) multitexture units.

The problem is that blending between those texture units is so different from blending the results of all those texture units with the frame buffer…

It would be nice if multitexturing worked in such a way that it would be trivial to build a system where you get the same result whether you use multiple single-textured passes or multitexturing.

Right now it’s complex because blending between texture units is fundamentally different (and more restricted) than blending textures with the frame buffer…
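To make the asymmetry concrete, here is a toy single-channel sketch (not any specific card’s hardware; the function names are made up): between units you typically get only a fixed combine such as a modulate, while against the framebuffer you get full control over the blend factors.

```c
/* One colour channel, values 0..255. Hypothetical fixed-function sketch. */

/* Between texture units you typically only get a fixed combine,
 * e.g. a GL_MODULATE-style multiply: */
static unsigned char combine_modulate(unsigned char a, unsigned char b)
{
    return (unsigned char)((a * b) / 255);
}

/* Against the framebuffer you get full blend-factor control,
 * e.g. classic source-alpha blending: */
static unsigned char blend_src_alpha(unsigned char src, unsigned char dst,
                                     unsigned char alpha)
{
    return (unsigned char)((src * alpha + dst * (255 - alpha)) / 255);
}
```

The point is that the first function has no free parameters at all, while the second is parameterised by the blend factors, so a multi-pass algorithm written in terms of framebuffer blends cannot always be collapsed onto the inter-unit combines.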

Anyway, a thing that struck me is the waste of re-drawing the same polygon when rendering several layers…
The thing is, the T&L chip (or that part of the chip) re-transforms the polygon every time you draw it. Sure, it has a cache, but when you draw a lot of polygons and sort per texture, the data is probably not in the cache anymore.

But what if you had an endless amount of texture units?
You would be able to reuse that geometry data over and over again, limited only by the texture memory (which you should be careful with anyway) required for all the textures of all the layers…

I can see you thinking “endless amount of texture units?? dream on!”
Give me a sec to explain…

As far as I know, the biggest bottleneck in the chip itself is the geometry transformation, because it’s hard to do in parallel…

The software implementation of texture units works something like this:
You calculate what a certain pixel looks like and feed the result to the next unit, which blends it with the pixel it calculates, and so on…
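That chain could be sketched like this (a toy one-channel model with made-up names; real units sample per-pixel texture coordinates, of course):

```c
/* Toy pipeline: each unit samples its texture and combines the result
 * with what the previous unit produced. */
#define NUM_UNITS 2

/* Flat one-texel "textures", one per unit, for illustration. */
static unsigned char textures[NUM_UNITS] = { 200, 100 };

static unsigned char sample_unit(int unit)
{
    return textures[unit];
}

static unsigned char run_pipeline(unsigned char input)
{
    unsigned char result = input;
    for (int unit = 0; unit < NUM_UNITS; unit++) {
        unsigned char texel = sample_unit(unit);
        /* modulate the running result with this unit's texel */
        result = (unsigned char)((result * texel) / 255);
    }
    return result;
}
```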

Now suppose you could feed the end result of your last texture unit back into your first texture unit?

Sure, it might be slightly harder to schedule, it would probably require some specific hardware, and your texture caching might not be that efficient anymore…

But you wouldn’t have to recalculate the same polygon again and again, and you would have to send fewer commands and geometry to your video card…
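In code terms, the feedback is just an outer loop over the same pixel, so the geometry is transformed once no matter how many layers you apply (a toy sketch; the unit count and the modulate combine are assumptions):

```c
/* Emulate any number of logical layers with a small number of physical
 * units by feeding the running result back into the first unit. */
#define PHYS_UNITS 2

static unsigned char apply_layers(const unsigned char *layers, int num_layers,
                                  unsigned char start)
{
    unsigned char result = start;
    for (int base = 0; base < num_layers; base += PHYS_UNITS) {
        /* one hardware pass over up to PHYS_UNITS layers */
        for (int u = 0; u < PHYS_UNITS && base + u < num_layers; u++)
            result = (unsigned char)((result * layers[base + u]) / 255);
        /* result loops back into the first unit; the polygon is
         * never re-transformed between passes */
    }
    return result;
}
```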

And of course it would make the lives of programmers like myself a lot easier.

Yeah, I read a recent paper that suggests the same sort of thing. The problem is, wouldn’t that be less efficient? I mean, changing textures on each loop through the layers could imply some sort of cache miss, couldn’t it? Whereas if you have static texture units, each unit can have a local cache (I’m just supposing so). On the other hand, it would simplify the chip while making it more powerful. By the way, the same technique could be applied to register combiners.

This wouldn’t work that well, as proposed… there are inherent pipeline limitations that prevent you from feeding the results of one pass back into the pipe for another pass.

It is a real problem, but there is no solution (yet).

  • Matt

The KYRO chip does something along these lines. It has two texturing units but can accept up to eight textures in a single pass: it loops the results back and doesn’t write anything to the framebuffer before every texture is applied. This makes the image quality in 16-bit mode much better than on other cards; actually, it’s very hard to see the difference between 16-bit and 32-bit.
It’s tile-based, which means you only work on one tile (32x32, I think) at a time. I guess it would be kinda hard to implement with traditional rendering.
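A rough sketch of that idea (tile size and combine mode are illustrative, not KYRO’s actual pipeline): every pixel in the tile is resolved completely on-chip, and only the final value is written out.

```c
#define TILE 32

/* Resolve one pixel completely on-chip: apply all textures before any
 * framebuffer write, looping the result back through the units. */
static unsigned char shade_pixel(const unsigned char *tex_values, int num_tex)
{
    unsigned char c = 255;
    for (int t = 0; t < num_tex; t++)
        c = (unsigned char)((c * tex_values[t]) / 255);
    return c;
}

/* Tile-based: shade a whole TILE x TILE tile, then write each pixel
 * to the framebuffer exactly once. */
static void shade_tile(unsigned char out[TILE][TILE],
                       const unsigned char *tex_values, int num_tex)
{
    for (int y = 0; y < TILE; y++)
        for (int x = 0; x < TILE; x++)
            out[y][x] = shade_pixel(tex_values, num_tex); /* single write */
}
```

Because the intermediate values never round-trip through a 16-bit framebuffer, the quantization happens only once, which is the image-quality win described above.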

[This message has been edited by Humus (edited 11-05-2000).]

That’s why traditional rendering is rubbish! :0). But seriously, powerVR-type tech is definitely the way to go; check out Naomi2, for example.

(I know I never miss a chance to evangelise powerVR but I just love it, especially when playing F355 on my Dreamcast!).

For example…
The Savage2000 chip can process four textures on a single pixel in one pass; check the S3 Savage2000 data sheet: it does four textures for a single pixel in half a clock. In theory it could process eight textures at 32 bits, with S3TC, trilinear plus bilinear filtering and texture feedback, in a single pass over two pixels…

Yeah, internal rendering depth can be as high as you want, as long as you don’t have to write it out before it’s finished. You could actually do, say, 64-bit rendering with 16 bits for each of the colour and alpha values. Of course, such a thing isn’t wise, as it increases the transistor count. Carmack, though, wants it. Hehheh.
NV20 is supposed to be able to do one-write rendering just like the Kyro, because it’s a deferred renderer as well. If the Kyro does 220mp/220mt and it whips an MX of 350mp/750mt, then an NV20 of 1200mp/3600mt will whip any normal chip of 1900mp/5700mt. Yummy! Now, what were those Rampage specs again?
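The internal-precision point can be illustrated with a toy example (made-up function; 16 bits per channel internally, with a single quantization to 5 bits on the final write, as in 16-bit colour):

```c
#include <stdint.h>

/* Blend several layers at 16-bit internal precision per channel, then
 * quantize exactly once when writing the finished value out. */
static uint8_t render_16bit_internal(const uint16_t *layers, int n)
{
    uint32_t acc = 0xFFFF; /* full-precision working value stays on-chip */
    for (int i = 0; i < n; i++)
        acc = (acc * layers[i]) / 0xFFFF; /* modulate at full precision */
    return (uint8_t)(acc >> 11); /* one quantization to 5 bits (16-bit colour) */
}
```

Quantizing once at the end, instead of after every layer, is exactly why a deferred one-write renderer loses so little quality in 16-bit mode.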