I am trying to do something like this in a fragment (pixel) shader:
Crytek and Epic were both there showing off Farcry and Unreal 3 on the NV40, some cool demos with Crytek going into detail on what they did to use SM3.0. Interestingly, they were very honest (given they were at an NVIDIA-sponsored event) about the problems encountered, subtle problems like the pixel shader not having a constant index register (which stopped them implementing their intended PS 3.0 lighting model) and stencil/single-pass lighting problems.
Personally, I think the lack of fragment (pixel) shader constant indexing is a PAIN in Shader Model 3, and I want to know (if possible) how you implement it, or could implement it, in OpenGL GLSL, and whether you are solving this in the upcoming unified shaders like the Xbox 360 has… I also really can’t understand why you left out this IMPORTANT feature…
For example, without fragment shader constant indexing, deferred shading must be done in multiple passes (one per light) using the stencil buffer, which is very slow… With this feature I could do a single pass, skip the stencil test (saving fill rate) and avoid tons and tons of really absurd texture2D lookups that re-read the same pixel for every light that affects it…
In conclusion (if the answer is negative), I think all the marketing around Shader Model 3 is just that… marketing… Without constant indexing in pixel shaders we are just as LIMITED as we were in ps1.1 and ps2.0. I think I will wait to see the new unified shaders before starting any serious global illumination / deferred shading / dynamic lighting work…
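A minimal sketch of the single-pass idea described above, assuming a G-buffer with view-space position, normal and albedo textures and a simple linear distance falloff. All names (gPosition, lightPosRadius, NUM_LIGHTS and so on) are hypothetical and not taken from any real engine. Whether the indexed uniform reads end up as true constant indexing or as an unrolled loop depends on the driver and the hardware, which is exactly the open question here.

```glsl
// Sketch only: every name below is an assumption made for illustration.
#define NUM_LIGHTS 8

uniform sampler2D gPosition;              // assumed: view-space position in rgb
uniform sampler2D gNormal;                // assumed: view-space normal in rgb
uniform sampler2D gAlbedo;                // assumed: diffuse colour in rgb
uniform vec4 lightPosRadius[NUM_LIGHTS];  // xyz = light position, w = radius
uniform vec3 lightColor[NUM_LIGHTS];

varying vec2 uv;

void main()
{
    // The G-buffer is read once per pixel, not once per light.
    vec3 P      = texture2D(gPosition, uv).xyz;
    vec3 N      = normalize(texture2D(gNormal, uv).xyz);
    vec3 albedo = texture2D(gAlbedo, uv).rgb;

    vec3 sum = vec3(0.0);
    for (int i = 0; i < NUM_LIGHTS; i++)
    {
        // Constants indexed by the loop counter.
        vec3  toLight = lightPosRadius[i].xyz - P;
        float dist    = max(length(toLight), 0.0001);
        float atten   = max(0.0, 1.0 - dist / lightPosRadius[i].w);
        float nDotL   = max(dot(N, toLight / dist), 0.0);
        sum += lightColor[i] * nDotL * atten;
    }

    gl_FragColor = vec4(albedo * sum, 1.0);
}
```

With a fixed NUM_LIGHTS the compiler is free to unroll the loop so that every index becomes a compile-time constant; a light count that varies at run time is the case that really needs constant indexing.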
Well, I don’t think SM3.0 stands or falls with constant indexing, especially since you can achieve the exact same thing with texture lookup tables and get a much larger constant register set this way too. I would say it’s one of the least important features, though it may be a convenience.
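The texture lookup table idea can be sketched roughly as follows. This assumes the light data is packed on the CPU into a 1D floating-point RGBA texture with nearest filtering, two texels per light (position and radius, then colour). The layout and all names (lightTable, fetchLightTexel, TEXELS_PER_LIGHT and the G-buffer samplers) are hypothetical, chosen only for illustration.

```glsl
// Sketch only: texture layout and names are assumptions.
#define NUM_LIGHTS 8
#define TEXELS_PER_LIGHT 2

uniform sampler2D gPosition;   // assumed: view-space position in rgb
uniform sampler2D gNormal;     // assumed: view-space normal in rgb
uniform sampler2D gAlbedo;     // assumed: diffuse colour in rgb
uniform sampler1D lightTable;  // assumed: float RGBA, nearest filtering,
                               // width = NUM_LIGHTS * TEXELS_PER_LIGHT

varying vec2 uv;

// Emulates constants[texel] by sampling the centre of the matching texel.
vec4 fetchLightTexel(int texel)
{
    float width = float(NUM_LIGHTS * TEXELS_PER_LIGHT);
    return texture1D(lightTable, (float(texel) + 0.5) / width);
}

void main()
{
    vec3 P      = texture2D(gPosition, uv).xyz;
    vec3 N      = normalize(texture2D(gNormal, uv).xyz);
    vec3 albedo = texture2D(gAlbedo, uv).rgb;

    vec3 sum = vec3(0.0);
    for (int i = 0; i < NUM_LIGHTS; i++)
    {
        // Two extra texture fetches per light replace two constant reads.
        vec4 posRadius = fetchLightTexel(i * TEXELS_PER_LIGHT);
        vec3 color     = fetchLightTexel(i * TEXELS_PER_LIGHT + 1).rgb;

        vec3  toLight = posRadius.xyz - P;
        float dist    = max(length(toLight), 0.0001);
        float atten   = max(0.0, 1.0 - dist / posRadius.w);
        sum += color * max(dot(N, toLight / dist), 0.0) * atten;
    }

    gl_FragColor = vec4(albedo * sum, 1.0);
}
```

The half-texel offset in fetchLightTexel keeps each fetch on a texel centre, so nearest filtering returns exactly the value that was written. The table can be far larger than the constant register file, at the price of extra fetches per light.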
Well, I don’t think SM3.0 stands or falls with constant indexing, especially since you can achieve the exact same thing with texture lookup tables and get a much larger constant register set this way too.
I just implemented the texture lookup approach and noticed a big performance drop (635 to 380 fps). I think it is due to:
The texture has to be written dynamically. This is done only once (writing all lights in one pass), but it requires an AGP/PCI bus transfer and CPU/GPU synchronization. Of course, the same applies to vertex/fragment shader constants, but I SUSPECT the drivers optimize constant uploads better than dynamic texture updates.
Processing each pixel of the final deferred shading quad requires a texture lookup (2 cycles?) for each light, so the processing time grows linearly with every light you add. With constant indexing this cost would drop to zero. And what is more worrying… the SAME texture lookup has to be repeated for each light in every pixel (because I can’t cache it across pixel invocations), so the theory behind deferred shading suffers and it feels like going back to the old local lighting models…
The texture1D lookup adds more instructions to the shader and reduces available registers.
The texture itself occupies VRAM (not much, but why use VRAM when the GPU has fast registers…)
So Shader Model 3 can’t index texture samplers EITHER? I mean, I can’t do:
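uniform sampler2D projectorTex[16];
varying vec4 inUV;
void main() {
    vec4 col = vec4(0.0);
    for (int i = 0; i < 16; i++)
        col += texture2D(projectorTex[i], inUV.xy);  // sampler indexed by the loop counter
    gl_FragColor = col;
}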
and this can’t be emulated in any way…
Of course, constant indexing can be emulated with the texture lookup approach, but I think it would have been better to say “let’s wait 2 more months and implement pixel shader constant indexing like we do in the vertex shaders” than “hurry, hurry, let’s raise the pixel shader instruction limit to 65535 because we can’t do constant indexing and loop unrolling is going to be a PAIN”… Now you force me to buy a future unified shader graphics card to implement this in a better way (well, after all this is like the Tucker car film… if you shipped a PERFECT ps3.0, who would buy your next toy?)
If I use Shader Model 3 it is to gain performance and avoid limitations, not to use texture lookup tricks like we did in ps1.1 for normalization, specular pow, cos/sin… At least I hope this post helps show that there is a serious problem/limitation in the current ps3.0 model, so that it gets solved in the next shader model.
V-man, thx for the link… Yep, it was a nice idea to use NVEmulate to see what NVIDIA does… but the problem is that when I run it on my Athlon64 with a GF6800, Forceware 81.85 and WinXP Pro 64-bit, I get a nice Blue Screen of Death, mwahahahaha! So I have no idea how NVIDIA handles the shader in question 8( /cry . On my other computer, with an X1600, I obviously can’t run NVEmulate…
Does anybody know a tool like NVEmulate that works on Win64 with an ATI card, pls? I REALLY wanna test that constant/texture sampler indexing thing!