Render to Texture vs. CopyTexSubImage

Heck, what’s wrong with my test results then? (Using an FX 5200, ForceWare 52.16, AMD 1.8 GHz.) For RTT I got 6 fps, while I got 31 fps for copy from buffer (the 2nd one) … lol

aha, sorry…

truemotionblur.exe
truemotionblur_bb_copytexsub.exe
truemotionblur_copytexsubimage.exe

That’s my order.

But I still think it’s strange that BB copy is so slow… Are you sure you don’t call setCurrent or do any other strange stuff?

[This message has been edited by Mazy (edited 12-02-2003).]

RTT - 120fps
pbuffer copy - 0.32fps
bb copy - 97fps

Which I find curious, because RTT is slower in my code (but I had noticed some time ago that CopyTexSubImage from a pbuffer was completely stuffed; not that that worried me, as RTT should always be quicker).

[EDIT] GeForce FX 5900 Ultra, driver v52.16

[This message has been edited by rgpc (edited 12-02-2003).]

Originally posted by Mazy:
But I still think it’s strange that BB copy is so slow… Are you sure you don’t call setCurrent or do any other strange stuff?

Mazy, would you mind testing the 3 newer versions of the programs? I’m sure that this time BB copy will be faster than pbuffer copy (the resolutions are equal now).

Speaking of the code, the BB version is by far the simplest: no WGL stuff, no extensions, very few GL commands. Very portable, but a bit slower than RTT on GeForces.

I tried commenting out the glCopyTexSubImage call in the BB version, and I merely went from 30 fps to 42, so it is largely fill-rate limited. Even with the 12 internal renderings kept at 512x512, resizing the window to 1024x768 drops the frame rate to about 24 fps, and resizing to a very small (still visible) window gives more than 1600 fps.

I still can’t believe the GF3 Ti200 is pushed to the max by 800x600x12 additively blended RGBA pixels. That is only 5.76 megapixels per frame, or about 173 megapixels/s at 30 fps. Did I do something wrong?
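As a quick check of that arithmetic: 800x600 pixels times 12 blended layers is 5.76 million pixels per frame, so the per-second figure depends on the frame rate. A minimal sketch in C, assuming the roughly 30 fps reported for the BB version and ignoring the 12 internal 512x512 renderings (the function names are made up for illustration):

```c
#include <assert.h>
#include <stdio.h>

/* Pixels written per frame when `layers` full-window quads are blended. */
static double pixels_per_frame(int width, int height, int layers)
{
    return (double)width * height * layers;
}

/* Blended pixels per second at a given frame rate. */
static double pixels_per_second(int width, int height, int layers, double fps)
{
    assert(fps >= 0.0);
    return pixels_per_frame(width, height, layers) * fps;
}

/* Hypothetical helper: prints the estimate for the figures quoted above. */
static void print_fill_estimate(void)
{
    printf("%.2f Mpixels/frame\n", pixels_per_frame(800, 600, 12) / 1e6);
    printf("%.1f Mpixels/s at 30 fps\n",
           pixels_per_second(800, 600, 12, 30.0) / 1e6);
}
```

With these inputs the blend pass alone comes to 172.8 megapixels/s, which is the number to compare against the card's rated fill rate.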

I will try to post my GL code later, I need to clean it a bit, sorry.

Originally posted by rgpc:
RTT - 120fps
pbuffer copy - 0.32fps
bb copy - 97fps

Could you post your graphics card/system spec please? (Just a guess: GF FX 5900?)

I do think RTT can be faster because it does not need to actually copy texture data. As the amount of texture data is quite high (512x512x12 RGB pixels), the copy may be slower than just switching contexts 12+1 times.
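To put a rough number on that copy volume, here is a small sketch in C. It assumes 3 or 4 bytes per pixel; the real internal texture format is up to the driver, so treat the result as an order-of-magnitude estimate only (the function name is hypothetical):

```c
#include <assert.h>

/* Bytes moved per frame when `copies` square textures of `side` x `side`
   pixels are copied, at an assumed `bytes_per_pixel`. */
static unsigned long copy_bytes_per_frame(unsigned long side,
                                          unsigned long copies,
                                          unsigned long bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return side * side * copies * bytes_per_pixel;
}
```

For 12 copies of 512x512 this gives 9 MiB per frame at 3 bytes per pixel and 12 MiB per frame at 4 bytes per pixel, all of which RTT avoids moving.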

[This message has been edited by ZbuffeR (edited 12-02-2003).]

Originally posted by ZbuffeR:
Could you post your graphics card/system spec please? (Just a guess: GF FX 5900?)

D’oh! I edited my post above, but FYI yes, it is an FX 5900 Ultra (det 52.16).


I do think RTT can be faster because it does not need to actually copy texture data. As the amount of texture data is quite high (512x512x12 RGB pixels), the copy may be slower than just switching contexts 12+1 times.

It has been said on this board before that RTT is faster when you are dealing with large(r) volumes of data and I think yours is a case where that is so. In my case I am dealing with relatively small volumes so the context switch becomes too expensive.

One suggestion I have (assuming I read your page correctly): why not render 1 texture per frame rather than all 12? You could simply store the last 12 frames in a rotating list of textures (unless of course you are deliberately trying to stress the GPU).
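A minimal sketch of such a rotating list, in C. It only tracks which texture holds which frame; the texture names are assumed to come from glGenTextures, and all type and function names here are hypothetical. Note that this blends the last 12 displayed frames rather than 12 sub-frame samples, so the visual result would not be identical.

```c
#include <assert.h>

#define BLUR_FRAMES 12  /* number of past frames kept, as in the thread */

typedef struct {
    unsigned int tex[BLUR_FRAMES]; /* texture names, e.g. from glGenTextures */
    int next;                      /* slot the next frame will overwrite */
    int count;                     /* slots holding a valid frame so far */
} FrameRing;

static void ring_init(FrameRing *r, const unsigned int *names)
{
    for (int i = 0; i < BLUR_FRAMES; ++i)
        r->tex[i] = names[i];
    r->next = 0;
    r->count = 0;
}

/* Returns the texture to render (or copy) the current frame into,
   overwriting the oldest slot once the ring is full. */
static unsigned int ring_acquire(FrameRing *r)
{
    unsigned int t = r->tex[r->next];
    r->next = (r->next + 1) % BLUR_FRAMES;
    if (r->count < BLUR_FRAMES)
        r->count++;
    return t;
}

/* age 0 is the most recent frame, age count-1 the oldest stored one. */
static unsigned int ring_get(const FrameRing *r, int age)
{
    assert(age >= 0 && age < r->count);
    int idx = (r->next - 1 - age + 2 * BLUR_FRAMES) % BLUR_FRAMES;
    return r->tex[idx];
}
```

Each frame would then call ring_acquire once, render or copy into that texture, and draw ring_get(r, 0) through ring_get(r, count - 1) with additive blending.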

Oh, and copy from pbuffer has always been slower (on NV), but prior to the 50 release (perhaps even prior to 42.xx?) it was usable, unlike what you have found. (It’s almost like they copy it to system memory and back again; it’s that slow.)

Pentium 4 2.4
256MB DDR 2100 RAM
INTEL 82845 Graphics Card ( Came with Dell PC)
I couldn’t get the first two programs to run on my machine; I don’t think the graphics card supports them.
Render from back buffer achieved an astounding 2 fps.
These Intel cards aren’t worth a @#$&%

[This message has been edited by dj_indo_420 (edited 12-06-2003).]

Bring on the crappy mainstream cards

Radeon 9200 vanilla (250/200MHz)
Athlon XP2400+
512MB DDR

RTT 41 fps
copytexsubimage 25 fps
bb_copytexsub 27 fps

amd athlon xp 1800 + gf4 ti 4600

truemotionblur_RTT.exe 66 fps
truemotionblur_bb_copytexsub.exe 55 fps
truemotionblur_copytexsubimage.exe 0.3 fps

P4 2.8 / 512 MB RAM / GeForceFX 5600 Go / 52.16 ForceWare

truemotionblur_RTT.exe 46 fps
truemotionblur_copytexsubimage.exe 0.24 fps
truemotionblur_bb_copytexsub.exe 35 fps

All tested without window resizing.

[This message has been edited by ScottManDeath (edited 12-07-2003).]

Athlon XP 2100+ + GF4 TI4400 + 512 MB 333MHz DDR + 45.23 (Forceware drivers tend to crash for me)

RTT: 61 FPS
copytexsubimage: 0.40 FPS
BB_copytexsub: 48 FPS

EDIT: Added a little more detail.

[This message has been edited by coelurus (edited 12-08-2003).]

One thing I have noticed with pbuffers is that if you intend to copy them to a texture you should not set the pbuffer as being a render to texture buffer (this attrib: WGL_BIND_TO_TEXTURE_RGBA_ARB). Simply don’t set this attrib (or the NV depth one) and I think matters should improve somewhat.

Matt

Originally posted by MattS:
(this attrib: WGL_BIND_TO_TEXTURE_RGBA_ARB). Simply don’t set this attrib (or the NV depth one) and I think matters should improve somewhat.

It sounded like a good idea, but when I don’t set it, everything stays the same (sub-1 fps frame rates with my program on GeForce).
I wonder if NVIDIA is aware of that; it looks more like a missing driver optimisation.

Ah, I was so sure that was it. There are some other attributes that I only set if I intend to render to texture. I’ll list them below (in pairs). Perhaps one of these is the problem. Try not setting any of these as well. I have noticed that I do get problems with bordered textures, which I find a pain because I use ARB_shadow.

[EDIT] To be clear these are flags when creating the pbuffer. The previous flag (WGL_BIND_TO_TEXTURE_RGBA_ARB) would be used when choosing the pixel format. [/EDIT]

WGL_TEXTURE_FORMAT_ARB;
WGL_TEXTURE_RGBA_ARB;

WGL_TEXTURE_TARGET_ARB;
WGL_TEXTURE_2D_ARB;

WGL_MIPMAP_TEXTURE_ARB;
TRUE;

WGL_DEPTH_TEXTURE_FORMAT_NV;
WGL_TEXTURE_DEPTH_COMPONENT_NV

[This message has been edited by MattS (edited 12-08-2003).]

THANKS A LOT MattS !!!
That did the trick ! Ya-hoooo ! Heee-haa !
You’re a savior man !
Hum, sorry, well, it works now.
To be more precise, simply removing the following two pbuffer parameters makes glCopyTexSubImage perform at a speed similar to render-to-texture (maybe 15% slower):

WGL_TEXTURE_FORMAT_ARB, WGL_TEXTURE_RGBA_ARB (or WGL_TEXTURE_FORMAT_ARB, WGL_TEXTURE_RGB_ARB)
and/or
WGL_DEPTH_TEXTURE_FORMAT_NV, WGL_TEXTURE_DEPTH_COMPONENT_NV

together with:

WGL_TEXTURE_TARGET_ARB, WGL_TEXTURE_2D_ARB

Indeed, it gives 100 times the speed (from 0.31 fps to 32 fps)! Whoa, thank you again for bringing such a solution.

(The other parameters you mentioned made no difference: WGL_MIPMAP_TEXTURE_ARB, TRUE, and for the pixel format: WGL_BIND_TO_TEXTURE_RGBA_ARB.)
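For anyone who wants to try this, here is a rough sketch in C of building the pbuffer attribute list with or without the texture attributes. The helper and its flags are hypothetical, not my actual code; the fallback #defines are there only so the snippet compiles without wglext.h, and the values are believed to match the WGL_ARB_render_texture and WGL_NV_render_depth_texture specs, so check them against your own headers.

```c
#include <assert.h>
#include <stddef.h>

/* Fallbacks for building without wglext.h (assumed values, please verify). */
#ifndef WGL_TEXTURE_FORMAT_ARB
#define WGL_TEXTURE_FORMAT_ARB         0x2072
#define WGL_TEXTURE_TARGET_ARB         0x2073
#define WGL_TEXTURE_RGBA_ARB           0x2076
#define WGL_TEXTURE_2D_ARB             0x207A
#endif
#ifndef WGL_DEPTH_TEXTURE_FORMAT_NV
#define WGL_DEPTH_TEXTURE_FORMAT_NV    0x20A5
#define WGL_TEXTURE_DEPTH_COMPONENT_NV 0x20A6
#endif

/* Hypothetical helper: writes a zero-terminated attribute list and returns
   the number of ints written (terminator included). Pass 0 for both flags
   when the pbuffer will only be read with glCopyTexSubImage. */
static size_t build_pbuffer_attribs(int *attribs, size_t capacity,
                                    int color_texture, int depth_texture)
{
    size_t n = 0;
    assert(capacity >= 7); /* worst case: three pairs plus terminator */
    if (color_texture) {
        attribs[n++] = WGL_TEXTURE_FORMAT_ARB;
        attribs[n++] = WGL_TEXTURE_RGBA_ARB;
    }
    if (depth_texture) {
        attribs[n++] = WGL_DEPTH_TEXTURE_FORMAT_NV;
        attribs[n++] = WGL_TEXTURE_DEPTH_COMPONENT_NV;
    }
    if (color_texture || depth_texture) {
        attribs[n++] = WGL_TEXTURE_TARGET_ARB;
        attribs[n++] = WGL_TEXTURE_2D_ARB;
    }
    attribs[n++] = 0; /* list terminator */
    return n;
}
```

The resulting array would be passed as the last argument of wglCreatePbufferARB; with both flags at 0 it is just the terminator, which is the setup that fixed the copy speed here.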

Did I understand it right? To get optimal (pbuffer) performance on NVIDIA cards, I should avoid using NVIDIA-specific extensions/features?

No, I believe that ZbuffeR & MattS are trying to say that if you don’t want to use Render To Texture, don’t set the RTT attributes when creating the pbuffer.

I just implemented this in my code and it works as stated. Nice one.

with geforce ti 4600 :

RTT : 67 fps
BB : 52 fps
Copy pbuffer : 0.23 fps

Sorry equentric, I have just updated my webpage; would you mind testing the new program?
It should work as expected now, I would just like some more Radeon tests.

Glad to be of help. I’ve found lots of solutions on this forum so it’s nice to provide a solution for once.

I’ve never raised this with nVidia because I wasn’t sure whether it would be considered a bug or not. Perhaps they would like to comment…

I’m currently developing on a Radeon 9800 128MB(non-Pro) with Cat 3.9 (not hot fix) on a P4 2.8 with 512 MB memory and these are my results from your new builds.

bb_ctsi 87
ctsi 15
pbuffer_ctsi 19
rtt 105

Pressing ‘B’ roughly doubles the speed of the two high-frame-rate builds but has very little impact on the two low-frame-rate ones.

I can’t explain why the results are so bad for two of the ctsi, esp. considering that other Radeon owners do not have the same problem. As mentioned earlier I have problems with bordered textures and pbuffers so possibly that is it. It may be a driver issue I suppose. Any advice would be gratefully received.

Matt