Laying down depth information

This is a simplified version: no DOF, glow, etc. Just draw the scene, then redraw depth X number of times afterward (clearing first).

:o
Whoops, looks like my FBO version was using an fp16 texture by default, which accounts for a large slice of the performance drop. Another possible cause of the slowdown WRT linear depth: the normal z-depth pass has to be finished first (so that the depth texture exists) before you can create the linear depth, so the order is different from the redraw-mesh method. I believe binding FBOs is the most expensive operation you can do in OpenGL.
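For anyone following along, here is a minimal sketch of what "creating the linear depth" from an existing depth value can look like. The post does not say how the linear depth is actually computed, so this is only the textbook conversion, assuming a standard perspective projection and the default depth range; `linearize_depth`, `z_near` and `z_far` are names made up for the illustration.

```c
#include <assert.h>

/* Convert a window-space depth value d in [0,1] (what a depth texture
   holds with the default glDepthRange) into a positive eye-space distance.
   Assumption: a standard perspective projection with clip planes at
   z_near and z_far. This is not necessarily the formula used in the demo. */
static double linearize_depth(double d, double z_near, double z_far)
{
    assert(z_near > 0.0 && z_far > z_near);
    assert(d >= 0.0 && d <= 1.0);

    double z_ndc = 2.0 * d - 1.0;   /* remap [0,1] to NDC [-1,1] */
    return (2.0 * z_near * z_far) /
           (z_far + z_near - z_ndc * (z_far - z_near));
}
```

A stored depth of 0 maps back to the near plane and 1 to the far plane; values in between are strongly skewed toward the near plane, which is why a separate linear-depth pass is sometimes wanted at all.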

240 meshes in frustum, no occlusion test, RGBA8
FPS when laying down depth by redrawing all meshes vs. by blitting the depth tex, for a given number of lay-down-depth iterations:
redraw  blit  iterations
36.6  48.4  10
46.6  53.5  5
55.4  56.5  2
59.2  58.0  1

115 meshes in view frustum (same columns: redraw, blit, iterations)
48.4  54.3  10
59.5  60.6  5
68.7  65.1  2
72.7  66.4  1
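FPS figures are awkward to compare directly, so here is a small sketch that turns two measurements into an approximate cost per extra depth pass in milliseconds. It assumes the first column is the redraw FPS and the second the blit FPS (as the header line suggests), and that frame time grows roughly linearly with the number of passes; `frame_ms` and `ms_per_extra_pass` are hypothetical helper names.

```c
#include <assert.h>

/* Frame time in milliseconds for a given frame rate. */
static double frame_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Average cost in ms of one extra depth pass, estimated from two
   measurements of the same scene taken at different pass counts.
   Assumption: cost scales linearly with the number of passes. */
static double ms_per_extra_pass(double fps_few, int passes_few,
                                double fps_many, int passes_many)
{
    assert(passes_many > passes_few);
    return (frame_ms(fps_many) - frame_ms(fps_few)) /
           (double)(passes_many - passes_few);
}
```

With the 240-mesh numbers, `ms_per_extra_pass(59.2, 1, 36.6, 10)` comes out at roughly 1.16 ms per extra redraw pass and `ms_per_extra_pass(58.0, 1, 48.4, 10)` at roughly 0.38 ms per extra blit pass, so under these assumptions the difference only starts to matter once there are many passes.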

OK, faced with this data, there’s not really much in it for a standard scene (which would be laying down depth perhaps 2-3x extra).
Sorry about the panic :cool:

Awesome, zed. Makes me want to get back into game programming again.

ta Brolingstanz
Actually, thinking about this some more, I’ve still got the feeling I was right with my initial assertion; it’s hard to test in a game with a lot of extra stuff going on that can influence the result.

I’ll try to code up a simple demo tonight.

http://www.zedzeek.com/overdraw_tester.7z
1,2,3 switch between the buffers (1,3 are most valid)
-= for adding rocks
[] for number of extra times to draw the depth info

Summary: for 99% of cases, redrawing the whole scene is gonna be quicker.

I couldn’t get glBlitFramebufferEXT working for the depth (I’m pretty sure I’ve had it working in the past; the NVIDIA drivers would crash as the app exits if I tried, with no errors reported).

Come to think of it, if you store normals and depth you can do the lighting individually as well, without redrawing the geometry again and again. But you have to use the deferred technique for this, I guess. Anyway, nice performance :slight_smile:
I tried your demo but it’s capped at 60 fps. Why is that?

Vertical sync; try disabling it in the driver settings…

Just tried the test program on my laptop (9800M gtx). 2 seems the fastest overall. 3 wins with very low depth complexity which is what I’d expect, but it takes a lot of depth complexity for 1 to beat 3.

What exactly does 2 do?

And yeah, I had to force vsync off in the driver CPL.

GTX285, 100 rocks, view not modified, default driver settings except vsync forced off:

1: FPS ~1600 ~69MVerts/sec ~90MTris/sec
2: FPS ~2000 ~78MVerts/sec ~120MTris/sec
3: FPS ~2000 ~155MVerts/sec ~240MTris/sec

CatDog

Version 2 doesn’t do much; I was just testing to see if there’s a speed difference between an FBO with a depth texture2D vs an FBO with a depth renderbuffer.
BTW, no difference on my gf9500.

In a typical game scene, how many objects are there onscreen?
Perhaps ~100.
In this scenario, if you need to write the depth to the buffer again then you’re best off redrawing the whole scene (at least on NVIDIA cards).

@CatDog - your card’s too fast. Also, whilst you can change the window size, the FBO does not change with it.
Question: how would I best do this?
I could destroy + recreate the FBO + corresponding textures, but when the user is changing the window size by dragging the borders this seems like overkill. Is it the only way?


BTW, here’s a newer version (with vsync off :wink: also, I now cull rocks outside the view frustum, and the ground plane gets drawn into the buffer).

http://www.zedzeek.com/overdraw_tester.7z
1,2,3 switch between the buffers (1,3 are most valid)
-= for adding rocks
[] for number of extra times to draw the depth info
left + right mouse buttons to change camera

This hasn’t much to do with objects, it’s mainly fill. You are either drawing a full view quad (depth complexity = 1) or a few objects with a presumably slightly lower fill cost. Until the overall cost (likely when depth complexity > 1) of the objects exceeds that of the quad you’re probably going to get better performance from the redraw. Any relatively complex scene (i.e. with vegetation) taking up at least most of the view is bound to perform better with stored depth.

Your tester’s results are just what I’d expect, nothing unusual about the results as far as I can tell.

CatDog’s perf sounds right, even my laptop did ~1200 fps.

Edit: About resizing, I’ve never seen any adverse effects from just redefining the tex attachment dimensions and changing the glViewport parameters, why destroy the FBO?
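A rough sketch of that resize path, in case it helps: re-specify the storage of the textures already attached to the FBO and update the viewport, leaving the FBO object itself alone. `OffscreenTarget` and `resize_target` are made-up names, the RGBA8 and 24-bit depth formats are assumptions, and the stand-in block exists only so the snippet can be compiled and exercised without a GL context (define `USE_REAL_GL` to build against the real headers).

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_REAL_GL
#include <GL/gl.h>
#else
/* Stand-ins that mirror the GL 1.x signatures and only count calls. */
typedef unsigned int GLuint;
typedef unsigned int GLenum;
typedef int GLint;
typedef int GLsizei;
#define GL_TEXTURE_2D         0x0DE1
#define GL_RGBA               0x1908
#define GL_RGBA8              0x8058
#define GL_UNSIGNED_BYTE      0x1401
#define GL_DEPTH_COMPONENT    0x1902
#define GL_DEPTH_COMPONENT24  0x81A6
#define GL_UNSIGNED_INT       0x1405
static int g_teximage_calls;
static void glBindTexture(GLenum target, GLuint tex) { (void)target; (void)tex; }
static void glTexImage2D(GLenum target, GLint level, GLint internalformat,
                         GLsizei w, GLsizei h, GLint border,
                         GLenum format, GLenum type, const void *pixels)
{
    (void)target; (void)level; (void)internalformat; (void)w; (void)h;
    (void)border; (void)format; (void)type; (void)pixels;
    g_teximage_calls++;
}
static void glViewport(GLint x, GLint y, GLsizei w, GLsizei h)
{ (void)x; (void)y; (void)w; (void)h; }
#endif

/* Hypothetical bookkeeping for textures that are already attached to an FBO. */
typedef struct {
    GLuint color_tex;   /* colour attachment */
    GLuint depth_tex;   /* depth attachment */
    int width, height;  /* current storage size */
} OffscreenTarget;

/* Re-specify attachment storage for the new size; returns 1 if anything
   was reallocated, 0 if the size was unchanged. The FBO is not touched. */
static int resize_target(OffscreenTarget *t, int w, int h)
{
    assert(t != NULL && w > 0 && h > 0);
    if (w == t->width && h == t->height)
        return 0;

    glBindTexture(GL_TEXTURE_2D, t->color_tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glBindTexture(GL_TEXTURE_2D, t->depth_tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, w, h, 0,
                 GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, NULL);
    glBindTexture(GL_TEXTURE_2D, 0);

    t->width = w;
    t->height = h;
    glViewport(0, 0, w, h);
    return 1;
}
```

Calling `resize_target` from the window resize handler does nothing while the size is unchanged, so repeated resize events during a border drag only cost a reallocation when the dimensions actually differ. If the depth attachment is a renderbuffer rather than a texture, the storage would be re-specified through the renderbuffer storage call instead.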

Ok, another one using the new version and increased load.

GTX285, rocks=1000 depth-passes=1

1: FPS ~515 ~200MVerts/sec ~300MTris/sec
2: FPS ~515 ~200MVerts/sec ~300MTris/sec
3: FPS ~335 ~255MVerts/sec ~394MTris/sec

GTX285, rocks=1000 depth-passes=20

1: FPS ~325 ~125MVerts/sec ~190MTris/sec
2: FPS ~515 ~200MVerts/sec ~300MTris/sec
3: FPS ~33 ~270MVerts/sec ~410MTris/sec

GTX285, rocks=1000 depth-passes=100

1: FPS ~119 ~45MVerts/sec ~70MTris/sec
2: FPS ~515 ~200MVerts/sec ~300MTris/sec
3: FPS uh…~7 ~270MVerts/sec ~410MTris/sec

It’s also interesting to watch Process Explorer. Mode 2 seems to have much less CPU load! In mode 3, the app gets jerky when depth-passes are increased.

edit
And a last one, out of curiosity:

GTX285, rocks=5000 depth-passes=100

1: FPS ~72 ~137MVerts/sec ~210MTris/sec
2: FPS ~138 ~263MVerts/sec ~405MTris/sec
3: FPS <1, screwed

CatDog

Coincidentally, vegetation is what I’m doing today.
BTW, the camera can be moved so it’s pointing downwards, so that 100% of the screen has depth info.

Edit: About resizing, I’ve never seen any adverse effects from just redefining the tex attachment dimensions and changing the glViewport parameters, why destroy the FBO?
Yes, you’re right; I realized afterwards there’s no need to recreate the FBO.

cheers all, for the info