More GPU Raycasting Questions

If you are trying to optimize, you should take a look at the tools offered by AMD and NVidia.

Nvidia:
ShaderPerf can be used to compare your algorithm on different GPUs virtually (without needing the actual hardware)
NVemulate has an option to output the shader assembly code which can help you understand how the compiler works.

AMD:
GPU ShaderAnalyzer does the same thing as both NVidia apps above.

Ah, they look handy. Unfortunately I’m Mac-based, and they’re all Windows applications :frowning:

a|x

Using the ShaderAnalyzer, I see that my “optimizations” don’t affect the throughput on the HD range of cards. This is partly because the compiler is smart, and partly because the sin() call is scalar rather than vectorized (so it ends up calling sin() three times regardless). So now we know that :slight_smile:
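
For example (just to illustrate the point, the names are made up), on those cards both of these end up as three scalar SIN instructions, so unrolling by hand gains nothing:

    // Both forms compile to three scalar sin ops on that hardware
    vec3 warpA(vec3 p) { return sin(p); }                            // "vectorised"-looking call
    vec3 warpB(vec3 p) { return vec3(sin(p.x), sin(p.y), sin(p.z)); } // explicitly unrolled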

Ah well. Thanks for checking, though. They might still be faster on my GPU/driver, so I’ll give them a try.

I’ve been thinking about other ways to optimise by cutting down the number of ray steps. How does this sound as an idea (I can’t get to my laptop to try it right now, unfortunately):

• Start casting the ray with a large stepsize (say, 8x the fine setting)
• If the ray accumulates an opacity greater than 0, jump back one step along the ‘coarse’ ray, then cast the ray again with the finer stepsize

This should cut down drastically on the cost of rendering rays that go all the way through the volume without hitting anything, and also reduce somewhat the rendering load for rays that don’t hit anything until they’re part-way through the volume.
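
Something like this is what I have in mind, anyway. Completely untested sketch, and sampleVolume(), the step counts and the 0.99 cutoff are just stand-ins for whatever the real shader does:

    // Untested sketch of the coarse-then-fine idea.
    float sampleVolume(sampler3D vol, vec3 p)
    {
        return texture3D(vol, p).a;
    }

    float castRay(sampler3D vol, vec3 origin, vec3 dir, float fineStep)
    {
        float coarseStep = 8.0 * fineStep;   // coarse pass: 8x the fine stepsize
        vec3  p = origin;
        float accum = 0.0;

        // Coarse march: skip empty space cheaply
        for (int i = 0; i < 32; ++i) {
            if (sampleVolume(vol, p) > 0.0) {
                p -= dir * coarseStep;        // back up one coarse step...
                break;                        // ...and drop to the fine stepsize
            }
            p += dir * coarseStep;
        }

        // Fine march from just before the first non-empty sample
        // (exit test against the volume bounds omitted for brevity)
        for (int i = 0; i < 256; ++i) {
            accum += (1.0 - accum) * sampleVolume(vol, p);  // front-to-back accumulation
            if (accum >= 0.99) break;                       // early ray termination
            p += dir * fineStep;
        }
        return accum;
    }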

Do you think it’s worth attempting to render half the volume front-to-back, and the other half back-to-front, employing the same approach, or is the added complexity involved likely to outweigh any performance gains from potentially decreasing the number of per-pixel ray steps?

a|x

This sounds like a good optimization for ‘sparse’ volumes.
But the silhouette may appear slightly jagged.

Adaptive sampling is often used in CPU raytrace renderers: if the sampled value changes between 2 pixels by more than some defined threshold, more in-between samples are taken recursively, until the value change falls below the threshold or the maximum subsampling level is reached.

But I have no idea whether such dynamic sampling is efficient on the GPU, as the branching cost may outweigh the savings from the sampling optimization.

To be tested :slight_smile:
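
Something like this maybe, in GLSL, applied to ray steps rather than screen pixels (untested; density() and the fixed 4 sub-steps are just placeholders, and it does a single level of refinement instead of recursion, since GLSL has no recursion):

    // Untested sketch: one level of adaptive refinement between two ray samples.
    float density(sampler3D vol, vec3 p)
    {
        return texture3D(vol, p).a;
    }

    // Take extra samples between p0 and p1 only where the value changes a lot.
    float refinedSegment(sampler3D vol, vec3 p0, vec3 p1, float threshold)
    {
        float d0 = density(vol, p0);
        float d1 = density(vol, p1);
        if (abs(d1 - d0) < threshold)
            return 0.5 * (d0 + d1);         // smooth region: a cheap average is enough

        // Large change: sub-sample the segment with 4 fixed sub-steps
        float sum = 0.0;
        for (int i = 0; i < 4; ++i)
            sum += density(vol, mix(p0, p1, (float(i) + 0.5) / 4.0));
        return sum / 4.0;
    }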

Hi ZbuffeR,

when you say ‘pixels’ do you mean ray-steps, maybe?
I’m also wondering if the added complexity might outweigh any performance benefits. Hmmm…

a|x

In my example I meant ‘pixel’: in the context of classic CPU raytracing, the ray hit tests are (almost) always done against an octree or other hierarchical structure.
But adaptive pixel sampling is more recent.

With GPU rendering it is very hard to have control over adaptive screen-space pixel sampling; at least, I don’t see how to do it efficiently.
But both ray-steps and pixel sampling could benefit from adaptive sub-sampling, if both are doable :slight_smile:

Hi again ZbuffeR,

do you have any references on ray-step adaptive sub-sampling on the GPU, by any chance? I’m practically limited to a single-pass shader (I can’t easily cascade several shaders one after the other). I’ve seen descriptions of techniques that allow rays to skip empty regions, but these seem to work on static volume textures by pre-processing them before the volume is rendered. Obviously, with dynamic surface rendering, this isn’t an option.

a|x

Perhaps you could do your own “two-pass” technique. First use a somewhat large step, and find the last empty position before you hit the surface. Also find the first empty position on the back, from which the ray is empty all the way to the edge. I.e. calculate the bounding positions.

Then in step two, do your regular thing (with a shorter step length), but start at the position you found above and terminate when you go past the other bounding position.

Since you may miss the surface in the first step, you could perhaps lower the isosurface threshold in the first pass a bit.

This way you might get away with a lower total number of steps. Anyway, just an idea.
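
Roughly what I mean, in GLSL. Untested, and density(), the step count and coarseThreshold are just placeholders; the fine pass (not shown) would then only march between the two returned bounds:

    // Untested sketch of the "two-pass in one shader" idea.
    float density(sampler3D vol, vec3 p)
    {
        return texture3D(vol, p).a;
    }

    // Returns parametric bounds (tStart, tEnd) along the ray from entry to exitP.
    vec2 findBounds(sampler3D vol, vec3 entry, vec3 exitP, float coarseThreshold)
    {
        // March front to back with a large step, remembering the last empty position
        // before the surface is first hit. coarseThreshold is set a bit below the real
        // isosurface threshold, so thin features are less likely to be missed.
        float tStart = 0.0;
        for (int i = 0; i < 32; ++i) {
            float t = float(i) / 32.0;
            if (density(vol, mix(entry, exitP, t)) > coarseThreshold) break;
            tStart = t;
        }

        // Same from the back: first position behind which the ray is empty to the edge.
        float tEnd = 1.0;
        for (int i = 0; i < 32; ++i) {
            float t = 1.0 - float(i) / 32.0;
            if (density(vol, mix(entry, exitP, t)) > coarseThreshold) break;
            tEnd = t;
        }
        return vec2(tStart, tEnd);
    }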

Hi again Lord crc,

I will give that a go!
I think what I will do is have the option to switch adaptive sampling on/off. It may be that with denser (less ‘sparse’) volumes it actually increases the number of steps, while with less ‘busy’ volumes it should decrease them significantly.

Thanks for the advice. I will let you know how I get on!

a|x
