GPU ray casting speed


I have begun working with GPU raycasting and it seems really very slow :frowning:

I have adapted the GPU raycasting tutorial found at
for the Linux platform, and it runs at only 0.02 fps on my Linux box :frowning:

I think that it is certainly the

for(int i = 0; i < 450; i++)

loop in the fragment shader that causes this :slight_smile:

=> Where can I find a very fast GPU raycasting algorithm/tutorial on the Net?

The fragment shader loop is indeed one cause of the slowdown.

With this little modification to the fragment shader, I now get 0.1 fps:

for(int i=0;i<10;i++)

(but the result is very bad of course: only a purple quad with a black hole in the top right …)

So real time (i.e. 50 fps or more) is still really very far away :frowning:
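For what it's worth, the loop in that tutorial amounts to front-to-back alpha compositing along each ray, so the iteration count is the number of samples taken per ray. A minimal CPU-side sketch in Python (illustrative only; the constant per-sample opacity is a made-up stand-in for the 3D texture lookup) shows why cutting 450 iterations down to 10 truncates the rays:

```python
# Illustrative CPU-side sketch of the shader's fixed-step ray march.
# Hypothetical setup: a constant-opacity volume (alpha 0.01 per sample);
# the real shader samples a 3D texture instead.

def march(steps):
    """Front-to-back alpha compositing over `steps` samples along one ray."""
    accumulated = 0.0
    for _ in range(steps):
        sample_alpha = 0.01  # hypothetical per-sample opacity
        accumulated += (1.0 - accumulated) * sample_alpha
    return accumulated

print(round(march(450), 3))  # 450 samples nearly saturate the ray
print(round(march(10), 3))   # 10 samples barely start it
```

With 450 samples the accumulated opacity nearly saturates (~0.989), while 10 samples only reach ~0.096, which matches the mostly empty quad described above.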

Please tell me that there exists a very fast GPU raycasting algorithm that can work in real time and that I can find on the Net :slight_smile:

> Please tell me that there exists a very fast GPU raycasting algorithm that can work in real time and that I can find on the Net

You have a GeForce 7100 GS, which is over 5 years old, and it was low-end for its day. It has precisely 3 vertex shaders and 4 fragment shaders. My embedded GPU (HD 3300) has 40 shaders (unified) and it’s clocked over twice as fast as your card.

There is no reason you should expect your hardware to be anything even remotely “real-time” with this algorithm.

There are two approaches to GPU raycasting:

  1. The multipass approach by Krüger and Westermann. This is the approach Peter Trier used in the tutorial.
  2. A single-pass approach by Strengert et al.

Whichever approach you use, you need to traverse the whole volume dataset, and this is what slows the method down. There are several optimizations like empty space skipping, early ray termination, etc. You can look up the course notes of the following course,
which lists almost all of the important techniques.
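To make the early-ray-termination idea concrete, here is a small Python sketch (hedged: `sample_alpha` is a hypothetical stand-in for the volume texture fetch, not code from any of the cited tutorials). The march simply stops once the accumulated opacity is close to 1, so rays that hit opaque material cost far fewer samples:

```python
# Sketch of early ray termination in a front-to-back ray march.
# `sample_alpha` is a hypothetical per-sample opacity function
# standing in for the 3D texture lookup.

def march_with_termination(sample_alpha, max_steps, threshold=0.99):
    """Return (accumulated_alpha, steps_actually_taken)."""
    accumulated = 0.0
    for step in range(max_steps):
        accumulated += (1.0 - accumulated) * sample_alpha(step)
        if accumulated >= threshold:      # ray is effectively opaque
            return accumulated, step + 1  # stop early
    return accumulated, max_steps

# Dense (bone-like) region with opacity 0.2 per sample terminates quickly:
alpha, steps = march_with_termination(lambda s: 0.2, 450)
print(steps)  # → 21, far fewer than the 450-step budget
```

Empty space skipping works from the other end: samples known to be fully transparent are never fetched at all, typically guided by a coarse min/max block structure over the volume.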

In addition, if you need more help, just drop a line here and I will see if I can help.

Thanks Mobeen for the links,

Alfonse, it’s clear that I have to buy a more recent graphics card in the near future :slight_smile:
(but on the other hand, with an “old” graphics card I have a better chance that this can also run not too slowly on an Eee PC, an iPhone or a Pocket PC, for example - but I too think that this is only a dream :slight_smile: )

What I am thinking of is precomputing something like “3D integrals” into a cube map and using the values at the nearest and farthest planes to compute the color/transparency of the cast ray.

Or, to begin with, only something like a “cube bump map” (i.e. only the surface is handled, not the interior)
(for example, a sphere can be represented by a cube that uses a “dome bump map” on its 6 faces)
=> but this can certainly only handle convex objects :frowning:

Hmm … my idea seems more or less similar to the hemicube radiosity method.

So it has nothing to do with ray casting, which can handle transparency inside the 3D volume, not only the lighting on the surface :frowning:

But no problem, I have a lot of ideas in reserve :slight_smile:
=> Can GPU units be configured to compute the color/transparency in parallel along a path/curve instead of along a column or a line?

I have been looking for high-quality interactive volume rendering on the GPU as well; so far nothing matches the interactive rendering provided by the best CPU volume ray casting (within a similar price range). The GPU scales badly with increased sampling density, and the data-set size slows rendering speed dramatically (it depends roughly linearly on the number of voxels). Big viewports are not a problem as long as interactive rendering quality is not an issue - it runs fast only if the quality is really bad.

I have tested a lot of GPU volume rendering engines (mostly proprietary), yet the publicly available implementations listed below exhibit a very similar quality/speed balance.

  1. ImageVis3D
  2. Voreen

To illustrate the speed/quality difference between CPU and GPU VR, let us consider the following setup:

  • 512x512x2300 (13-bit) run-off CT data-set,
  • the transfer function is set to have:
    1. opaque reddish-white bone with well-defined red-brightened vessels (no segmentation involved),
    2. semi-transparent “film”-like skin with Phong lighting on top,
  • viewport size ~1080p.

Getting high rendering quality for such a TF setup requires a very high sampling density along the ray (x8…x16 samples per cell (IC case)). Once I match the ~“artifact-free” quality of the GPU and CPU engines, the speed difference is at least x5 in favor of the CPU (very conservative), while the hardware prices are around the same for the CPU and GPU setups: (980X + 580 GTX/3GB) ~= (dual E5649).
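To put that sampling density in perspective, here is a back-of-the-envelope estimate (all numbers are assumptions chosen to roughly match the setup above: a full 1080p viewport where every pixel casts a ray through ~512 cells, at the low end of the stated x8…x16 rate):

```python
# Back-of-the-envelope sample-count estimate for the setup above.
# Every number here is an assumption for illustration only.
width, height = 1920, 1080      # ~1080p viewport
cells_per_ray = 512             # cells crossed per ray (~volume side length)
samples_per_cell = 8            # low end of the x8..x16 range

rays = width * height
samples = rays * cells_per_ray * samples_per_cell
print(f"{samples / 1e9:.1f} billion samples per frame")  # → 8.5 billion
```

Roughly 8.5 billion volume samples per frame at the low end of the range - which is why sampling density, rather than viewport size alone, dominates the cost.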

My experience apparently differs very much from the mainstream mindset regarding VR on the CPU vs. the GPU; yet I think it may be interesting to share.


Hi Stefan,
I would like to run the performance/quality comparison against my GPU raycaster. I want to know more about the test settings you used and the performance stats you got. In addition, which dataset did you use - is it publicly available? Could you PM me these details? We could do a collaborative quality/performance comparison between CPU and GPU if you are willing to do so.


You are definitely welcome ;o) If you are a researcher from a university you may obtain a research license from Fovia at:

Just state the purpose/goal of your research, something like “comparison of CPU vs GPU VR”.

Regarding data-sets, I use the following for bench-marking:
750x512x512 (12-bit) skull
2300x512x512 (13-bit) full-body run-off
3000x512x512 & 4000x512x512 (13-bit) crocodile mummies
I’m sure these data-sets can be used for “GPU vs CPU” bench-marking; once you obtain the research license, the data can be provided.
The data-sets from the OsiriX data repository are mostly low-noise “champion” data-sets, but some of them (the biggest) may be used for bench-marking as well.

> I want to know more about the test settings
> you used and what performance
> stats you got.

I’ve described one TF/hardware setup above, so I can clarify further if you need more specifics.

Anyway, an accurate and complete bench-marking report cannot be given in a short post since there are many different scenarios, so only once you finish the bench-marking will you have a complete picture. I would love to be involved; once you obtain the research license I can assist.


Hi Stefan,
Actually, I did request a license from Fovia on the 28th of October 2010, and I specifically mentioned that I wanted to see Fovia in action and compare it to the GPU; however, the Director of Business Development (Shay Kilby) refused to provide me the license. And I am a researcher, by the way. I could PM you the email that I received from Shay Kilby if you want.

EDIT: Anyway, I have put in a license request once again; I hope it is not declined this time.

There should be some reason for declining it, such as: you do not have sufficient hardware, or you are associated with a commercial organisation, etc.

HDVR gains its advantage once we move to higher image quality / bigger data-sets / bigger viewports; therefore sufficient CPU resources are absolutely critical to appreciate the power of HDVR. CPU+GPU should be compared with CPU+CPU. For example, 980X + 580 GTX/3GB vs. dual E5649…

>I could pm you the email

Please do so at:
stefanbanev at yahoo dot com
Also, no need to expose “non-rendering” content here - please use email.


Hi Stefan,
Check your email.