OT: fastest ray-triangle line -triangle intersection algo

I tried out the Magic software intersection codes. I made some minor changes to make them run faster.

I benchmarked at 93,000 tests per second in some cicumstances. In some cases, my own code runs faster (6X -> 550,000) and sometimes much slower.

93000 is a pretty low number isnt it? Anyone know of a good source that can reach the millions and even hundreds of millions?
I will take it and write an SSE version if I have too.

Thanks a plenty,

In my ray triangle intersection I first reduce the problem to 2D.

  1. Calculate Intersection of ray with the plane in which the triangle lies.
  2. Take the highest coordinate of the normal vector of the plane. This one is dropped in the triangle points and the intersection point. Now you have a 2D problem.
  3. You have P1, P2, P3 and X now. Calculate the area of A1 = (p1,p2,p3) and the area of A2 = (p1,p2,x)+(p2,p3,x)+(p3,p1,x). If a2 > a1 then x is outside the triangle.

I don’t know how fast it is compared to other algo’s. Which algorithms do you use?


Could try the references in this paper: http://graphics.cs.uni-sb.de/Publications/2001/InteractiveRenderingWithCoherentRa yTracing.pdf

Dodgy c&p:

Bary.   Pluecker  Bary.    speedup
C code  SSE       SSE 4-1

min 78 77 22 3.5
max 148 123 41 3.7

Table 1: Cost (in CPU cycles) for the different intersection algorithms. 41 cycles correspond to roughly 20 million in-tersections per second on a 800 MHz Pentium-III. Measured by using the internal Pentium-III CPU counters. In our implementation, the SSE code for intersecting four rays with a single triangle requires 86-163 CPU cycles. Amortizing this cost over the four rays results in only 22 to 41 cycles per intersection, which corresponds to roughly 20 to 36 million ray-triangle intersection test per second.

EDIT: And here’s the Barycentric algorithm they’re using.

[This message has been edited by Maj (edited 08-21-2002).]

Originally posted by Kilam Malik:
[b]In my ray triangle intersection I first reduce the problem to 2D.

  1. Calculate Intersection of ray with the plane in which the triangle lies.
  2. Take the highest coordinate of the normal vector of the plane. This one is dropped in the triangle points and the intersection point. Now you have a 2D problem.
  3. You have P1, P2, P3 and X now. Calculate the area of A1 = (p1,p2,p3) and the area of A2 = (p1,p2,x)+(p2,p3,x)+(p3,p1,x). If a2 > a1 then x is outside the triangle.

I don’t know how fast it is compared to other algo’s. Which algorithms do you use?


My idea’s have revolved around tricks like this. Calculating the areas of each of those triangles will make you pay some cycles. I used the dot product approach. Too bad Intel doesn’t have a dot product instruction to make life simpler.

Maj, sounds like nice pages there. I will have to read them in detail and see if they give me ideas.

PS: My own algorithm is reaching 558000 now, but not good enough since in some cases, it goes way lower.


There was a disscusion like this at gamedev.net in the math forum. I tried all their methods and I found one that was the fastest. I don’t know if it’s THE fastest, but it’s fast for me

inline unsigned int s3dVecInTriangle3f2(S3Dvecfv p, S3Dvecfv v0, S3Dvecfv v1, S3Dvecfv v2) 
	S3Dvec3f	e10,e20;
	S3Dvec3f vp;
	float		a, b, c, d, e, x, y, ac_bb;

	s3dVecSubtract3f(e10, v1, v0);
	s3dVecSubtract3f(e20, v2, v0);
	a = s3dVecDot3f(e10,e10);
	b = s3dVecDot3f(e10,e20);
	c = s3dVecDot3f(e20,e20);

	ac_bb = a*c - b*b;

	s3dVecSubtract3f(vp, p, v0);

	d = s3dVecDot3f(vp,e10);
	e = s3dVecDot3f(vp,e20);

	//x = d*c - e*b;
	//e = e*a + d*b;

	x = d*c - e*b;
	if ( x < 0 )
		return false;

	y = e*a - d*b;
	if ( y < 0 )
		return false;

	return (x+y) <= ac_bb;

a famous example http://www.acm.org/jgt/papers/MollerTrumbore97/
beware though this code (+ most others out there is flawed, ie it will fail sometimes when intersection falls on the triangless border (yes i know triangles strictly dont have a border as its infinitly thin)


but its a pain when u have a quad made up of 2 tris and a ray passes through both (no intersection) cause its on the diagonal

btw some links to pt in polygon routines (i believe only one handles the point on border cases though ) http://astronomy.swin.edu.au/pbourke/geometry/insidepoly/ http://geometryalgorithms.com/Archive/algorithm_0103/algorithm_0103.htm http://www.dfanning.com/tips/point_in_polygon.html http://vbdesign.hypermart.net/cadpages/point_in_polygon.htm http://home.earthlink.net/~bobstein/inpoly/ http://www.whisqu.se/per/docs/math26.htm http://www.cg.tuwien.ac.at/courses/AlgoCompCart/Lecture05/plumbline.html http://www.acm.org/tog/resources/RTNews/html/rtnv5n3.html http://condor.informatik.uni-oldenburg.de/~stueker/graphic/index.html http://www.acm.org/tog/editors/erich/ptinpoly/

if you only want to intersect triangles with rays, say you want for example fast raytracing, i suggest that you convert all your rays to plueckerspace (needs another 3 coordinates to the existing ray, or simply a 6d vector (plueckerspace is homogene 5d…)

your triangles as well get some additional data, namely they get the 3 rays surrounding it, the edges. those, as well, in plueckerspace. the final intersectiontest is simply 3 6d dotproducts, and the comparison, if all are bigger than zero (or below, depending on your data setup…).

its one of the fastest, but uses more precalculated data. it depends, but for a raytracer its great (and you can trace all sort of convex polygons, so quads as well for example…)
http://www.flipcode.com/pluecker oder http://www.flipcode.com/tutorials/pluecker i dont remember… good luck

You probably know more about your triangles than just that they’re 500,000 arbitrary triangles.

If, for example, they don’t move between tests (but the ray does) then you can put them in a spatial index, such as a kd-tree. That will let you cull away a lot of non-colliding triangles very fast.

If they move, but they move in groups – such as meshes in a game, for example – you can treat each group with a bounding volume (such as a sphere) and quickly cull away large groups, only to test those whose bounding volume intersect the ray.

If you generate these in software constantly, so you really don’t know anything about them a priori, but you know the ray as you generate them, you can run the test as part of triangle generation. That’s likely to cache much better.

OK, to begin with, I tried Whatever’s code (and it resembles Magic Softwares code but is super short!)

Just over 1 million tests/ sec and Maj’s link gives an even better one -> 3.7 million I think?
I have come up with an even better one, but unfortunatly I’m having some trouble. Can anyone tell me if this makes sense :

You compute the point of intersection as usual.
Then you do

vector1= vertex1 - POI
vector2= vertex2 - POI
vector3= vertex3 - POI

If (point inside triangle, then none of the 3 components are all positive or all negative)
if there is
return 0
return 1

It has the potential of being blinking fast but… I’m not sure but could my problem be precision error. I have no idea why it isn’t working. Works on paper!


Ah! mistake in my logic. That test is not enough.


I’m sure it’s the same code you had V-man. When I obtained my point in triangle code from gamedev it did some extra calculations like obtaining the uv position of the point…it was a raytracing algo so that’s understandable. I just removed all that stuff from the code because we don’t need that for simple collision detection .

Would you mind posting your new code that you got from Majs’ link? That seudo code makes no sense to me . I want to try it out and see the performance difference.

I don’t know pseudo code either, but my first conversion to C++ worked

float u0, v0, u1, v1, u2, v2, beta, alpha;
long i1, i2;
float floaterror=1e-5;

{	//Don't bother checking if they intersect (to simplify the algorithm)
	return 0;
//(x, y, z)=(x, y, z)+t*(x, y, z)
t[0]=(normal3D[0]*(triangle3D[0]-startingPointOfLine[0])+normal3D[1]*(triangle3D[1]-startingPointOfLine[1])+normal3D[2]*(triangle3D[2]-startingPointOfLine[2]))/(normal3D[0]*lineVec  tor[0]+normal3D[1]*lineVector[1]+normal3D[2]*lineVector[2]);



return (alpha&gt;=0.0)&&(beta&gt;=0.0)&&(alpha+beta&lt;=1.0);

/edit/ minor fix

[This message has been edited by V-man (edited 08-25-2002).]

Thank you, it worked!

I tested it on 2304 triangles and I went from 188 fps to 198 fps(it was rendering a scene with 4765 tris with a few effects enabled).

[This message has been edited by WhatEver (edited 08-25-2002).]

that code calculates twice the dotproduct between normal3D and lineVector. you should get it even faster…

struct Triangle {
v3 a,b,c;
Triangle(v3 a,v3 b,v3 c) : a(a),b(b),c© {
bool Hit(f32 start[3],f32 end[3]) {
f32 _ray[6];
_ray[0] = end[0] - start[0];
_ray[1] = end[1] - start[1];
_ray[2] = end[2] - start[2];
_ray[3] = start[1]end[2] - end[1]start[2];
_ray[4] = start[2]end[0] - end[2]start[0];
_ray[5] = start[0]end[1] - end[0]start[1];
f32 _e01[6];
f32 _e12[6];
f32 _e20[6];
_e01[0] = b.x - a.x;
_e01[1] = b.y - a.y;
_e01[2] = b.z - a.z;
_e01[3] = a.y
b.z - b.y
_e01[4] = a.z
b.x - b.z
_e01[5] = a.x
b.y - b.x
f32 pd0 = _e01[0]_ray[3] + _e01[1]_ray[4] + _e01[2]_ray[5] + _e01[3]_ray[0] + _e01[4]_ray[1] + _e01[5]_ray[2];
_e12[0] = c.x - b.x;
_e12[1] = c.y - b.y;
_e12[2] = c.z - b.z;
_e12[3] = b.yc.z - c.yb.z;
_e12[4] = b.zc.x - c.zb.x;
_e12[5] = b.xc.y - c.xb.y;
f32 pd1 = _e12[0]_ray[3] + _e12[1]_ray[4] + _e12[2]_ray[5] + _e12[3]_ray[0] + _e12[4]_ray[1] + _e12[5]_ray[2];
_e20[0] = a.x - c.x;
_e20[1] = a.y - c.y;
_e20[2] = a.z - c.z;
_e20[3] = c.ya.z - a.yc.z;
_e20[4] = c.za.x - a.zc.x;
_e20[5] = c.xa.y - a.xc.y;
f32 pd2 = _e20[0]_ray[3] + _e20[1]_ray[4] + _e20[2]_ray[5] + _e20[3]_ray[0] + _e20[4]_ray[1] + _e20[5]_ray[2];
return pd0 < 0 && pd1 < 0 && pd2 < 0;
void Draw() {
glBegin(GL_TRIANGLES); {
} glEnd();

works. based on plueckerspace, but without any precalculation of data. and this can be the fastest one of all, if you would optimize it for example with sse to calculate all 3 edges at the same time… IFF you would do it for a raytracer you would precalc most of this (the 3 pluecker edges and the pluecker space ray) and you could gain even much more…

oh, and the 3 ifs in the end are not needed with sse as well

oK, gone have to spend some time learning plueker space technic then.

Are you working on a raytracer dave?
I made one this summer (finally) but its dog slow. Smarter technics are very much necessary like partitioning space, bounding box, better representation of data, …

I had to make an AVI movie to see things as it got rotated. 2:30 hours of computer time!


yep, i’ve done some tiny raytracers yet, but nowhere as fast as naturesuxx. what have you traced? with some spheres and planes, you get (incl shadows and reflections) normally some fps, if you don’t mess something really much up… currently a friend of me works at a speed optimized math library for the aegis engine, wich provides stuff like squareroots in 3 cycles on a p3… can’t wait to get that stuff done, then it should get much faster…

and, as i said (i think…), if you percalc the plueckerspace coords for the triangles, and for the rays, you get the intersection much faster. (only the dotproducts in the end…) its about the fastest check. no divisions, no squareroots, just some mads… would love to get hw dotproduct and hw crossproduct, oh well

I saw the naturesux page and they had a pretty advanced software. I think they are using direct3d to do some of the rendering. I even saw another page that started out talking about how raytrace would eventually be a faster solution to scanline (in the case of extremely high LOD) but then slowly went to suggest using opengl to aid rendering.

The problem with my raytracer is that it’s polygonal. I’ve used my opengl engines design to design this thing, so spheres are made with polygons.
But anyway, in the real world, there aren’t much objects made with sphere cylinders and cubes.


there aren’t much made with polygons eighter… nor spines nor anything… they are a) volumetric (think about it, 3d geometry is in fact flat… proven by the new siggraph paper wich demonstrates unwrapping a mesh onto a 2d texture )
and b) they are made of particles…


i just love raytracers… well… and the spheres are useful anyways… speedup by 300% if you have boundingspheres around each cuple of triangles…

i’ve rewritten my intersection test a bit, now its even more visible how good sse would fit into it… very low amount of multiplications and additions… have to set up some code and test it, possibly at the weekend, we’ll see…

and yeah, a “normally good” raytracer scales about logarithmically with complexity… its basically the same as in the unreal engine (the software version). it doesn’t mather how big and how complex and what ever the level is. only what is visible mathers, and even that is not that important. in the end, more or less unreal only cares about the amount of pixels on screen (+the characters…). portals with beamtree are about the way raytracers work…its cool

and yes, the first hit, the one with the camera - rays can be done with rastericers, thats really a simple but effective optimization…

and yes, opengl can help raytracing as well, can’t wait for my r300… first test done directly with rastericers, then generate the rays there per pixel and rerender all the triangles, but fullscreen (i will render on 320x240…), and instead of a triangle simply the components get sent into the pixelshader, and you render a fullscreenquad. that way you do all the ray-triangle intersection in the pixelshader…

thats about the basic idea, i think shadows could be done quite good, and possibly first reflections as well… we’ll see how much can be done yet… naturesuxx roxx… its awesome (and it is quite general useable, not like heaven7 wich fakes about everything it can (but its awesome anyways))

they even plan to code a game with naturesuxx, can’t wait for that one… (oh, and a new cpu )

OK, went over the plucker coordinates. Nice to see an idea from physics in there.
The problem is a copy has to be kept of these coordinates and everything has to be pre transformed before generating the plucker.

Did you use these and benchmarked? I may try them another time. Put them on the web if you have.

let’s do volumize? You do it! A wise man once told me “keep it simple you stupid idiot!”

This thing takes needs way too much cpu power. Using tricks in opengl for games is the only option and will be for a very long time, sadly.