new Relief Mapping shader (better Parallax Mapping?!?)

Bump mapping, parallax mapping, etc. all have one thing in common: they only require a simple quad as input to the vertex units. The impression of relief is created by the fragment program.

To get some kind of speed comparison, I made a program that creates a vertex buffer object with associated index buffer from a 512x512 depthmap.

On a GF6800 GT this textured mesh is drawn at ~180 fps (1600x1200). I did not include lighting in the fragment program yet. The vertex program seems to be the bottleneck.

The relief mapping techniques do have the advantage that they require a smaller amount of memory on the graphics card.

Nice demo! But like SirKnight mentioned, there are some artifacts. I believe most of them are inherent to relief mapping techniques.

Nico

Using double precision did not help for me. I tried all combinations of settings and the same artifacts were still present. But this is a good start nonetheless.

-SirKnight

I just tried out your newest version and the performance is a bit lower, even using NVFP30, which did give a boost compared to ARBFP1. Also I can’t select NVFP40 mode even though I have an NV40 card.

-SirKnight

I have disabled the nv40 option in that post as I was only testing the arbfp1/nvfp30 thing there. I still would like to know if it can run on an ATI X800 card…

I have an NV40 version that does an early exit from the loop at the first intersection (which should save several texture reads). But it looks like the overhead of using the true loop instruction is bigger than the saved reads. Also, if one fragment exits early, it still has to wait for all the other fragments running in parallel, so you always get the time of the slowest fragment in the batch.

I will enable the NV40 option again now… it runs slower than the version using fully unrolled loops with conditional stores for the ifs, but it is good for testing PS 3.0 features in Cg.
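
For illustration only, here is a rough sketch of the two linear search styles being compared (this is not the demo's actual shader; reliefmap, uv and ds are assumed names):

// (a) Fixed-count linear search with conditional stores (fp30-friendly):
// the loop is fully unrolled by the compiler and the "if" becomes a
// conditional write, so no real branch instruction is needed.
const int linear_steps = 16;
float best_depth = 1.0;
for (int i = linear_steps; i > 0; i--)
{
	float d = i * (1.0 / linear_steps);
	if (d >= f4tex2D(reliefmap, uv + ds * d).w)
		best_depth = d;	// inside the surface: remember the shallowest hit
}

// (b) True loop with early break (fp40): stops at the first hit and saves
// texture reads, but pays the dynamic branching overhead described above.
float depth = 0.0;
while (depth <= 1.0)
{
	depth += 1.0 / linear_steps;
	if (depth >= f4tex2D(reliefmap, uv + ds * depth).w)
		break;	// first sample found inside the surface
}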

ok, NV40 enabled version uploaded. same url for zip file as before.

but selecting the NV40 option will make things slower. I’m using the following compiler options:

-profile fp40 -DRM_NV40 -ifcvt none -unroll none

Extra precision helped a lot on my system with the edge definition of the texture-implied geometry.

That’s some nice stuff, fpo. I’ll enjoy playing with it.

I get 28fps on a 5950 Ultra using the NV30 path and 14fps using the ARB path.

Nice work.

EDIT: I just realized I had v-sync enabled; without it, it’s 37fps and 20fps with default settings.

Awesome. It would be nice if you could toggle between Parallax and Relief in the demo.

I added Parallax Mapping. Replace the shader with this one, choose standard bump mapping and enable shadows (enabling shadows will enable parallax mapping :) ).
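
For comparison, basic parallax (offset) mapping comes down to something like the sketch below; this is not Sunray's actual shader, and reliefmap, uv, view (tangent-space view vector) and the scale/bias values are just assumed names:

// Shift the texture coordinates along the tangent-space view direction by an
// amount proportional to the stored height (offset-limited: no divide by view.z).
float height = f4tex2D(reliefmap, uv).w;
float2 parallax_uv = uv + (height * 0.04 - 0.02) * view.xy;
// parallax_uv is then used for the color and normal map lookups.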

Great Sunray… loved your parallax mapping addition. We can now compare bump, parallax and relief. Excellent work!

I have incorporated your parallax mapping code into my demo and added a new menu option for it. I also made all source code available this time so you can just load the project in VC++ and re-compile the executable (all lib/h files needed for the build should be there now).

Make sure to read the included PDF file that explains how it works and why we have a few artifacts at the edges depending on depth map detail.

First of all: Very nice Demo, fpo!

But these artifacts when viewing almost parallel to the surface are really ugly. Why not decrease the linear search steps depending on the angle between the view vector and the surface? Parallax mapping doesn’t look as good as relief mapping but doesn’t have any artifacts like this.

Thanks Corrail, having a different number of linear search steps based on the angle with the relief surface is a good idea.

But we cannot have a variable number of loop steps for every fragment (not even with PS 3.0, I think).

With PS 2.0 the number of steps must be known at compile time, so we would need to recompile the shader every time the view angle changes.

With PS 3.0 we can have the number of loop steps passed in as a parameter (constant for all fragments of a polygon). But the best way would be to compute the number of loop steps per fragment depending on the fragment angle (not the polygon angle)… but this is not possible yet.

It is difficult to produce the correct depth with high quality and speed. This is the best I could come up with (I tried other forms of relief maps/displacement maps but all were much slower and had much worse artifacts).

When you use PS 3.0 the maximum number of loop passes is determined per draw call (thanks to Demirug @ 3dcenter.de for this info). So for your project you can use a for loop with a fixed number of passes and, depending on the angle between the view vector and the surface, use break to jump out of the loop earlier.
So this is possible using PS 3.0. IMHO you can do that with PS 2.0 too.
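
On fp40 that suggestion would look roughly like the sketch below (assumed names only: reliefmap, dp, ds and dotprod; this is not code from the demo). The loop has a fixed upper bound and breaks out early, either when the angle-dependent step budget is used up or at the first hit.

const int MAX_STEPS = 64;	// fixed maximum known per draw call
// more steps at grazing angles (small dotprod), fewer when viewing head-on
float n_steps = lerp(MAX_STEPS, 8.0, saturate(dotprod));
float step_size = 1.0 / n_steps;
float depth = 0.0;
for (int i = 0; i < MAX_STEPS; i++)
{
	if (i >= n_steps)
		break;	// angle-dependent step budget used up
	if (depth >= f4tex2D(reliefmap, dp + ds * depth).w)
		break;	// first sample inside the surface
	depth += step_size;
}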

Running this on my X800 Pro (with Cats 4.8) I get a blank black square (at 990 fps, too) running the relief arbfp1 or nvfp30 path. Normal and parallax work fine.

Hi Fabio. Congratulations on the ShaderTech compo!

I recently experimented in this area, having come across the following Siggraph sketch a little while before the conference:

Displacement Mapping with Ray-casting in Hardware

This describes an improvement upon naive heightfield ray marching. As I couldn’t get the accompanying demo to run - in the same directory as the pdf on last inspection - I thought I’d implement it myself and try out some other ideas for extending basic parallax mapping.

One thing to bear in mind with binary searching is that you won’t necessarily home in on the first hit. How critical this is depends, of course, on the height field in question, and it may be possible to work around; I’ve not had time to look at your implementation to see if you’re doing anything clever in that respect. In practice I didn’t find it to be much of a problem in my limited testing; in fact the results were aesthetically better than incremental marching (with more steps) for the test map I was using.

With regard to artefacts at shallow view angles, I found that simply scaling the displacement based on V.N was enough. Of course this isn’t at all correct physically, but it again looked good and was cheap to add - the sort of ‘solution’ (read: hack) game devs like :) .
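
Concretely, that hack amounts to something like this sketch (view, normal and depth_scale are assumed names; this is not code from any of the demos here):

// Fade the displacement out as the view direction approaches the surface
// plane; not physically correct, but it hides the shallow-angle artifacts.
float vdotn = saturate(dot(normalize(view), normal));
float scaled_depth = depth_scale * vdotn;	// use in place of the constant depth scale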

This could work well in conjunction with varying the number of samples in the same way. It may also be possible to vary the sampling per pixel with PS 3.0 hardware, as that model supports a loop break, although I believe you’re then restricted to using manual LOD texture lookups. If it’s indeed implementable it probably won’t be fast, and incoherency could also affect the speed.

Finally, concerning curved surfaces, you could look at the following paper if you haven’t already, which discusses that situation and a lot more besides:

Generalized Displacement Mapping

Thanks WarnK for testing it on the X800… I thought it might run. Maybe too many texture reads and too long a shader. Thanks anyway.

Hi SK… thanks for the excellent links and suggestions. I was also at Siggraph this year but did not see that sketch.

Here is another interesting pixel-based displacement mapping technique that I found on the web… a different approach to curved surfaces:
http://www.gris.uni-tuebingen.de/publics/paper/Hirche-2004-Hardware.pdf

The Generalized Displacement Maps paper looks awesome (I would like to see a running demo of it) but it can only be applied to a small tile surface. With Parallax and Relief Mapping we can use much larger textures with much more varied detail in them.

A binary search alone is good but will be wrong in cases where the ray intersects more than one point in the depth map (see fig 7 in my pdf file). If there are several intersections with the map for a given ray, the binary search can return any of them… so we must first find a local (closest) point inside the object and then search around that region for the closest intersection.

That is why we have the artifacts… if some section of the depth map at an intersection is smaller (thinner) than the linear search step size, we might miss it and end up with the next intersection instead of the first one (that gives the cut-off artifact where we see through the first intersection into the back of the object, as shown here by SirKnight).

Originally posted by fpo:
Here is another interesting pixel-based displacement mapping technique that I found on the web… a different approach to curved surfaces:
http://www.gris.uni-tuebingen.de/publics/paper/Hirche-2004-Hardware.pdf

Ah yes, I tried to dig up that paper for my previous post but I’d forgotten the conference and title! I haven’t actually read it in full yet either, so I probably would have ended up mischaracterising it.

Originally posted by fpo:
The Generalized Displacement Maps paper looks awesome (I would like to see a running demo of it) but it can only be applied to a small tile surface. With Parallax and Relief Mapping we can use much larger textures with much more varied detail in them.
Indeed, it would be great to see it in action, although I agree that the limited texture size makes it impractical for a lot of applications. There may be some mileage with texture synthesis in the future as the authors suggest and it’s an interesting paper nonetheless.

Originally posted by fpo:
A binary search alone is good but will be wrong in cases where the ray intersects more than one point in the depth map (see fig 7 in my pdf file). If there are several intersections with the map for a given ray, the binary search can return any of them… so we must first find a local (closest) point inside the object and then search around that region for the closest intersection.

That’s exactly the problem I was describing (or trying to) before, although rather tersely. I’ll take a look at the pdf.

The idea of performing an initial march followed by a binary search within the intersection region did cross my mind as a combined solution a while ago - it’s a logical follow on - but I didn’t get the chance to try it out and I was concerned about the potential instruction count anyway.

Originally posted by fpo:
That is why we have the artifacts… if some section of the depth map at an intersection is smaller (thinner) than the linear search step size, we might miss it and end up with the next intersection instead of the first one (that gives the cut-off artifact where we see through the first intersection into the back of the object, as shown here by SirKnight).
Good description.

Hi I’m new to these forums.

Your demo is awesome, fpo.
I was really speechless when I first read your paper, this is the way to go in the future!
(excuse my enthusiasm)

I experimented a bit with the shader yesterday in order to get rid of the artifacts and came up with the following solution:

The revised shader now walks with a constant step size in the texture plane instead of along the z-axis. This way there are almost no visible artifacts at small viewing angles, but on the other hand there are a lot more texture reads.
But because there are fewer pixels to draw at small angles, the fps stays more or less constant with varying view direction.
Unfortunately the new stuff is a lot slower (probably because of the high branch penalty on NV40 and the mass of texture lookups -> about 50 fps fullscreen on a GeForce 6800 GT).

Here is the important part of the code:

...
// RAY INTERSECT DEPTH MAP WITH BINARY SEARCH
// RETURNS INTERSECTION DEPTH OR 1.0 ON MISS
float ray_intersect_rm(
		in sampler2D rmtex,
		in float2 dp, 
		in float2 ds,
		in float dotprod)
{
#ifdef RM_NV40

	// *** NV 40 path ***
	float depth_step= max(dotprod*0.08, 0.005);
	const int binary_search_steps=5;

	// current size of search window
	float size=depth_step;
	// current depth position
	float depth=0.0;
	// best match found (starts with last position 1.0)
	float best_depth=1.0;


	// search front to back for first point inside object
	while (depth <= 1.0)
	{
		depth+=size;
		float4 t = f4tex2D(rmtex,dp+ds*depth);
		if (depth>=t.w)
			#ifdef RM_DOUBLEDEPTH
				if (depth<=t.z)
			#endif
			break;
	}

#else

	// *** Non - NV40 path ***
...
#endif



	// recurse around first point (depth) for closest match
	for( int i=0;i<binary_search_steps;i++ )
	{
		size*=0.5;
		float4 t=f4tex2D(rmtex,dp+ds*depth);
		if (depth>=t.w)
		#ifdef RM_DOUBLEDEPTH
			if (depth<=t.z)
		#endif
			{
				best_depth=depth;
				depth-=2*size;
			}
		depth+=size;
	}

	return best_depth;
}
...

The 4th parameter of the function is the dot product between the entry direction and the z-axis (dot(axis_z.xyz,v) for the first call in main_frag_rm() and dot(axis_z.xyz,l) for the shadow tracing call).

The factors in the depth_step calculation are a speed/quality tradeoff and the max() avoids exceeding the pixel shader limits.

If we get this raytracing fast, it could become a real alternative to vertex displacement mapping.

Good work pro_optimizer!
I see you make the linear search step size variable depending on the dot product of the view direction and the polygon normal.

The only problem is that this only works with the FP40 profile and it is still much slower than FP30 when using true loops/jumps. Maybe for NV50…

Thanks, fpo.
I have an idea for how you can optimize the ray marching through the heightmap.

Currently it is kind of like a blind flight: if you do not want to miss the surface, you must walk with a smaller step size.
But it can be made more geometry-aware by providing additional information in a second texture. I am thinking of filtering the height map with a variable-size maximization filter and storing the extended height plus the filter kernel size per texel.
This way you can walk with xy_stepsize = filter kernel size (which can be rather large in most cases), as long as you stay above the filtered height. When you are below, you can continue with a smaller step size, sampling only from the heightmap channel.
Alternatively, one might provide fixed-size filtered heightmaps in rgb (like 50, 20 and 10 texel radius) and the true height in alpha.
This way one can switch between 4 different walking speeds depending on the current ray height.
This will reduce the number of steps taken considerably while keeping the same detail for the edges, at the cost of some more calculation overhead.
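
A rough sketch of the walk described above, using the fixed-radius variant (accel, start_pos, step0..step3 and MAX_STEPS are assumed names, heights grow toward the viewer; this is not code from the demo):

// accel stores max-filtered heights for three filter radii in rgb and the
// true height in alpha; step0..step3 are step vectors scaled to match.
float3 pos = start_pos;	// xy = texture coordinates, z = current ray height
for (int i = 0; i < MAX_STEPS; i++)
{
	float4 h = f4tex2D(accel, pos.xy);
	if (pos.z <= h.w)
		break;	// at or below the true surface: intersection region reached
	// take the largest step whose max-filtered height we are still above
	if (pos.z > h.x)
		pos += step0;	// above the widest (e.g. 50 texel) max filter
	else if (pos.z > h.y)
		pos += step1;	// above the 20 texel max filter
	else if (pos.z > h.z)
		pos += step2;	// above the 10 texel max filter
	else
		pos += step3;	// close to the surface: finest steps
}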

By the way, in the current implementation it is obviously not possible to view the heightmap from the side, and one might think that one cannot define a heightmap for, say, the side polygons of a box surrounding a 3D landscape.
But in fact the heightmap would be the same; only the texture coordinates must include the z-coordinate in the heightfield (effectively expressing the 3D position of the vertex in heightmap space). This way the raytracer would not start at depth=0.0 and walk until depth=1.0, but might instead start at a depth value defined by the current texcoord and walk until it leaves the heightmap or the depth range 0.0 to 1.0.
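
A minimal sketch of that starting-depth idea (reliefmap, texcoord, ray_step and MAX_STEPS are assumed names; not code from the demo): the march starts at the depth carried in the texture coordinate and ends when the ray either hits the surface or leaves the 0..1 depth range.

// texcoord.xyz expresses the vertex position in heightmap space.
float3 pos = texcoord.xyz;	// xy = texture coordinates, z = starting depth
for (int i = 0; i < MAX_STEPS; i++)
{
	if (pos.z < 0.0 || pos.z > 1.0)
		break;	// left the depth range: no hit
	if (pos.z >= f4tex2D(reliefmap, pos.xy).w)
		break;	// entered the surface: hit
	pos += ray_step;	// march step in heightmap space (z may go either way)
}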

Unfortunately, I cannot build the program because it is missing the paralelo3d.h file. Is it available on the net?

Edit: Ignore the previous line, I have now downloaded the version with the library.