I’ve been developing scientific visualization software under OpenGL for quite a number of years. Recently, I thought it might be a good idea to try to upgrade my old school fixed-pipeline and basic scenegraph skills to something a bit newer. I’ll tackle OpenGL 3.0+ shortly, but at the moment I’m trying to experiment with some different methodologies for storing matrix/vertex data.
I’ve been using GL rotate/translate to build my modelview matrix, then sending my vertex data down the pipe to be rendered. If I need to perform a lookup or intersection test, then I extract the modelview matrix from OpenGL, apply that matrix to each of my vertices (which often count into the tens of millions), and then perform the operation.
While this works flawlessly, I’m sure there’s a better way to do it. So I thought a good first step would be to build the modelview myself, manipulate each vertex in my software and then send the modified vertices off to OpenGL. I figured this would have the twin benefits of being offloadable to another thread while essentially giving me the ability to perform geometry operations for free (of course the operations themselves still have a cost, but the setup is free).
Fundamentally this works. All of my basic functionality works fine – objects (lines, triangles, etc) are drawn correctly, textured, etc. Unfortunately, my fixed-pipeline lighting no longer works as intended. Previously I would:
Set my lighting position
Perform my modelview transforms using the API’s translate/rotate/look-at routines
Render my scene
I’ve tried a few hacky fixes, but really I’m just bumbling around hoping to land on the correct solution. Right now, I’ve got the equivalent of:
Define modelview identity
Build modelview by multiplying camera rotations/translations
Update vertices (normals and light positions, too) by transforming their original positions by the modelview matrix
Specify the light position via glLightfv (using the modified light position as transformed by my modelview matrix)
Render my scene
Lighting is noticeable, but changes radically according to camera angle. I don’t remember seeing lighting like this since I first started bumbling around with lighting.
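One thing worth checking in the steps above (a guess, since the actual code isn't shown): normals are directions, not positions, so they must not pick up the translation part of the modelview matrix. Also, fixed-function glLightfv with GL_POSITION multiplies the position by whatever modelview matrix is current at the time of the call, so a light that was already transformed in software should be set while GL's own modelview is identity, or it gets transformed twice. The sketch below uses hypothetical helpers (transformPoint, transformNormal, rotateZThenTranslate) to show the point-versus-direction distinction for a column-major matrix. It assumes the matrix holds only rotation, translation and uniform scale; non-uniform scale would need the inverse transpose for normals.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
// Column-major 4x4, the layout glLoadMatrixd and glGetDoublev use.
using Mat4 = std::array<double, 16>;

// Positions (vertices, positional lights) have w = 1, so translation applies.
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    return Vec3{ m[0] * p[0] + m[4] * p[1] + m[8]  * p[2] + m[12],
                 m[1] * p[0] + m[5] * p[1] + m[9]  * p[2] + m[13],
                 m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14] };
}

// Normals have w = 0: only the upper-left 3x3 is used, then renormalize.
Vec3 transformNormal(const Mat4& m, const Vec3& n) {
    Vec3 r{ m[0] * n[0] + m[4] * n[1] + m[8]  * n[2],
            m[1] * n[0] + m[5] * n[1] + m[9]  * n[2],
            m[2] * n[0] + m[6] * n[1] + m[10] * n[2] };
    double len = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
    assert(len > 0.0);  // a zero-length normal cannot be normalized
    return Vec3{ r[0] / len, r[1] / len, r[2] / len };
}

// Illustrative matrix: rotate about Z by `radians`, then translate by t.
Mat4 rotateZThenTranslate(double radians, const Vec3& t) {
    double c = std::cos(radians);
    double s = std::sin(radians);
    return Mat4{ c,    s,    0.0,  0.0,    // column 0
                 -s,   c,    0.0,  0.0,    // column 1
                 0.0,  0.0,  1.0,  0.0,    // column 2
                 t[0], t[1], t[2], 1.0 };  // column 3 (translation)
}
```

If the normals were being pushed through the same routine as the vertex positions, they would pick up the camera translation, which would make the lighting swing with the camera in roughly the way described.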
Anyone have any thoughts? Either on the lighting issue or my attempts to improve a basic scene? Am I barking up the wrong tree by moving the computations out of GL and into my own software?
I think typically that you want to create Vertex buffer objects for your vertices and then index buffers for your tris. Then send your matrices (modelview, projection) down to the card and let the card do the transformations. Doing them on the CPU defeats the purpose of all those little cores on the card!!
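To make the vertex buffer plus index buffer idea concrete, here is a rough sketch of the CPU-side data only. IndexedMesh, makeUnitQuad, vertexBytes and indexBytes are made-up names for illustration. The actual upload would go through glBufferData with GL_ARRAY_BUFFER and GL_ELEMENT_ARRAY_BUFFER, followed by a glDrawElements call; those are left out because they need a live GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU-side layout of an indexed triangle mesh (hypothetical type).
struct IndexedMesh {
    std::vector<float> positions;        // x, y, z per unique vertex
    std::vector<std::uint32_t> indices;  // three indices per triangle
};

// A unit quad: 4 unique vertices shared by 2 triangles (6 indices),
// instead of 6 duplicated vertices in a plain triangle list.
IndexedMesh makeUnitQuad() {
    IndexedMesh mesh;
    mesh.positions = { 0.0f, 0.0f, 0.0f,    // vertex 0
                       1.0f, 0.0f, 0.0f,    // vertex 1
                       1.0f, 1.0f, 0.0f,    // vertex 2
                       0.0f, 1.0f, 0.0f };  // vertex 3
    mesh.indices = { 0, 1, 2,    // first triangle
                     0, 2, 3 };  // second triangle
    return mesh;
}

// Byte sizes one would pass as the size argument when uploading.
std::size_t vertexBytes(const IndexedMesh& m) {
    return m.positions.size() * sizeof(float);
}

std::size_t indexBytes(const IndexedMesh& m) {
    return m.indices.size() * sizeof(std::uint32_t);
}
```

The buffers are uploaded once and stay on the card; per frame only the matrices change.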
As far as lighting, the way to go is to start writing your own lighting vertex and fragment shaders! It’s not obvious to me what issues you are having with the fixed func. lighting…
My thinking was that by having the CPU handle all of vertex transforms I’d have the information available to do all my operations on them (intersections, etc) without having to preprocess any vertices that may have been transformed by rotates/translates/scales.
That’s a bit too big of a leap for me right now. I’ll get to it in time, but I need to digest a few things before I leap into the world of shaders (I’ve been using fixed pipeline for over 10 years).
Ignore the lighting problem. If I switch back to allowing the API/GPU to do my transforms, then my lighting problem should be solved.
You can avoid transforming all your object vertices into world coordinates for intersection testing by transforming the object you test with (perhaps a ray or something similarly simple?) into the object’s coordinate system.
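A minimal sketch of that idea, assuming the object's matrix is rigid (rotation plus translation only), so its inverse is just the transposed rotation applied after subtracting the translation. Ray and toObjectSpace are hypothetical names; a matrix with scale or shear would need a general 4x4 inverse instead.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat4 = std::array<double, 16>;  // column-major, as OpenGL stores it

struct Ray {
    Vec3 origin;
    Vec3 dir;
};

// Multiply v by the transpose of the upper-left 3x3 of m.
// For a pure rotation, the transpose is the inverse.
static Vec3 mulRotationTransposed(const Mat4& m, const Vec3& v) {
    return Vec3{ m[0] * v[0] + m[1] * v[1] + m[2]  * v[2],
                 m[4] * v[0] + m[5] * v[1] + m[6]  * v[2],
                 m[8] * v[0] + m[9] * v[1] + m[10] * v[2] };
}

// Bring a world-space ray into the object's local space so it can be
// tested against the untransformed vertices. Assumes objectToWorld is
// rigid (rotation + translation).
Ray toObjectSpace(const Mat4& objectToWorld, const Ray& worldRay) {
    // Undo the translation first (origin only; directions ignore it).
    Vec3 shifted{ worldRay.origin[0] - objectToWorld[12],
                  worldRay.origin[1] - objectToWorld[13],
                  worldRay.origin[2] - objectToWorld[14] };
    Ray local;
    local.origin = mulRotationTransposed(objectToWorld, shifted);
    local.dir = mulRotationTransposed(objectToWorld, worldRay.dir);
    return local;
}
```

This is one matrix-vector product per ray per object, instead of one per vertex.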
If intersection testing is a bottleneck for you and your objects are as big as you describe you should probably also consider using some acceleration data structure, like bounding volume hierarchies, kd-tree, or something along those lines.
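Whatever structure is used (BVH, kd-tree, octree), the traversal mostly comes down to a cheap ray-versus-box rejection test at each node. Here is a sketch of the usual slab method, with hypothetical names Aabb and rayHitsBox. It only answers whether the ray touches the box and leaves the tree traversal itself out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <limits>
#include <utility>

using Vec3 = std::array<double, 3>;

struct Aabb {
    Vec3 min;
    Vec3 max;
};

// Slab test: intersect the ray's parameter range with the range in which
// it lies between each pair of axis-aligned planes. The ray starts at
// t = 0, so hits behind the origin do not count.
bool rayHitsBox(const Vec3& origin, const Vec3& dir, const Aabb& box) {
    double tMin = 0.0;
    double tMax = std::numeric_limits<double>::infinity();
    for (int axis = 0; axis < 3; ++axis) {
        assert(box.min[axis] <= box.max[axis]);
        if (dir[axis] == 0.0) {
            // Ray is parallel to this slab: it must already be inside it.
            if (origin[axis] < box.min[axis] || origin[axis] > box.max[axis]) {
                return false;
            }
            continue;
        }
        double inv = 1.0 / dir[axis];
        double t0 = (box.min[axis] - origin[axis]) * inv;
        double t1 = (box.max[axis] - origin[axis]) * inv;
        if (t0 > t1) {
            std::swap(t0, t1);
        }
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMin > tMax) {
            return false;  // the per-axis ranges no longer overlap
        }
    }
    return true;
}
```

Only the triangles inside boxes that pass this test need the full intersection test.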
The GPU is really good at transforming vertices fast, so you should let it do that if at all possible.
I’ve thought about transforming my rays/etc into world coordinates, but half my objects are in world coordinates and the other half are in local coordinates. It’s the nature of a beast that has grown progressively larger over the years.
I’m accelerating my searches using a pretty simple octree. No noticeable performance hits yet, but then I haven’t bothered profiling it, either.
I’ve decided to bite the bullet and hack together a 3.0+ sample. We’ll see where that takes me.
You have tens of million vertices?
This is quite a lot!
I remember about 10 years ago I tried to handle about 100,000 to 1 million vertices on the CPU.
Forget it, even the biggest, newest 8/16-core CPUs are not able to do it with good performance.
It’s simple. Today’s top CPUs have about 20 GB/s peak memory bandwidth. You need three memory accesses per vertex, the last being the read for the transfer to the GPU.
With tens of millions of vertices, no cache will help you very much; your effective bandwidth will be about 1 GB/s.
Let’s look at the GPU. On top GPUs you have more than 200 GB/s of bandwidth!
On the performance side, you need all your CPU performance/bandwidth to transfer your data to the GPU. Doing more on the CPU will drastically reduce performance, because the limit is the memory bandwidth on the CPU.
So I think it’s not worth looking into transforming your vertices on the CPU. The GPU is much better designed to do it really fast.
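To put rough numbers on that argument, here is a back-of-the-envelope helper. The name secondsPerFrame is made up, and the inputs used in the note below (10 million vertices, 12 bytes per vertex for three floats, decimal gigabytes) are assumptions for illustration, not measurements.

```cpp
#include <cassert>
#include <cstddef>

// Time to stream every vertex through memory `accessesPerVertex` times
// at a given sustained bandwidth. Ignores compute cost entirely, so it
// is a lower bound on the per-frame time.
double secondsPerFrame(std::size_t vertexCount,
                       std::size_t bytesPerVertex,
                       int accessesPerVertex,
                       double bytesPerSecond) {
    assert(bytesPerSecond > 0.0);
    double bytesMoved = static_cast<double>(vertexCount) *
                        static_cast<double>(bytesPerVertex) *
                        static_cast<double>(accessesPerVertex);
    return bytesMoved / bytesPerSecond;
}
```

With 10 million vertices, 12 bytes each and three accesses, that is 360 MB per frame: about 0.36 s at the 1 GB/s effective figure, and about 18 ms even at the 20 GB/s peak figure.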
Furthermore, on the CPU you should use SSE. OK, some compilers may do it for you, but it’s tricky, and a lot of pitfalls are waiting for you. Good old x87 code without prefetching and streaming additions will not make you happy.