The different coordinate systems are there, I believe, to allow a developer to pick the one most convenient for the particular task they’re performing as they build the image.

The different coordinate systems are there because of math. They are reality rather than a convenience.

I always hated the terms model space, world space, view space, screen space, etc. I suppose that’s largely because terms like that, along with “pipeline”, frustrated and confused me for years. Reverse the words (“view space” becomes “space view”), then replace the first word with “relative to the”, and you’ll have the answer: “relative to the view”. These spaces exist because of the math you use to put a 3D world onto the screen. The 3D world is an illusion represented by numbers.

You can hard code things directly into this world, but unless you are generating the data procedurally, you’ll discover very quickly that it’s far too difficult. In my example engine code, I have a “hand coded model class” that lets you simply specify a vertex buffer and an index buffer to create a model for the scene.
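Here is a minimal Python sketch of what such a class might hold, assuming a plain list-based layout. `HandCodedModel` and its fields are hypothetical illustrations, not the class from the engine mentioned above. The sketch also shows the normal calculation you end up doing by hand: the cross product of two triangle edges.

```python
from dataclasses import dataclass


@dataclass
class HandCodedModel:
    """Hypothetical hand-coded model: a raw vertex buffer plus an index buffer."""
    vertices: list  # (x, y, z) positions in model space
    indices: list   # every 3 indices form one triangle

    def triangles(self):
        """Yield each triangle as a tuple of three vertex positions."""
        for i in range(0, len(self.indices), 3):
            a, b, c = self.indices[i:i + 3]
            yield self.vertices[a], self.vertices[b], self.vertices[c]


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def face_normal(p0, p1, p2):
    """Unit normal of one triangle (right-hand rule: counter-clockwise seen from +Z gives +Z)."""
    n = cross(sub(p1, p0), sub(p2, p0))
    length = (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5
    return (n[0] / length, n[1] / length, n[2] / length)


# A 2x2 square in the XY plane: four shared corners, six indices, two triangles.
quad = HandCodedModel(
    vertices=[(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0), (-1.0, 1.0, 0.0)],
    indices=[0, 1, 2, 0, 2, 3],
)

for tri in quad.triangles():
    print(face_normal(*tri))  # (0.0, 0.0, 1.0) for both triangles
```

A cube quickly exposes the next problem. Each corner is shared by three faces that need three different normals, so the 8 corners become 24 vertices. That bookkeeping is exactly what a modeling program does for you.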

But do that a couple of times and you’ll realize that hand coding vertices, calculating their normals, and so forth is ridiculous for any model more complex than a cube. You really have to use a modeling program to create non-procedural models. There are several out there, including 3D Studio Max, Maya, and Blender. They not only let you create far more complex models with, say, 10,000 vertices, but also let you bake textures called “maps”. For example, a normal map lets you reduce a 3 million triangle model to a 3 thousand triangle model that looks almost identical.

Anyway, when you export the data to a file, whether it’s an OBJ file or some other format, the file contains all the data for the model, including the positions of the vertices. When you import this data, the model is generally in exactly the same 3D position it was exported in. So, it’s not positioned in your scene. It’s likely sitting right at the origin, facing the same direction it was when created. (Although sometimes the modeling program will switch axes around, which confuses the issue.)
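To make that concrete, here is a deliberately minimal Python OBJ reader. It is a sketch that assumes triangulated faces and positive indices, and it ignores normals, texture coordinates, and materials. `load_obj_positions` is a hypothetical helper, not a real library call. The point is that nothing in it moves the model: the positions come back exactly as exported, in model space.

```python
def load_obj_positions(text):
    """Minimal OBJ reader: keeps 'v' positions and triangle 'f' indices only.

    Assumes triangulated faces and positive (1-based) indices. Nothing here
    places the model in a scene; positions stay in model space.
    """
    positions, indices = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            positions.append(tuple(float(c) for c in parts[1:4]))
        elif parts[0] == "f":
            # Face entries may look like "3", "3/1", "3//2" or "3/1/2"; keep the position index.
            indices.extend(int(p.split("/")[0]) - 1 for p in parts[1:4])
    return positions, indices


sample = """# a single triangle, exported wherever it was modeled
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 0.0 1.0 0.0
f 1 2 3
"""

positions, indices = load_obj_positions(sample)
print(positions)  # [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(indices)    # [0, 1, 2]
```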

You might want to watch my matrix video. If you don’t know vector algebra inside and out, you may want to watch my vector video first, because the matrix video builds on some of the ideas in the vector video.

In the matrix video, I talk a little about how each matrix works. The video has a menu, but around 35 minutes in I start talking about projection matrices.

So, the question comes up: “How do I turn a bunch of numbers into a 2D picture on my computer monitor, redrawn 60 times a second to simulate 3D animation? It’s great that we’ve got these mathematical concepts of a 3D world, but how do you display it?” The answer is the projection matrix. As the video explains, that matrix has all the math necessary to convert 3D positions into 2D positions on a plane that maps to the positions on your computer monitor, so that you can draw them there. But if you follow the math, you’ll quickly realize that all this “camera” can do is look straight down the Z axis. (In theory you could align it with any axis, but once that axis is defined, it can never look down any other axis or otherwise move.)

So, if your object happens to be in view, the projection matrix will convert it to 2D space so that your vertex and fragment shaders can draw it on the screen. Generally, the first thing your vertex shader does is multiply the position of every vertex of your model by the projection matrix to convert it to 2D screen coordinates. (Actually, it multiplies the vertices by the world, view, and projection matrices right at the start, but I’m building up to that. Strictly speaking, the vertex shader’s output is in clip space; the GPU then does the perspective divide and maps the result to the screen.) The problem is that with back face culling on, your model is probably invisible at this point, because the “camera” is inside the model. A camera with nothing but a projection matrix sits at the world origin (0,0,0), looking straight down the Z axis. So, the only way to make the model viewable here is to create it and export it at a position further down the Z axis, where it will be in front of the camera.
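A minimal Python sketch of that math follows. It assumes the OpenGL convention: column vectors, right-handed coordinates, and a camera at the origin looking down the negative Z axis. Direct3D-style left-handed setups typically look down +Z and use row vectors instead. The projection matrix takes a point to clip space. Dividing by w (the perspective divide, which the GPU does for you after the vertex shader) then gives normalized device coordinates in the range -1 to 1. The function names are illustrative.

```python
import math


def perspective(fov_y_degrees, aspect, near, far):
    """OpenGL-style perspective matrix as row-major nested lists (column-vector math).

    The implied "camera" sits at the origin and looks down the -Z axis.
    """
    f = 1.0 / math.tan(math.radians(fov_y_degrees) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def project(m, point):
    """Multiply (x, y, z, 1) by m to reach clip space, then divide by w to get NDC.

    Only valid for points in front of the camera (w > 0); the GPU clips the rest.
    """
    x, y, z = point
    v = (x, y, z, 1.0)
    cx, cy, cz, cw = (sum(m[i][k] * v[k] for k in range(4)) for i in range(4))
    return (cx / cw, cy / cw, cz / cw)


proj = perspective(90.0, 1.0, 0.1, 100.0)

# Straight ahead of the camera: lands in the center of the screen (x = y = 0).
print(project(proj, (0.0, 0.0, -5.0)))
# 45 degrees to the right with a 90-degree field of view: x is about 1, the right edge.
print(project(proj, (5.0, 0.0, -5.0)))
```

The near plane maps to an NDC depth of -1 and the far plane to +1. That depth is what the depth buffer later uses.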

So, hurray, we got 3D data to display on the 2D computer screen, but this “camera” is nearly useless because we can’t move it. In reality, there is no “camera”; there’s just math to convert 3D space into 2D space. So the obvious question is, “Well then, how do you move the camera if there is no camera?” You can’t. But what you can do is move the entire 3D world. That’s the job of the view matrix. By applying it to the positions of the vertices of every object in the scene, you move the entire scene around. So, to simulate a “camera”, you shift the whole world right to make it appear that the camera shifted left. Move the entire world down to make the camera appear to have moved up. Move the entire world backwards to make it appear that the camera is moving forward through the scene. And rotate the entire world clockwise around the camera position to make it appear that the camera is rotating counter-clockwise. This opposite movement is why the view matrix is inverted. Invert it again and you get a matrix that’s basically just a world/object matrix for the camera. Just be sure to invert it back before actually using it as a camera, so that it does everything opposite.
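Here is a small Python sketch of the “invert the camera” idea. It assumes column vectors and a camera matrix that contains only rotation and translation. For that case the inverse is cheap: transpose the rotation, then rotate and negate the translation. The helper names are made up for the example.

```python
import math


def rigid_transform(yaw_degrees, position):
    """World matrix that rotates about Y by yaw, then translates to position."""
    a = math.radians(yaw_degrees)
    c, s = math.cos(a), math.sin(a)
    x, y, z = position
    return [
        [c, 0.0, s, x],
        [0.0, 1.0, 0.0, y],
        [-s, 0.0, c, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def rigid_inverse(m):
    """Inverse of a rotation+translation matrix (not valid with scale or skew)."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]  # transposed rotation
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transform_point(m, p):
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


# Treat the camera like any other object: standing at (10, 0, 0), turned 90 degrees
# about Y, so its forward direction (-Z in its own space) points down world -X.
camera_world = rigid_transform(90.0, (10.0, 0.0, 0.0))
view = rigid_inverse(camera_world)  # moves the whole world the opposite way

print(transform_point(view, (10.0, 0.0, 0.0)))  # the camera's position ends up at about (0, 0, 0)
print(transform_point(view, (7.0, 0.0, 0.0)))   # 3 units ahead of it ends up at about (0, 0, -3)
```

Inverting `view` again with `rigid_inverse` gives back `camera_world`, which is the “world/object matrix for the camera” described above.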

Remember that the whole point is to draw the screen about 60 times per second. Each of these drawings is called a frame. And if each frame shows change, you will have 3D motion and animation.

Ok. Awesome. So, we have a 3D world that we can display on a 2D plane that can be mapped to our 2D computer screen to light up the pixels on the screen. And we now have a “camera” that allows us to move through and around the scene. But we still have a problem: when we import our model data from our modeling program, it’s exactly where and how it was exported. We need a way to position objects in the scene and re-orient them to face different directions. That’s where the object’s world matrix comes in (also called the “object matrix” or “model matrix”). When this matrix is applied to every vertex position in the model (by multiplying it), it positions and orients the model in the scene. (It can also scale and skew, but I recommend getting the scale right when you create the model in the first place and not using scale or skew.) This allows you to have 4 copies of your model, all positioned and oriented separately, because each one has its own world matrix. You can even have multiple matrices per object, but that gets a little more advanced.
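Below is a Python sketch of four copies sharing one set of model-space vertices, each with its own world matrix. It assumes column vectors. Each world matrix is built from a yaw angle and a position only, with no scale or skew, in line with the advice above. `world_matrix` is an illustrative helper.

```python
import math


def world_matrix(position, yaw_degrees):
    """Rotate about Y by yaw, then translate to position (column-vector convention)."""
    a = math.radians(yaw_degrees)
    c, s = math.cos(a), math.sin(a)
    x, y, z = position
    return [
        [c, 0.0, s, x],
        [0.0, 1.0, 0.0, y],
        [-s, 0.0, c, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(m, p):
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


# One model-space vertex shared by every copy, e.g. the tip of a model exported facing +Z.
tip = (0.0, 0.0, 1.0)

# Four copies of the same model, each placed and turned by its own world matrix.
copies = [
    world_matrix((-6.0, 0.0, -10.0), 0.0),
    world_matrix((-2.0, 0.0, -10.0), 90.0),
    world_matrix((2.0, 0.0, -10.0), 180.0),
    world_matrix((6.0, 0.0, -10.0), 270.0),
]

for world in copies:
    print(tuple(round(v, 6) for v in transform_point(world, tip)))
```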

And part of the beauty of all this is that you can combine the world, view, and projection matrices into a single matrix every frame and apply only that one wvp matrix to every vertex in the model. If you have 10,000 vertices, that saves a whole lot of math. That single wvp matrix positions the model in the world, simulates a camera in the scene, and projects what’s in front of the “camera” into 2D space where it can be drawn, all in one multiplication per vertex. The graphics card also works in a massively parallel way, so it can apply the wvp matrix to hundreds of vertices simultaneously and then draw them. Then, for every other model in the scene, you use that model’s world matrix to calculate another wvp matrix. (The view matrix only changes when the camera moves, which generally shouldn’t happen partway through a frame. The projection matrix, as a general rule, pretty much never changes, typically only when the window is resized or the field of view changes.) So this is all incredibly efficient mathematically.
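A Python sketch of that bookkeeping follows. It assumes column vectors, where the combined matrix is written P × V × W. Libraries that use row vectors, common with Direct3D and HLSL, write the same product as W × V × P. The view-projection part is computed once per frame. Each object then needs one extra matrix multiply, and every vertex needs only one matrix-vector multiply. The object names are placeholders.

```python
import math


def mat_mul(a, b):
    """4x4 matrix product a x b (row-major nested lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def apply(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def perspective(fov_y_degrees, aspect, near, far):
    """OpenGL-style perspective matrix (camera looking down -Z)."""
    f = 1.0 / math.tan(math.radians(fov_y_degrees) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


projection = perspective(60.0, 16.0 / 9.0, 0.1, 100.0)  # rarely changes
view = translation(0.0, -2.0, -10.0)  # camera at (0, 2, 10): the world moves the opposite way
view_projection = mat_mul(projection, view)  # once per frame

objects = {"crate": translation(-3.0, 0.0, 0.0), "barrel": translation(3.0, 0.0, 0.0)}
model_vertices = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0)]

results = {}
for name, world in objects.items():
    wvp = mat_mul(view_projection, world)  # once per object per frame
    results[name] = [apply(wvp, v) for v in model_vertices]  # once per vertex (the GPU's job)
```

The result is the same as applying the world, view, and projection matrices one after another, just with far fewer multiplications per vertex.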

The end product is that by the time the data gets to the rasterizer, it’s already in 2D screen coordinates. There, the rasterizer can build triangles out of the vertices and fill in the space inside each triangle to draw it on the 2D screen. The rasterizer is working in 2D, but thanks to the perspective projection the result looks 3D. (Each vertex also carries a depth value along for the depth buffer, so nearer triangles can hide farther ones.) Then the fragment shader works with the rasterizer to define the color of the pixel currently being filled in within the triangle.
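The GPU does the last step before rasterization for you: mapping normalized device coordinates to pixels, often called the viewport transform. The arithmetic is simple, though. Here is a Python sketch. It assumes a top-left pixel origin with Y pointing down, as in Direct3D; OpenGL window coordinates put the origin at the bottom-left instead.

```python
def ndc_to_pixels(ndc_x, ndc_y, width, height):
    """Map NDC ([-1, 1] on each axis, +Y up) to pixel coordinates.

    Assumes (0, 0) is the top-left corner of the viewport and +Y points down.
    """
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height
    return px, py


print(ndc_to_pixels(0.0, 0.0, 1920, 1080))   # (960.0, 540.0): center of the screen
print(ndc_to_pixels(-1.0, 1.0, 1920, 1080))  # (0.0, 0.0): top-left corner
print(ndc_to_pixels(1.0, -1.0, 1920, 1080))  # (1920.0, 1080.0): bottom-right corner
```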

Getting back to what you were talking about, sometimes you may want to do something relative to the 2D computer screen. For example, post processing effects are basically applied to the whole image in 2D, so you might say you are working in “screen space”. To do that, you have to be working after the wvp matrix has been applied. Generally, though, you want to be working in the 3D world, which would be “world space”. That’s after the world matrices have been applied to position things in the world, but before the math of the view and projection matrices. If you are working before the world matrix has been applied, then you are working with the raw coordinates of the 3D model as they were exported in the file, before the model is put into the 3D scene. That would be “model space” or “object space”. Each time you apply a matrix to all of the vertices in the model, you basically create an entirely separate “coordinate system”.

Anyway, if I wanted something positioned in the scene to be attached to the camera, I would draw it using a child matrix, with the camera as its parent.

First you have to understand how to create parent-child relationships between objects in the scene (before you turn the camera into a parent). You do this by combining (through multiplication) the parent’s world matrix and the child’s world matrix, and using the resulting matrix to draw the child instead of its own world matrix. (The order of the multiplication matters and depends on whether your math library uses column vectors or row vectors.) This attaches the child to the parent but not the parent to the child. In other words, when the parent moves, the child moves with it, but the child can still move and rotate freely. A good example of this is the wheels on a car. The car body is the parent, which keeps the wheels attached when the body moves, but the wheels, being children, can rotate around their own axes without affecting the car’s body. This is rigid animation. You can rotate/animate the wheels in code.
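Here is a Python sketch of the car-and-wheel example. It assumes column vectors, so the child’s final matrix is parent × child; a row-vector library would write child × parent. The wheel’s spin is applied in the wheel’s own space, so it never moves the body, while the body’s motion carries the wheel along. All names and numbers are illustrative.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def rotation_x(angle_degrees):
    """Rotation about the X axis, used here as the wheel's axle."""
    a = math.radians(angle_degrees)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0, 0.0], [0.0, c, -s, 0.0], [0.0, s, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def transform_point(m, p):
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


car_world = translation(0.0, 0.0, -20.0)   # parent: the car body placed in the world
wheel_local = translation(1.0, -0.5, 1.5)  # child: a wheel's position relative to the body
wheel_spin = 0.0

for frame in range(3):
    car_world = mat_mul(car_world, translation(0.0, 0.0, -1.0))  # body drives 1 unit along its own -Z
    wheel_spin += 30.0                                           # wheel turns on its axle
    # Parent x child: the body carries the wheel; the spin stays local to the wheel.
    wheel_world = mat_mul(car_world, mat_mul(wheel_local, rotation_x(wheel_spin)))

hub = transform_point(wheel_world, (0.0, 0.0, 0.0))  # center of the wheel
rim = transform_point(wheel_world, (0.0, 0.4, 0.0))  # a point on the tire, 0.4 from the hub
print(hub)  # follows the body: about (1.0, -0.5, -21.5)
```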

Now, if you want to make the camera the parent of an object, it’s the same principle, except you have to keep in mind that the camera matrix is inverted by definition. I might have to play with this a bit, but it should work like this. Invert the camera matrix and use the result as the parent of an object. Then, when you draw that object, multiply the inverted camera matrix by the object’s matrix and use the result to draw it. This makes the object’s world matrix relative to its parent rather than to the world origin. So, just change the child’s world matrix to position it wherever you like relative to the parent, such as directly in front. (I do this in reverse, making the camera a child of an object, to create a third person chase camera.)
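Here is a Python sketch of that idea, so treat it as a starting point rather than finished engine code. It makes the same assumptions as before: column vectors, a camera looking down its own -Z axis, and a camera matrix with rotation and translation only. It starts from a view matrix and inverts it to get the camera’s world matrix. It then uses that matrix as the parent of an object placed 2 units in front and checks that the object stays put in view space wherever the camera goes. The helpers are illustrative.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_transform(yaw_degrees, position):
    """Rotate about Y by yaw, then translate to position."""
    a = math.radians(yaw_degrees)
    c, s = math.cos(a), math.sin(a)
    x, y, z = position
    return [[c, 0.0, s, x], [0.0, 1.0, 0.0, y], [-s, 0.0, c, z], [0.0, 0.0, 0.0, 1.0]]


def rigid_inverse(m):
    """Inverse of a rotation+translation matrix: transpose R, rotate and negate t."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transform_point(m, p):
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


in_front = rigid_transform(0.0, (0.0, 0.0, -2.0))  # child: 2 units in front of the camera

view_space_positions = []
for yaw, position in [(0.0, (0.0, 0.0, 0.0)), (45.0, (3.0, 1.0, -7.0)), (200.0, (-5.0, 0.0, 12.0))]:
    view = rigid_inverse(rigid_transform(yaw, position))  # the matrix the shader uses
    camera_as_parent = rigid_inverse(view)                 # invert back: the camera as an object
    held_world = mat_mul(camera_as_parent, in_front)       # parent x child
    # Draw it like anything else: world, then view. It always lands at about (0, 0, -2).
    view_space_positions.append(transform_point(mat_mul(view, held_world), (0.0, 0.0, 0.0)))

print(view_space_positions)
```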

When doing this, you want to use the matrices to store the data from frame to frame. As a beginner, it’s common to recreate all your matrices every frame. Don’t do that. Learn to use matrices properly. That means getting very comfortable with matrix algebra and trusting the matrices to keep track of all your position and orientation data. In the long run, this approach will help you more than you can imagine. If you’re not super comfortable with matrices, again, you’ll probably want to watch my matrix video. It should at least get you started down the right path. You might also want to watch my Gimbal Lock video. It shows what goes wrong when you don’t store orientation data inside the matrix from frame to frame, but instead try to keep track of it as “yaw, pitch, and roll” and rebuild the matrix every frame.
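Here is a Python sketch of keeping an object’s state in its matrix from frame to frame instead of rebuilding it from yaw, pitch, and roll. It assumes column vectors. Each frame multiplies in a small change expressed in the object’s own space: a turn about its own up axis, then a move along its own forward (-Z) axis. After four 90-degree turns the object is back where it started.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def rotation_y(angle_degrees):
    a = math.radians(angle_degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0], [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


identity = translation(0.0, 0.0, 0.0)
ship = identity  # the matrix itself is the stored position and orientation

path = []
for frame in range(4):
    # Right-multiplying applies each change in the ship's own (local) space.
    ship = mat_mul(ship, rotation_y(90.0))             # turn about its own up axis
    ship = mat_mul(ship, translation(0.0, 0.0, -1.0))  # move 1 unit along its own forward axis
    path.append(tuple(round(ship[i][3], 6) for i in range(3)))

print(path)
```

Every multiplication adds a little floating-point error. Long-running code therefore usually re-orthonormalizes the rotation part now and then (for example with Gram-Schmidt) to keep the axes unit length and perpendicular.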