> about your world coordinate point, say you have:

I don’t understand what you’re saying. What is “T(…)” and what are the “M’” values?

> Why not just upload the znear/zfar values along with the matrix? It could be a part of the same uniform block. 2 more floats is a big deal?

Because they’re already part of the matrix. The only reason you can’t use them with what you’re doing is because you composed that matrix with another, thus destroying the values. To put it another way, if you were doing things right, you’d already have them.

And it’s not the zNear/zFar specifically that are important; it’s the Z row of the perspective projection matrix that matters.

Yes, there are in fact ways to work around all of these issues. But why over-complicate things? Isn't it much simpler to just transform from model space to camera space, do your lighting there, and then go to clip space? It's easier for everyone to understand what's going on. It's easier to explain. It's easier to test, debug, and work with.

In general, you don't gain anything by doing what you suggest. Your perspective projection matrix is usually far more constant than your camera matrix, whereas your model-to-world transforms are likely to change on a frame-to-frame basis, as are your world-to-camera transforms. If your scene is animating in any significant way, then you're going to need to update the model matrices every frame.

If the camera moves along with them, you update exactly one matrix. If we did it your way, every time the camera moves, we’d have to update two matrices. I’ll be generous and assume that you’re storing this data in uniform blocks (because if you’re not, then the camera+perspective matrix must be updated in every program you use).

Is it possible that, for your particular needs, you could get something out of this? Yes. If your camera is never animated and is more often than not fixed, you could gain something: exactly one fewer CPU matrix multiply for every object (the multiply with the camera matrix), and one fewer vector/matrix multiply for each light.

Somehow, I don’t think you’ll notice any performance difference from those. This smells of premature optimization.