I keep the model and view matrices on the CPU side as dmat4. Before passing them to the vertex shader, I multiply them and upload the product as a single mat4. The problem is that even if an object does not change its position in world space, I still have to multiply its model matrix by the view matrix. As a result, the application slows down when there are a lot of objects. Please suggest a way to fix this.
Define “slows down”.
- What frame time in msec (including CPU+GPU time) do you get with no matrix updates?
- What frame time do you get with all matrices updated each frame?
- How are you measuring frame time?
- How many matrix updates are we talking about?
- How many draw calls?
- What GPU and GPU driver is this on?
Computing dmat4 products on the CPU, uploading mat4 uniforms to the GPU, and reading mat4 uniforms on the GPU can all be very cheap when done well.
For starters, you need to use an instrumenting CPU profiler like Tracy profiler, or at least a sampling CPU profiler like Very Sleepy, to see where your CPU time is going. Running A/B tests with different subsets of the problem can help nail down where your primary bottlenecks are.
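If you don't have a profiler hooked up yet, a rough A/B comparison can be sketched with `std::chrono`. This is only a minimal sketch: `timeMs`, `ABResult`, and `compareAverage` are hypothetical helper names, and it measures CPU wall-clock time only. GPU time is not captured here and would need something like OpenGL timer queries.

```cpp
#include <chrono>

// Elapsed wall-clock milliseconds for one call of fn (CPU side only).
template <typename Fn>
double timeMs(Fn&& fn) {
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Average per-frame cost of two variants, in milliseconds (hypothetical helper).
struct ABResult {
    double aMs;
    double bMs;
};

// Runs variant A and variant B `frames` times each and averages the timings.
// Interleaving the two keeps cache and clock-frequency effects roughly comparable.
template <typename A, typename B>
ABResult compareAverage(A&& a, B&& b, int frames) {
    double sumA = 0.0;
    double sumB = 0.0;
    for (int i = 0; i < frames; ++i) {
        sumA += timeMs(a);
        sumB += timeMs(b);
    }
    return {sumA / frames, sumB / frames};
}
```

For example, variant A could be a frame's CPU work with all matrix updates enabled and variant B the same work with them skipped. The difference between the two averages gives a first estimate of what the updates cost on the CPU.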
Some possibilities to consider:
- Are you jumping randomly all over CPU memory to gather data for these dmat4 products, pulling in all kinds of useless data into the CPU cache, such that you’re wasting a lot of time blocked on CPU mem fetches? If so, fix that.
You want to be ripping through CPU mem sequentially to recompute these products, making max use of the prefetcher and wasting no bytes (or as few bytes as possible) in the mem fetches pulled into the CPU cache.
For more on this, see Data Oriented Design.
- Are you doing dmat4 products efficiently? Are you using SIMD?
- Are you updating the mat4 shader uniforms efficiently? The answer is probably no, for a number of reasons. And if you’re on mobile, you almost certainly aren’t doing it efficiently. If you’re not explicitly avoiding implicit sync, you’re probably triggering it.
- If you have a lot of objects, are you batching them into shared draw calls? If not, fix that. Then batching mat4 uniform updates is pretty much a no-brainer.
- If you have a lot of objects, you don’t need to do a dmat4 product per object. Batch these into shared draw calls about a common origin (call it a cell, tile, object group, whatever).
Only that common origin needs a dmat4*dmat4 MODELVIEW product on the CPU. Pass that into the shader as a mat4 MODELVIEW for the “origin” point. Then do mat4*mat4 products in the shader to place individual objects/subbatches relative to that common origin.
Result: tons fewer dmat4*dmat4 multiplies on the CPU, much less mat4 uploading to the GPU that needs to be made efficient, and the ability to keep most of your objects batched into shared draw calls about a common origin.
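The common-origin idea can be sketched on the CPU roughly as follows. This is a sketch under stated assumptions, not a drop-in implementation: `DMat4` and `Mat4` are plain column-major arrays standing in for `glm::dmat4` and `glm::mat4`, and `Cell`, `cellModelViews`, and `objectModelView` are hypothetical names. In a real renderer the last step (`objectModelView`) would be the mat4*mat4 product in the vertex shader.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Column-major 4x4 matrices; stand-ins for glm::dmat4 and glm::mat4.
using DMat4 = std::array<double, 16>;
using Mat4  = std::array<float, 16>;

// r = a * b for column-major storage: element (row, col) lives at [col * 4 + row].
template <typename T>
std::array<T, 16> mul(const std::array<T, 16>& a, const std::array<T, 16>& b) {
    std::array<T, 16> r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            T sum = T(0);
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            r[col * 4 + row] = sum;
        }
    return r;
}

// Pure translation matrix.
template <typename T>
std::array<T, 16> translation(T x, T y, T z) {
    std::array<T, 16> m{};
    m[0] = m[5] = m[10] = m[15] = T(1);
    m[12] = x;
    m[13] = y;
    m[14] = z;
    return m;
}

// Narrow a double matrix to float, as done right before a uniform upload.
Mat4 toFloat(const DMat4& d) {
    Mat4 f{};
    for (std::size_t i = 0; i < 16; ++i)
        f[i] = static_cast<float>(d[i]);
    return f;
}

// A group of objects sharing one double-precision origin (hypothetical layout).
struct Cell {
    DMat4 origin;             // cell origin in world space, double precision
    std::vector<Mat4> local;  // per-object transforms relative to the origin, contiguous floats
};

// One dmat4*dmat4 product per cell, walking the cells sequentially.
std::vector<Mat4> cellModelViews(const DMat4& view, const std::vector<Cell>& cells) {
    std::vector<Mat4> out;
    out.reserve(cells.size());
    for (const Cell& c : cells)
        out.push_back(toFloat(mul(view, c.origin)));
    return out;
}

// CPU stand-in for the shader-side product: cell MODELVIEW * object-local transform.
Mat4 objectModelView(const Mat4& cellModelView, const Mat4& local) {
    return mul(cellModelView, local);
}
```

The large world-space coordinates cancel in double precision when the view is multiplied by the cell origin, so the float matrices that reach the shader only carry small, camera-relative values. Static objects never need their `local` matrices touched again; only the per-cell products are recomputed when the camera moves.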