Until now my application has drawn several terrain chunks using glDrawArrays(), with a CPU cull so that only the chunks that should be visible are drawn. Of about 100 possible chunks, I could end up making the following calls about 20 or 30 times:
My goal is to reduce all this to a single draw call and a single VAO bind. Since all the chunks have the same vertex attributes, I was thinking of using glMultiDrawArraysIndirect(): merge all the VAOs I was using into a single VAO, and fill in the indirect buffer with the individual size of each VBO:
My question now: CPU culling keeps telling me which chunks have to be shown on screen, and I want to use that list to make a single call without having to redo the indirect buffer in each cycle. Is that possible?
If the set of chunks changes, then so must the draw parameters (i.e. the contents of the indirect buffer). Also, if you’re doing the culling on the CPU, there isn’t a great deal of difference between using glMultiDrawArrays() and glMultiDrawArraysIndirect(). The latter just avoids the need to transfer the parameters from the CPU; this is advantageous if you’re generating the parameters using the GPU (e.g. via transform feedback or a compute shader) but doesn’t really matter if you’re just populating the indirect buffer from the CPU.
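To illustrate the CPU-side option mentioned above, here is a minimal sketch of compacting the visible chunks into the `first`/`count` arrays that glMultiDrawArrays() takes. The `ChunkRange` struct, the `visible` flag array and the function name are hypothetical; they assume every chunk lives in one shared VBO and that the culling pass produces one flag per chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-chunk record: where the chunk's vertices start in the
   shared VBO and how many vertices it holds. */
typedef struct {
    int first;
    int count;
} ChunkRange;

/* Copies the ranges of the visible chunks into first_out[]/count_out[],
   packed at the front of the arrays. Returns the number of draws. */
size_t build_multidraw_lists(const ChunkRange *chunks,
                             const unsigned char *visible,
                             size_t chunk_count,
                             int *first_out, int *count_out)
{
    assert(chunks && visible && first_out && count_out);
    size_t n = 0;
    for (size_t i = 0; i < chunk_count; ++i) {
        if (visible[i]) {
            first_out[n] = chunks[i].first;
            count_out[n] = chunks[i].count;
            ++n;
        }
    }
    return n;
}

/* With a GL context current and the shared VAO bound, the frame's draw
   would then be a single call (GL_TRIANGLES is an assumption here):
   glMultiDrawArrays(GL_TRIANGLES, first_out, count_out, (GLsizei)n); */
```

The arrays are rebuilt every frame, but they are small (two integers per visible chunk) and are passed straight from client memory, so no buffer object has to be updated.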
I really do not need to change or redo the indirect buffer; the CPU culling only tells me whether a chunk is inside the coordinate system. Anyway, I think this was not understood well, so I will try to change the focus of my question. What difference is there between 100 elements in an indirect buffer and a single element? I know the theory tells me that I can fill it with n elements in order to draw only the n elements I need; otherwise I could simply create a single large element. So far I’ve only been able to read about the draw-call ID argument that is passed to the shaders so they know which element is being drawn. This is close to how instancing works, and it’s not what I need. But frankly, logic tells me there has to be some way to draw only the elements I need from an indirect buffer that I fill before entering the render loop.
In summary, what I want to know is whether it is possible to make a single draw call and a single VAO bind, instead of the 20 or 30 that I was making until now.
[QUOTE=martel;1293630]My goal is to reduce all this to a single draw call and a single VAO bind
CPU culling keeps telling me which chunks have to be shown on screen, and I want to use that list to make a single call without having to redo the indirect buffer in each cycle. Is that possible?
I really do not need to change or redo the indirect buffer
In summary, what I want to know is whether it is possible to make a single draw call and a single VAO bind, instead of the 20 or 30 that I was making until now.[/QUOTE]
There seems to be some unstated assumption behind your question. We need to get that stated here.
You said you are CPU culling, so the set of draws that need to execute changes from frame to frame.
You also said your goal is to render all this in a single draw call. You can do that with glMultiDrawArraysIndirect() (MDI) or glMultiDrawArrays(). So that goal is satisfied.
Then you said (with MDI) you don’t want to (and then later said didn’t “need” to) update the indirect buffer. Why?
The list of draws you’re doing each frame is changing, so you need to update the indirect buffer. Either that, or something about your GPU state needs to change to force a different set of vertices and primitives to be rendered. Or you need to change what GL calls you’re making on the CPU to render each frame.
If your concern with updating the indirect buffer is that you need to totally rework all of the content, you don’t. You can keep the number of elements and the content as-is, and just update the primCount member (named instanceCount in newer versions of the documentation) in each struct with the result of your culling (primCount 0 = culled). Then just subload the updated copy to the GPU. Yes, if you’re doing these primCount updates on the CPU, you’ll still probably want to subload the whole indirect buffer to the GPU for simplicity. But you won’t have to deal with the length of that array changing (or with the drawCount parameter you pass to glMultiDrawArraysIndirect() changing). Then the only question is what GPU frontend cost (if any measurable) there is to skipping draws in an MDI call which have a primCount of 0.
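A minimal sketch of that approach, assuming one command per chunk, chunks packed back to back in a single VBO, and a hypothetical `visible` flag array coming from the CPU cull. The struct follows the four-unsigned-int layout that glMultiDrawArraysIndirect() reads; its second field is the one called primCount above and instanceCount in newer documentation. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One indirect draw command: four tightly packed 32-bit unsigned ints. */
typedef struct {
    unsigned int count;         /* vertices in this chunk */
    unsigned int instanceCount; /* "primCount" in older docs; 0 skips the draw */
    unsigned int first;         /* first vertex of the chunk in the shared VBO */
    unsigned int baseInstance;
} DrawArraysIndirectCommand;

/* Done once at load time: one command per chunk, chunks laid out
   consecutively in the shared VBO. */
void init_commands(DrawArraysIndirectCommand *cmds,
                   const unsigned int *vertex_counts, size_t n)
{
    assert(cmds && vertex_counts);
    unsigned int offset = 0;
    for (size_t i = 0; i < n; ++i) {
        cmds[i].count = vertex_counts[i];
        cmds[i].instanceCount = 1;
        cmds[i].first = offset;
        cmds[i].baseInstance = 0;
        offset += vertex_counts[i];
    }
}

/* Done every frame: only the instance count changes, so the array length
   and the draw count passed to the MDI call stay fixed. */
void apply_culling(DrawArraysIndirectCommand *cmds,
                   const unsigned char *visible, size_t n)
{
    assert(cmds && visible);
    for (size_t i = 0; i < n; ++i) {
        cmds[i].instanceCount = visible[i] ? 1u : 0u;
    }
}

/* With a GL context current, the VAO bound and the indirect buffer bound to
   GL_DRAW_INDIRECT_BUFFER, the per-frame upload and draw would be:
   glBufferSubData(GL_DRAW_INDIRECT_BUFFER, 0, n * sizeof *cmds, cmds);
   glMultiDrawArraysIndirect(GL_TRIANGLES, 0, (GLsizei)n, 0);
   (GL_TRIANGLES is an assumption; stride 0 means tightly packed commands.) */
```

Whether the zero-instance commands cost anything measurable on the GPU is exactly the open question raised above; this sketch only shows that the CPU-side update is a single pass over a fixed-size array.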