(This is somewhat of a continuation of my previous post)
I’ve been trying for more than a week to add indirect rendering with a compute shader. I can’t find any useful resources. The closest I’ve found was this blog post. However, it’s missing exactly what I’m struggling with.
Currently I have three shaders: compute, vertex, and fragment.
I want to upload data to the compute shader and perform certain operations on said data.
Next, I want to take the data from the compute shader (i.e. transforms, and such), and pass it into the vertex shader.
In my previous post, the apparent solution was indirect rendering. I’ve made a simple test program which “indirect-renders” a bunch of shapes. How do I add a compute shader into the mix?
Just for clarity, I’m asking how to pass data from compute shader to vertex shader.
The compute shader writes to a buffer bound as an SSBO, then the client program binds that buffer as a VBO, UBO or SSBO (depending upon whether the data is used for attributes, uniform variables or buffer variables) before issuing draw calls which use the buffer. You need a glMemoryBarrier call between the glDispatchCompute call and any draw calls which use the data generated by the compute shader.
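A minimal host-side sketch of that sequence (buffer names, the binding index, and the attribute layout are placeholders; it assumes the compute shader writes one vec4 per particle, a VAO is already bound, and the particle count is a multiple of the local workgroup size):

```cpp
// Compute pass: fill the transforms buffer.
glUseProgram(computeProgram);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, transformsBuffer);
glDispatchCompute(numParticles / 64, 1, 1);   // assuming local_size_x = 64

// Make the SSBO writes visible to vertex attribute fetches.
glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);

// Bind the very same buffer object as a VBO and draw.
glUseProgram(renderProgram);
glBindBuffer(GL_ARRAY_BUFFER, transformsBuffer);
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, sizeof(float) * 4, nullptr);
glEnableVertexAttribArray(0);
glDrawArrays(GL_POINTS, 0, numParticles);
```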
Use of indirect draw calls is only necessary if the draw call parameters (e.g. the number of vertices or the number of instances) are calculated by the compute shader and you want to avoid having to read the value(s) back to the CPU. In that case, the compute shader needs to write one or more DrawElementsIndirectCommand structures to an SSBO which the client program binds to GL_DRAW_INDIRECT_BUFFER before executing an indirect draw call.
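The shader-side declaration for that might look like the following sketch (std430 layout; the binding index is an assumption). The struct mirrors GL’s DrawElementsIndirectCommand layout:

```glsl
// Mirror of GL's DrawElementsIndirectCommand, writable from a compute shader.
struct DrawElementsIndirectCommand {
    uint count;          // number of indices
    uint instanceCount;  // number of instances
    uint firstIndex;
    uint baseVertex;
    uint baseInstance;
};

layout(std430, binding = 1) buffer IndirectCommands {
    DrawElementsIndirectCommand commands[];
};
```

The client program then binds the same buffer object with glBindBuffer(GL_DRAW_INDIRECT_BUFFER, buf), and the glMemoryBarrier call before the indirect draw needs GL_COMMAND_BARRIER_BIT.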
So all I need to do is rebind my “output” SSBO to a VBO after I perform a dispatch?
Regarding your second paragraph, I don’t really understand what you mean by “Use of indirect draw calls is only necessary if the draw call parameters are calculated by the compute shader”. Right now, when I call glMultiDrawArraysIndirect, I give it a primitive type, no offset into the indirect buffer, the number of draw commands I wish to issue, and 0 for stride.
However, I do have an array of DrawArraysIndirectCommand structs which I fill before drawing, so why do I need the compute shader here? Or am I misunderstanding something?
Or are you talking specifically about compute-shader-driven “rendering”, as in filling an SSBO with indirect-draw data and rebinding that SSBO as the GL_DRAW_INDIRECT_BUFFER?
Well, that is why indirect rendering was created in the first place. Yes, the CPU can fill in the rendering parameters indirectly through a buffer, but the main point of the feature was to allow GPU operations to calculate rendering parameters.
It’s OK to not do that, but you need to let us know what exactly you’re trying to do, since generating rendering commands from the CS was kind of a big refrain in the other thread.
I see, that makes a lot of sense; I think I’m starting to understand.
There isn’t any need for indirect rendering in my particle system (from my previous post, or even in this example). If I understand correctly, all I need to do is rebind the SSBO I used as a “transforms output” to a VBO and use that for drawing.
Am I on the right path?
I simply want to use the compute shader to calculate new transforms and send those transforms directly to the vertex shader for drawing.
Yes. Although there’s nothing stopping you from having the buffer bound both as an SSBO and a VBO simultaneously: bind it to both targets and just alternate between writing it from the compute shader and reading it in your draw calls.
Don’t forget the memory barrier.
I’ve tried @GClements’ suggestion: I rebound my “output” SSBO to a VBO after the dispatch, and it seems to work. Thank you very much for the help.
I have one more question. In my particle system I eventually run a condition which does a screen-particle bounds check: when a particle leaves the screen, I restore it to its original starting position with new properties (trajectory, rate, angle, etc.). Is it possible to perform something similar on the GPU?
If not, what is the best approach that avoids reading the SSBO contents back from the GPU?
Yes. You can either have the compute shader perform the check (in which case, you need to pass the viewing transformations to the compute shader), or use the vertex shader.
If you have the compute shader perform the transformation to clip space, you may as well write the transformed position so that the vertex shader doesn’t have to repeat the calculation.
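As a sketch of the compute-shader variant (uniform, buffer and binding names are made up): a point is on-screen when its clip-space position satisfies -w <= x, y <= w, so the check can be written as:

```glsl
#version 430
layout(local_size_x = 64) in;

layout(std430, binding = 0) buffer Positions { vec4 position[]; };
layout(std430, binding = 1) buffer Flags     { uint offscreen[]; };

uniform mat4 uViewProj;  // the viewing transformations passed to the CS

void main() {
    uint i = gl_GlobalInvocationID.x;
    vec4 clip = uViewProj * vec4(position[i].xyz, 1.0);
    // Inside the clip volume iff |x|,|y| <= w (depth ignored here).
    bool onScreen = all(lessThanEqual(abs(clip.xy), vec2(clip.w)));
    offscreen[i] = onScreen ? 0u : 1u;
}
```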
If you have the vertex shader perform the check, you can return the result via transform feedback or by writing to an SSBO. One disadvantage of transform feedback is that the output is in machine words (e.g. float), which is excessive if you’re returning a single boolean value per vertex. If you use an SSBO, you can use bytes or even individual bits, which can be set or cleared with atomic operations (atomicOr, atomicAnd).
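A sketch of that bit-per-vertex idea, assuming GLSL 4.30 buffer atomics (buffer name and binding are placeholders):

```glsl
// One visibility bit per vertex, packed 32 to a uint.
layout(std430, binding = 2) buffer VisibilityBits { uint bits[]; };

void markVisibility(uint vertexID, bool visible) {
    uint word = vertexID / 32u;
    uint mask = 1u << (vertexID % 32u);
    if (visible)
        atomicOr(bits[word], mask);    // set the bit
    else
        atomicAnd(bits[word], ~mask);  // clear the bit
}
```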
Well, that explains how to perform the conditional check in the compute shader. However, what I’m wondering is how I know that a particle has left the screen after the compute shader has run, because I need to know when to “re-create” a particle on the CPU.
The same way you did on the CPU.
Your CS no doubt has the position of the particle, right? It must, since its job is to compute the new position. So at some point, it has the position. You can provide the screen rectangle as a uniform.
So… what are you missing that you think the GPU can’t do the computation?
Do you really? Isn’t the whole point of this to minimize CPU/GPU interaction?
It seems to me that what you need is to know how many particles were visible last frame, so that this frame you can emit more if there were too few.
The problem is not the conditional check itself; doing it in the compute shader is not the problem.
It’s the result of the conditional check: how do I retrieve it without doing the GPU-CPU-GPU round trip?
The point of this is to not have to “retrieve” anything. We talked about this on the other thread.
If you have particles that are being created automatically by the system (rather than creation which is mediated by user intervention), then the system which automatically creates them should be on the GPU.
Worst case, if for some reason you can’t structure your system that way, you only need to read back the count of the previous frame’s number of particles. Which should require little if any actual CPU/GPU synchronization (since the time spent rendering the particles generated will put enough of a buffer between the generation of the count and reading it).
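One way to sketch that deferred read-back, using a fence so the CPU only reads the count once the GPU has long since produced it (countBuffer is a placeholder; the count is assumed to be a single uint written by the compute shader):

```cpp
// Frame N, right after the compute dispatch that writes the count:
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);  // SSBO writes -> buffer reads
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// Frame N+1 (or later): poll without blocking.
GLenum status = glClientWaitSync(fence, 0, 0);
if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
    GLuint count = 0;
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, countBuffer);
    glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, sizeof(GLuint), &count);
    glDeleteSync(fence);
    // use `count` to decide how many new particles to emit
}
```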
I know… I’m sorry if I’m reiterating, I just can’t wrap my head around this problem.
It seems like moving the particle “re-creation” code inside the shader is my only option, I’ll give it a try.
Even though the function I use to re-create the particle uses RNG?
Is that a good idea to put inside the shader?
Maybe I’m missing something, how does that help me find which particles have left the screen?
Why do you need to? GPU-driven rendering techniques are fundamentally based on finding ways to make the CPU care as little as possible about rendering.
Does the CPU need to know “which” particles are no longer present, or does it need to know how many so that it can say to add that many more to the system? And if it needs to know which particles aren’t there… why?
If there’s a giant list of particles, all of which exist, and any that go off-screen are always reset… then why does the CPU need to be involved at all? When the GPU is updating the status of a particle, if it falls off-screen, it just resets it.
If you need a random number to determine how to reset a particle, there are plenty of solutions for creating pseudo-random numbers on the GPU based on a seed. A seed that can be an atomic integer value that you change every time you access it, so that two different particles are unlikely to get the same random values.
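As an illustration, one widely used GPU hash is the PCG variant from Jarzynski & Olano’s “Hash Functions for GPU Rendering”; the atomic-counter seed is the scheme described above (binding index is an assumption):

```glsl
// PCG hash: integer state -> pseudo-random uint.
uint pcg_hash(uint state) {
    state = state * 747796405u + 2891336453u;
    uint word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

layout(binding = 0) uniform atomic_uint seedCounter;  // fresh value per call

float randomFloat() {                 // uniform in [0, 1)
    uint s = atomicCounterIncrement(seedCounter);
    return float(pcg_hash(s)) / 4294967296.0;
}
```

Seeding from gl_GlobalInvocationID.x combined with a per-frame uniform also works, and avoids the atomic counter traffic.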
I have a few updates.
I have managed to get the particle system to (kinda) work with compute shaders.
I upload the particle data to the GPU once per Emitter creation as an input SSBO.
When it’s time to update the particles, I dispatch and store the new particle data inside two different SSBOs.
One is for the new particle data, which is later copied (using glCopyBufferSubData) into the input SSBO; that gets sent to the compute shader the next time the update function is called.
The other SSBO is for screen transformations, which I then rebind as a VBO that gets sent to the vertex shader.
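That copy step might look like this (buffer names and the Particle struct are placeholders):

```cpp
// After the dispatch: copy this frame's output back into the input SSBO
// so the next update reads this frame's results.
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);  // SSBO writes -> copy reads
glBindBuffer(GL_COPY_READ_BUFFER, newParticleDataSSBO);
glBindBuffer(GL_COPY_WRITE_BUFFER, inputSSBO);
glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
                    0, 0, particleCount * sizeof(Particle));
```

Alternatively, swapping the two SSBOs’ binding indices each frame (“ping-ponging”) avoids the copy entirely.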
The last thing I’m missing, which surprisingly I couldn’t find any information about online, is a simple GLSL random number generator with an arbitrary range, similar to what C++’s standard library offers.
I have found many examples of RNGs with a 0.0-1.0 range; is there some way I can map the result into a given range?
To turn an RNG that produces uniformly distributed values x in [0, 1) into one that produces uniformly distributed values y in [a, b), you can use:
y := a + x * (b - a).
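As a tiny, hypothetical helper in C++ (the same expression works verbatim in GLSL with floats):

```cpp
#include <cassert>

// Map x in [0, 1) uniformly into [a, b): y = a + x * (b - a).
float mapRange(float x, float a, float b) {
    return a + x * (b - a);
}
```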
Apologies for the late reply. I had actually already found a “mapping” function, but I will use your suggestion since it uses fewer operations, so thank you.
I have some good news.
I’ve finally managed to move all of my code into a compute shader, reworked most of the particle rendering, and now it finally works. Before, I had 130 emitters with 250 particles per emitter, i.e. 32,500 particles, rendering at ~60 FPS (with instancing).
Now I’m rendering 700 emitters with 250 particles each, i.e. 175,000 particles, at ~60 FPS.
I want to say a big thank you to @Alfonse_Reinheart, @carsten_neumann, @GClements. Thank you all for the advice, and being so patient with me.
I have one last question, about the architecture of an OpenGL program.
At the moment, (most of) my (global) VBOs, textures, and other kinds of buffers are somewhat thrown all together in one place.
How should I structure my programs with OpenGL in mind, and are there any resources you’d recommend?
I usually put all of the OpenGL-related stuff in a “graphics.cpp” file (no class). It has initialize(), Render() and cleanup() functions which I call in main(), among other useful things.
By the way, here I have some examples of indirect rendering (though not related to particles):