hi, i’m trying to implement a simple particle system

last time i used transform feedback objects and double buffering, which gave (in my judgement) very good results:

without collision detection, about 3 million particles can be simulated at 60 frames per second (only gravity and collision with the y = 0 plane enabled)

with a simple line-triangle intersection test to detect collisions between particles and a few triangles in the scene, i can render about 800,000 particles at 60 frames per second

(my graphics card: NVIDIA GT 640, about 3 years old)

this time i want to push the limits further by using compute shaders. i managed to build this application:

web.engr.oregonstate.edu/~mjb/cs557/Handouts/compute.shader.1pp.pdf

i changed it to use only one particle buffer for position / velocity / color / etc., but double buffered

the rendering method looks like this:

```
void ParticleSystem::Render(const glm::mat4 & view, const glm::mat4 & projection, float timestep)
{
    // double buffered, switch vertex array every frame
    static unsigned int flipflop = 1;
    flipflop = !flipflop;

    // bind both particle buffers
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, m_particle_buffer[1 - flipflop].ID()); // source
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, m_particle_buffer[flipflop].ID());     // results

    // compute shader
    unsigned int program = m_program_update.ID();

    // simulate 1 frame
    glUseProgram(program);
    glDispatchCompute(m_particle_count / PARTICLES_WORK_GROUP_SIZE, 1, 1); // work group size = 128
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

    // render 1 frame
    program = m_program_render.ID();
    glUseProgram(program);
    glUniformMatrix4fv(glGetUniformLocation(program, "Model"), 1, GL_FALSE, glm::value_ptr(glm::mat4(1)));
    glUniformMatrix4fv(glGetUniformLocation(program, "View"), 1, GL_FALSE, glm::value_ptr(view));
    glUniformMatrix4fv(glGetUniformLocation(program, "Projection"), 1, GL_FALSE, glm::value_ptr(projection));
    glBindVertexArray(m_vertexarray[flipflop].ID());
    glDrawArrays(GL_POINTS, 0, m_particle_count);
    glBindVertexArray(0);
    glUseProgram(0);
}
```
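side note on the dispatch: `m_particle_count / PARTICLES_WORK_GROUP_SIZE` is an integer division, so if the particle count is not a multiple of 128, the leftover particles would never be updated. a minimal sketch of rounding up instead; `WorkGroupCount` is a made-up helper name, not part of my code, and the compute shader would then also need a bounds check on the index (with the particle count passed in somehow), which my shader does not have yet:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of work groups needed so that every particle
// is covered, even when particle_count is not a multiple of group_size.
std::uint32_t WorkGroupCount(std::uint32_t particle_count, std::uint32_t group_size)
{
    assert(group_size > 0);
    // full groups, plus one extra group for any remainder
    return particle_count / group_size + (particle_count % group_size != 0 ? 1u : 0u);
}
```

with this, 257 particles and a group size of 128 would dispatch 3 groups instead of 2.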

question 1:

i’ve read that glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); is used to synchronize and is relatively expensive. as i understand it, it guarantees that if i want to read back data from that buffer, the compute shader has already finished processing the data

BUT: i use 2 buffers. the compute shader calculates the data for the next frame, while the current frame renders the “old” data, which the compute shader ONLY reads

do i actually need to synchronize?

or can i delete glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); without problems?
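to make it clearer which buffer is used for what, here is the index logic from Render() pulled out into a small cpu-only sketch. `FrameRoles` and `NextFrame` are made-up names, and i'm assuming that m_vertexarray[i] takes its vertex attributes from m_particle_buffer[i] (the setup code is not shown here):

```cpp
#include <cassert>

// Which buffer index plays which role during one call to Render().
struct FrameRoles {
    unsigned int source;      // SSBO binding 0, only read by the compute shader
    unsigned int destination; // SSBO binding 1, written by the compute shader
    unsigned int drawn;       // index of the vertex array passed to glDrawArrays
};

// Mirrors Render(): flip the flag first, then derive all indices from it.
FrameRoles NextFrame(unsigned int & flipflop)
{
    flipflop = !flipflop;
    return FrameRoles{ 1 - flipflop, flipflop, flipflop };
}
```

if that assumption holds, the vertex array drawn in a frame has the same index as the buffer the compute shader writes in that frame, so the draw call may be reading freshly written data rather than the “old” buffer.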

compute shader source:

```
#version 450
layout(local_size_x = 128, local_size_y = 1, local_size_z = 1) in;
layout (std140, binding = 0) buffer Source { vec4 DataSource[]; }; // particle buffer to read from
layout (std140, binding = 1) buffer Destination { vec4 DataDestination[]; }; // particle buffer to write into

const vec3 gravity = vec3(0, -9.81, 0);
const float timestep = 0.016;

void main()
{
    // read old data
    // this is a 1-dimensional calculation because the data is a 1D array (of particles)
    uint index = gl_GlobalInvocationID.x; // .y and .z are always 0 for a 1D dispatch
    vec4 data0 = DataSource[3 * index + 0];
    vec4 data1 = DataSource[3 * index + 1];
    vec4 data2 = DataSource[3 * index + 2];
    vec3 position = data0.xyz;
    float lifetime = data0.w;
    vec3 velocity = data1.xyz;
    float unused = data1.w;
    vec4 color = data2;

    // calculate new data
    //vec3 acceleration = gravity;
    vec3 acceleration = vec3(0, 0, 0);
    vec3 position_new = position + velocity * timestep;
    float lifetime_new = lifetime - timestep;
    vec3 velocity_new = velocity + acceleration * timestep;
    vec4 color_new = color;

    // keep particles inside the [-1, +1] cube, bouncing with damping
    if (position_new.x < -1) { position_new.x = -1; velocity_new.x *= -0.9; }
    if (position_new.y < -1) { position_new.y = -1; velocity_new.y *= -0.9; }
    if (position_new.z < -1) { position_new.z = -1; velocity_new.z *= -0.9; }
    if (position_new.x > +1) { position_new.x = +1; velocity_new.x *= -0.9; }
    if (position_new.y > +1) { position_new.y = +1; velocity_new.y *= -0.9; }
    if (position_new.z > +1) { position_new.z = +1; velocity_new.z *= -0.9; }

    // write new data
    DataDestination[3 * index + 0] = vec4(position_new, lifetime_new);
    DataDestination[3 * index + 1] = vec4(velocity_new, 0);
    DataDestination[3 * index + 2] = color_new;
}
```
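for checking the shader output against something simple, here is a cpu-side sketch of the same update step for a single particle. `Vec3`, `ParticleState`, `Bounce` and `Step` are made-up names for this example only; the math is meant to mirror the shader above (zero acceleration, walls at -1 and +1, velocity scaled by -0.9 on impact), color is left out because it is just copied:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

struct ParticleState {
    Vec3 position;
    Vec3 velocity;
    float lifetime;
};

// Clamp one axis to [-1, +1]; on impact, reverse and damp the velocity.
void Bounce(float & p, float & v)
{
    if (p < -1.0f) { p = -1.0f; v *= -0.9f; }
    if (p > +1.0f) { p = +1.0f; v *= -0.9f; }
}

// One simulation step, same order of operations as the compute shader.
ParticleState Step(ParticleState s, float timestep)
{
    // acceleration is zero here, so the velocity only changes on a bounce
    s.position.x += s.velocity.x * timestep;
    s.position.y += s.velocity.y * timestep;
    s.position.z += s.velocity.z * timestep;
    s.lifetime -= timestep;

    Bounce(s.position.x, s.velocity.x);
    Bounce(s.position.y, s.velocity.y);
    Bounce(s.position.z, s.velocity.z);
    return s;
}
```

a particle at x = 0.99 moving with velocity 1 along +x would end up clamped to x = 1 with its x velocity flipped after one 0.016 s step.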

question 2:

what about the Model x View x Projection matrix calculation in the vertex shader (for rendering the particles)?

should i also move this calculation to the compute shader and store the results in a third buffer? what about synchronizing?

question 3:

what about using a struct Particle { … }; in the compute shader as the element type of the source / destination arrays? can i assume that the data is tightly packed, or do i have to worry about offsets between struct members?

(i would like to avoid this ugliness)

```
vec4 data0 = DataSource[3 * index + 0];
vec4 data1 = DataSource[3 * index + 1];
vec4 data2 = DataSource[3 * index + 2];
vec3 position = data0.xyz;
float lifetime = data0.w;
vec3 velocity = data1.xyz;
float unused = data1.w;
vec4 color = data2;
```
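what i have in mind on the c++ side is something like the sketch below. `Particle` and `Vec4Index` are made-up names for this example. as far as i understand the std140 rules, a vec3 is aligned to 16 bytes but only occupies 12, so a float directly after it fills the 4th component, and a struct of vec3 + float, vec3 + float, vec4 would take exactly 48 bytes without hidden padding. the static_asserts only check the c++ side, not what the driver actually does:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side mirror of one particle: three vec4-sized slots.
struct Particle {
    float position[3]; // slot 0, xyz
    float lifetime;    // slot 0, w
    float velocity[3]; // slot 1, xyz
    float unused;      // slot 1, w
    float color[4];    // slot 2
};

// The layout must match the three vec4 reads per particle in the shader.
static_assert(sizeof(Particle) == 48, "expected 3 * 16 bytes per particle");
static_assert(offsetof(Particle, lifetime) == 12, "lifetime should sit in slot 0, w");
static_assert(offsetof(Particle, velocity) == 16, "velocity should start slot 1");
static_assert(offsetof(Particle, color) == 32, "color should start slot 2");

// Index into the flat vec4 array for a given particle and slot (0, 1 or 2).
std::size_t Vec4Index(std::size_t particle, std::size_t slot)
{
    assert(slot < 3);
    return 3 * particle + slot;
}
```

so particle 2, slot 1 (velocity) would live at vec4 index 7 in the flat array, which is what the manual indexing above computes by hand.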