I have written a kernel that uses Newton's law of gravitation to calculate new velocities for the bodies in a simple physics simulator of mine. It works well, and I have added linear interpolation to make the transitions between gravity update passes smoother. However, because I am very new to OpenCL, I am afraid I am making some large mistake that hurts performance. I have never worked with C; the project itself is written in Java using LWJGL. I would really appreciate it if anyone could look at my kernel and give me some quick tips on how to improve it.

On startup, I fill the positions array with random points and then run the init kernel. After that, I call the gravity kernel twice per second. Because I know when the gravity kernel last ran and when it will run next, I can calculate a linear interpolation value (0 at the last pass, 1 at the next one) and pass it to the lerp kernel, which is called every frame. Right now everything works as intended; I am just looking for help making the kernels run faster.
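To make the timing concrete, here is a minimal host-side sketch of how the interpolation value could be derived. The class name `LerpTiming`, the method `lerpValue`, and the use of `System.nanoTime()` are assumptions for illustration, not my exact code; the only fact taken from the description above is that gravity passes are 0.5 seconds apart.

```java
// Hypothetical helper: names are illustrative, not taken from the actual project.
final class LerpTiming {
    // Two gravity passes per second means 0.5 s (in nanoseconds) between passes.
    static final long UPDATE_INTERVAL_NANOS = 500_000_000L;

    // Returns the fraction of the current update interval that has elapsed,
    // clamped to [0, 1]. This is the "value" argument given to the lerp kernel.
    static float lerpValue(long lastUpdateNanos, long nowNanos) {
        float t = (float) (nowNanos - lastUpdateNanos) / UPDATE_INTERVAL_NANOS;
        return Math.max(0.0f, Math.min(1.0f, t));
    }

    public static void main(String[] args) {
        // Assumed usage: remember when the gravity kernel last ran,
        // then compute the value once per rendered frame.
        long lastUpdate = System.nanoTime();
        float value = lerpValue(lastUpdate, System.nanoTime());
        System.out.println("lerp value this frame: " + value);
    }
}
```

Clamping keeps a late gravity pass from extrapolating beyond postPos; whether that is the desired behavior depends on how the frame loop is scheduled.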

My Kernel:

```
//This is the "main" kernel. It calculates the new velocities for each object
//prePos - The positions that were calculated last time.
//         Used with postPos to linearly interpolate between passes and give the illusion of smoothness
//postPos - The most up-to-date positions
//velocities - The list of velocities for all the objects
//colors - Each object's color
//size - The total number of objects
//add - A value to use to determine how long the resulting velocity vector should be
//mass - The mass of each object (they all have the same mass for now)
kernel void gravity(global float4* prePos, global float4* postPos, global float4* velocities, global float4* colors, const int size, const float add, const float mass) {
    const int itemId = get_global_id(0); //Get this work-item's ID
    if(itemId < size) { //If we are within the objects that we want to update
        float4 pos = postPos[itemId]; //Retrieve the position
        float4 vel = velocities[itemId]; //Retrieve the velocity
        float4 otherPos, deltaPos;
        float gravity, dist;
        for(int i = 0; i < size; i++) { //Loop through every object that exists...
            if(i != itemId) { //If the other object isn't this object...
                otherPos = postPos[i]; //Get the other object's position
                //Calculate the delta between this work-item's object and the other object
                deltaPos = (float4) (otherPos.x - pos.x, otherPos.y - pos.y, otherPos.z - pos.z, 0.0f);
                //Get the distance between them
                dist = sqrt(pow(deltaPos.x, 2.0f) + pow(deltaPos.y, 2.0f) + pow(deltaPos.z, 2.0f));
                gravity = mass / (dist * dist); //Newton's gravitational formula
                deltaPos /= dist; //Normalise the vector
                deltaPos *= gravity; //Factor in gravity
                vel += deltaPos; //Add the pull of the other object to this object's velocity
            }
        }
        prePos[itemId] = pos; //Set the previous position to the one without the new calculation
        postPos[itemId] = pos + vel * add; //Set the new position to the old one plus the velocity scaled by add
        velocities[itemId] = vel; //Set the velocity
    }
}
//This kernel calculates the values for the positions array using prePos, postPos, and a linear interpolation value.
//OpenGL renders the positions array as points, so the points can't jump around between gravity passes;
//that would be very noticeable and would not look nice or realistic
kernel void lerp(global float4* positions, global float4* prePos, global float4* postPos, const float value) {
    const int itemId = get_global_id(0);
    positions[itemId] = prePos[itemId] + value * (postPos[itemId] - prePos[itemId]);
}
//This kernel sets up the values for prePos and postPos so that calling the gravity kernel will work as intended
kernel void init(global float4* positions, global float4* prePos, global float4* postPos) {
    const int itemId = get_global_id(0);
    float4 pos = positions[itemId];
    prePos[itemId] = pos;
    postPos[itemId] = pos;
}
```