Physics on the GPU

I can do my physics calculations on either the CPU or the GPU. But if I use the GPU, I’m wondering whether this will limit the other effects I could otherwise use the vertex processor for. Is this true?

Also, if I do the physics calculations on the CPU, then I would need to transform my vertices on the CPU as well, but the GPU already does the same transformations faster.
So should I read the transformed data back from the GPU to the CPU, or is that slower than doing my own transforms on the CPU?

(I don’t think the fragment processor is involved in this; correct me if I’m wrong.)

Yes, if you do it on the GPU you will limit the amount of effects you can do, since the GPU’s resources are limited.
If you do it on the CPU instead, you can still use the GPU for the vertex transformations; it does that anyway, so why not use it?

There are a few issues you need to consider for GPU physics:

  1. While the GPU is vastly faster, it does have limitations on what it can do. Particles without any collision detection or response generally do well, but other physics simulations may not, at least not without severely impacting performance.
  2. Transform feedback, or any feedback at all, is slow or limited.
  3. Whether it will actually be faster, or whether the extra load on the GPU will make it render more slowly, is a question only you can answer by experimenting.
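A minimal sketch of point 1, written in Python for readability (the function names are hypothetical, and explicit Euler integration with constant gravity is an assumption, not something from this thread): each particle’s new state depends only on its own old state plus global constants, so one shader invocation per particle could compute it without reading any other particle. Adding particle-particle collisions would break that independence, because each update would then have to look at its neighbors.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def step_particle(pos: Vec3, vel: Vec3, gravity: Vec3, dt: float) -> Tuple[Vec3, Vec3]:
    """Advance one particle by dt using explicit Euler integration.

    Only this particle's own state and global constants are read, which is
    what makes the update trivially parallel (one GPU thread per particle).
    """
    new_pos = tuple(p + v * dt for p, v in zip(pos, vel))
    new_vel = tuple(v + g * dt for v, g in zip(vel, gravity))
    return new_pos, new_vel


def step_all(positions: List[Vec3], velocities: List[Vec3],
             gravity: Vec3, dt: float) -> Tuple[List[Vec3], List[Vec3]]:
    """Update every particle; no iteration depends on any other."""
    results = [step_particle(p, v, gravity, dt) for p, v in zip(positions, velocities)]
    return [r[0] for r in results], [r[1] for r in results]
```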

(And yes, personally I would use the fragment processor for this, as it is faster and has a well-defined output.)

If this is part of a graphics program, then by all means try to find a way to use the GPU. The throughput will be limited by vsync anyway, so you’ve got plenty of spare GPU time; and since GPUs are increasing in speed faster than CPUs these days, your code will be more forward-looking.

If it’s a pure-physics problem without a graphics component, the GPU may still be of help, but you’ll have to compare more carefully against how long the same calcs would take on the CPU.

I honestly disagree. GPUs are often already too slow for the graphics alone, so why also spend their processing time on physics? Multi-core processors are on their way, and it is easier to handle several threads on general-purpose CPUs than to do the same on a special-purpose processor that was not designed for physics.

The fact that you push something onto the GPU means that you need to transform your whole problem into a form the GPU can handle. That can often become quite a challenge.

For example, nVidia/ATI always say one should compute particle systems on the GPU. However, only the simplest transformations are easy to realize in a shader, so every particle system that has to run on the GPU will yield much inferior effects compared to one that is simply done on the CPU. With multi-core processors you can do your update in parallel very easily, and thus actually USE those extra cores, which will be a major concern in the near future.
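The chunked multi-core update described above might look like the following sketch. Assumptions: 1D positions for brevity, explicit Euler integration, and a hypothetical `parallel_update` helper. In CPython the GIL keeps these pure-Python threads from actually running the arithmetic in parallel, so the sketch only shows the partitioning, which carries over directly to native threads in C/C++ (or to a process pool).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def update_range(pos: List[float], vel: List[float], start: int, end: int,
                 gravity: float, dt: float) -> None:
    """Advance particles [start, end) in place with explicit Euler (1D for brevity)."""
    for i in range(start, end):
        pos[i] += vel[i] * dt
        vel[i] += gravity * dt


def parallel_update(pos: List[float], vel: List[float], gravity: float,
                    dt: float, workers: int = 4) -> None:
    """Split the particles into disjoint chunks and update one chunk per worker.

    Each particle only touches its own state and the chunks never overlap,
    so the workers need no locks.
    """
    n = len(pos)
    if n == 0:
        return
    chunk = (n + workers - 1) // workers  # ceiling division: every particle is covered
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(update_range, pos, vel, start, min(start + chunk, n), gravity, dt)
            for start in range(0, n, chunk)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
```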

In my opinion, as long as you plan on doing great graphics, don’t push anything onto the GPU that does not clearly belong there. Skinning, for example, is clearly worth doing on the GPU. Physics and particle effects are very hard to realize on the GPU and will always be limited compared to a CPU implementation, so don’t do it.

I would only push something non-graphics-related onto the GPU at the very end of a project, if it is clear that my CPU has too much to do and the GPU has spare processing power. Just as every good book tells you: don’t optimize too early.


There’s no question that multi-core stuff is important, and GPGPU doesn’t solve everything. But it’s definitely an important area of research; 10x speedups are nothing to sneeze at when they’re possible.

True, but you will get such a speed-up only in rare cases. Usually you can be happy if you get the same speed as on the CPU, with the benefit of having freed up your CPU a bit. Especially when doing a read-back, all your performance gains can be lost very fast.


Not so rare. It’s just a matter of turning the problem around until you find a way to exploit the GPU model, or at least find a good approximation to the problem that exploits it. It’s less an optimization than a complete reworking of the problem, so I don’t see it as something that can be handled late in the process.

It definitely requires great care, and not every problem is susceptible to this approach, but a fair number are, particularly when you can pull out algorithm subcomponents and replace them with parallel equivalents anyway.
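As one example of replacing a sequential subcomponent with a parallel equivalent (chosen here for illustration; the thread doesn’t name a specific algorithm): a running sum `out[i] = out[i-1] + in[i]` has a dependency chain, but the Hillis-Steele scan computes the same result in about log2(n) passes whose elements are all independent of each other. The Python sketch below emulates those passes on the CPU; the function name is hypothetical. It does more total work than the sequential loop (O(n log n) additions instead of O(n)), which is the usual price of this kind of reformulation.

```python
from typing import List


def inclusive_scan_parallel_style(values: List[int]) -> List[int]:
    """Inclusive prefix sum computed in the Hillis-Steele style.

    Each pass reads only the previous pass's array and writes a fresh one,
    so every element of a pass could be computed independently (e.g. one
    fragment per element, ping-ponging between two textures).
    """
    current = list(values)
    offset = 1
    while offset < len(current):
        # Every output element i gathers from fixed positions i and i - offset.
        current = [
            current[i] + current[i - offset] if i >= offset else current[i]
            for i in range(len(current))
        ]
        offset *= 2
    return current
```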

Of course, I don’t know the specific physics algorithms involved here, so I can’t really say how well-suited they are to the GPU approach.

Thanks for the replies. So if not for physics, then what else can the vertex processor be used for? The fragment processor is for blurring and lighting, right?
I don’t think I have to worry about being CPU bound (all my code is in asm), so I’m just wondering about the uses of these two processors with respect to graphics.

In a general way, the vertex processor is used for taking a variety of inputs and giving a variety of outputs which have been transformed. All of these inputs and outputs are merely attributes, but one, position, is special: It controls which of the other attributes are sent to various instances of the fragment shader.

In a general way, the fragment processor takes in a wide variety of data (either interpolated outputs from the vertex stage, or values read from textures) and writes a very few values to a single location: at most four with one render target, or up to 16 with MRT (multiple render targets). The location a given fragment shader instance writes to is fixed, but via textures it can pull data from anywhere.
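A small Python emulation of that gather pattern, assuming a 1D list stands in for a texture and clamp-to-edge addressing (both assumptions; the function names are hypothetical): each output cell is written exactly once at its own fixed location, but it can read from any input location.

```python
from typing import List


def fetch_clamped(texture: List[float], i: int) -> float:
    """Read texture[i] with clamp-to-edge addressing, like a texture fetch."""
    return texture[max(0, min(i, len(texture) - 1))]


def gather_blur(texture: List[float]) -> List[float]:
    """3-tap box blur written in 'fragment shader' style.

    Output cell i is written exactly once, at its own fixed location,
    but it may read (gather) from any input location it likes.
    """
    return [
        (fetch_clamped(texture, i - 1) + fetch_clamped(texture, i)
         + fetch_clamped(texture, i + 1)) / 3.0
        for i in range(len(texture))
    ]
```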

Usually the data the fragment processor writes is called “color”, but really, it can be anything that you might want to stick into a given cell of a 2D array.
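When the “color” is really simulation state, a common layout (assumed here, not taken from this thread) stores one particle per texel, for example position in RGB and remaining lifetime in A, with particles laid out row by row. A Python sketch of the index mapping, with hypothetical names:

```python
from typing import Tuple


def particle_to_texel(index: int, width: int) -> Tuple[int, int]:
    """Row-major mapping from a 1D particle index to a 2D texel (x, y)."""
    return index % width, index // width


def texel_center_uv(x: int, y: int, width: int, height: int) -> Tuple[float, float]:
    """Normalized texture coordinate of the texel's center.

    Sampling exactly at the center (offset 0.5) keeps linear filtering
    from blending in the data of neighboring particles.
    """
    return (x + 0.5) / width, (y + 0.5) / height
```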

These observations lead to the following (general) notion: when using the GPU for something other than normal graphics operations, the vertex shader should be used for scatter operations, and the fragment shader for gather operations. So typically you’d invoke the vertex program by rendering data as GL_POINTS, and you’d invoke the fragment program by rendering a single screen-sized quad (GL_QUADS).
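The scatter half can be emulated in a similar way. This Python sketch assumes points are drawn into a 1D render target with additive blending (in OpenGL, blending enabled with `glBlendFunc(GL_ONE, GL_ONE)`), so points that land in the same cell are summed; the function name and the binning scheme are illustrative assumptions.

```python
import math
from typing import List


def scatter_points(xs: List[float], weights: List[float], cells: int,
                   x_min: float, x_max: float) -> List[float]:
    """Emulate drawing GL_POINTS into a 1D render target with additive blending.

    Each input point (one 'vertex shader' invocation) computes where it lands;
    points that land in the same cell are summed, as additive blending would do.
    """
    if cells <= 0 or x_max <= x_min:
        raise ValueError("need cells > 0 and x_max > x_min")
    target = [0.0] * cells
    scale = cells / (x_max - x_min)
    for x, w in zip(xs, weights):
        cell = math.floor((x - x_min) * scale)  # the computed output position
        if 0 <= cell < cells:                   # points outside the viewport are clipped
            target[cell] += w
    return target
```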

That’s assuming the GPU doesn’t have anything else to do, which is generally true with most GPGPU applications, but not with games or similar graphical applications. There you usually want to do physics and graphics, so the overall speedup might be a lot less than the speedup of one individual task.
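One rough way to quantify that is Amdahl’s law (a modelling choice made here, not something from the thread): if a fraction p of the frame time is sped up by a factor s, the whole frame speeds up by 1 / ((1 - p) + p / s). For example, a physics step that takes 20% of the frame and becomes 10x faster gives only about 1.22x overall, and the model ignores the extra load that physics then puts on a GPU that is also rendering. The numbers are illustrative, not measurements.

```python
def overall_speedup(accelerated_fraction: float, task_speedup: float) -> float:
    """Amdahl's law: whole-frame speedup when only part of the work gets faster.

    accelerated_fraction: share of the original frame time spent in the task (0..1).
    task_speedup: how many times faster that task alone becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if task_speedup <= 0.0:
        raise ValueError("task_speedup must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / task_speedup)
```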

I usually try to do as much as possible on the GPU as long as I don’t need the result back on the CPU. So simulation of entities is on the CPU, because I need the result back for my game logic, but the simulation of a particle system is a candidate for the GPU, because I don’t care about the result, as long as it gets rendered correctly…
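For a particle system whose results stay on the GPU, the state is typically kept in two buffers (textures) that swap roles every step, since a fragment shader should not read from the texture it is currently rendering into. A CPU-side Python sketch of that ping-pong scheme, with a hypothetical `update` callback standing in for the shader:

```python
from typing import Callable, List


def simulate_ping_pong(initial: List[float], steps: int,
                       update: Callable[[List[float], int], float]) -> List[float]:
    """Run `steps` updates using two buffers that swap roles every step.

    update(src, i) computes the new value of element i from the read-only
    source buffer; all writes go to the other buffer, so an element never
    sees a neighbor that was already updated in the same step.
    """
    src = list(initial)
    dst = [0.0] * len(initial)
    for _ in range(steps):
        for i in range(len(src)):
            dst[i] = update(src, i)
        src, dst = dst, src  # swap: this step's output is next step's input
    return src
```

The third test case below (each element copies its left neighbor) is exactly where an in-place update would go wrong.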

I don’t think I have to worry about being CPU bound (all code in asm)

And what makes you think that writing all your code in asm has anything to do with being CPU bound? Do you really think you can optimize better than a modern compiler? I doubt it.

Do you really think you can optimize better than a modern compiler? I doubt it.

Don’t be so quick to make assumptions about the skills of others. In math-intensive areas there is plenty of room for hand-written asm optimizations.

I can no longer find the whole article that describes the process, but check out the incremental improvements one can make to a matrix * vertex method:

(Found it:

That is not something a compiler can do on its own.

It also shows that most programmers’ C++ vertex/matrix libraries are probably slower than they could be with appropriate asm optimizations.
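The article referred to above isn’t available here, so the following does not reproduce its code. As a generic illustration (Python, hypothetical names), these are two ways to write the same 4x4 matrix * vertex product: the textbook row-by-dot-product form, and a column form that maps onto 4-wide SIMD as “broadcast one vertex component, multiply a whole column, add”.

```python
from typing import List, Sequence

Vec4 = List[float]


def mat_vec_dot_rows(rows: Sequence[Vec4], v: Vec4) -> Vec4:
    """Textbook form: each output component is a dot product of one row with v.

    On 4-wide SIMD this needs the four lanes of each product summed together
    (a horizontal add), which is comparatively awkward.
    """
    return [sum(r[k] * v[k] for k in range(4)) for r in rows]


def mat_vec_columns(cols: Sequence[Vec4], v: Vec4) -> Vec4:
    """Column form: result = v.x*col0 + v.y*col1 + v.z*col2 + v.w*col3.

    Each step is 'broadcast one scalar, multiply a whole column, add',
    which maps directly onto vertical 4-wide multiply/add instructions.
    """
    out = [0.0, 0.0, 0.0, 0.0]
    for k in range(4):
        s = v[k]  # the scalar a SIMD version would broadcast to all lanes
        out = [o + s * c for o, c in zip(out, cols[k])]
    return out
```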

The original poster said all code is written in asm. And I don’t doubt his skill in particular, I doubt that anyone could optimize better than a modern compiler, at least globally.

I can understand when someone hand-optimizes some math code to utilize SSE or 3DNow. But if you try to write the whole program in assembler, you’re most likely making it slower.

Besides that, the statement that something can not be CPU bound because it’s all written in assembler is absolute nonsense. Even if we assume just for the sake of argument that the code is really better than compiler-optimized code, it still doesn’t mean it’s not CPU bound.

I have heard the argument “it’s written in asm, so it’s got to be fast” many times, and most of the time it comes from someone who doesn’t even know what optimizations a compiler can do, let alone how to optimize better than it, or more importantly, where it pays off to optimize.

That is not something a compiler can do on its own.

Don’t be so quick to make assumptions on the skill of compilers ;)

Seriously, it is quite possible for a compiler to do this on its own. The problem is just that most compilers don’t do it ;)

Agreed. Boundedness has nothing to do with the language in which you write your algorithm, but with the algorithm itself (and the medium in which it runs, be it CPU or GPU).

Honestly, when I think of coding in asm I think of an unmaintainable mess, much like using templates or other silly C/C++ idiosyncrasies in the belief that it makes the code “better”. Everything has its place, however, and the statement "Do you really think you can optimize better than a modern compiler? I doubt it." was a little too blunt to go unchallenged.

Don’t be so quick to make assumptions on the skill of compilers ;)

Seriously, it is quite possible for a compiler to do this on its own. The problem is just that most compilers don’t do it ;)

A compiler could take MatrixMultiply1 to MatrixMultiply3 if it knew the matrix was in column-major order (it doesn’t). Beyond that, the method itself is modified to take advantage of the SSE instructions (for example, by doing multiple multiplies per call), rather than the instructions being fitted to the existing method. A compiler cannot do that.
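To illustrate the two points in that reply, here is a Python sketch (hypothetical names; the SIMD correspondence is only described in comments, not implemented). The matrix is stored as a flat column-major array, the convention OpenGL uses, so each column is four contiguous floats; and many vertices are transformed per call, so the columns are fetched once and reused, much as a SIMD routine would keep them in registers across vertices.

```python
from typing import List, Sequence


def transform_batch(m_colmajor: Sequence[float],
                    verts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Transform many 4-component vertices by one 4x4 matrix in a single call.

    m_colmajor holds 16 floats in column-major order: element (row r,
    column c) is m_colmajor[c * 4 + r], so each column is contiguous.
    """
    if len(m_colmajor) != 16:
        raise ValueError("expected 16 matrix elements")
    cols = [list(m_colmajor[c * 4:(c + 1) * 4]) for c in range(4)]  # fetched once per call
    out = []
    for v in verts:
        acc = [0.0, 0.0, 0.0, 0.0]
        for c in range(4):
            # broadcast v[c], multiply column c, accumulate
            acc = [a + v[c] * x for a, x in zip(acc, cols[c])]
        out.append(acc)
    return out
```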

Coding much beyond the math realm in asm makes no sense to me. However I know proponents of Forth would be sure to argue even that.

(See an example of a programmer talking about coding his assembly using Forth: Blog entry 070910 and upwards. Craziness!)
