How to do "General Computation" on GPU

Hello everyone,

I’m experimenting with moving computations from the CPU to the GPU. For example, a matrix multiplication A x B is easy to do in C code running on the CPU, but how can I dispatch that work to the GPU and read the result back?

The environment:
HTC dev 1 phone, with Qualcomm MSM2701A chipset (GPU: Adreno 130)
OpenGL ES 1.4

I’ve tried writing a simple program to let the GPU do the work, as below:

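====================== Part of code =============================
EGLDisplay m_eglDisplay = eglGetDisplay(EGL_DEFAULT_DISPLAY);
if (m_eglDisplay == EGL_NO_DISPLAY || eglGetError() != EGL_SUCCESS) {
    return -1;
}

EGLint major, minor;
if (eglInitialize(m_eglDisplay, &major, &minor) == EGL_FALSE || eglGetError() != EGL_SUCCESS) {
    return -1;
}

/* A GL context must be created and made current here, before any gl* call. */

GLfixed mantissa[16];
GLint exponent[16];
GLfloat matrix[16] = { 3, 3, 8, 8,
                       5, 7, 5, 7,
                       1, 2, 2, 2,
                       4, 4, 4, 2 };
GLbitfield status;

/* Load the matrix, then read it back. The load call is missing from the
   excerpt; glLoadMatrixf on the modelview stack is assumed. */
glMatrixMode(GL_MODELVIEW);
glLoadMatrixf(matrix);
status = glQueryMatrixxOES(mantissa, exponent);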


The code above loads a 4x4 matrix into the current matrix; glQueryMatrixxOES() should then write its values into mantissa and exponent. After the program completes, I get these values:

status = 54

mantissa = {
-1342110592.000000, 0.000000, -1342109440.000000, -1342171008.000000,
0.000000, -1342110592.000000, -1342110592.000000, -1094222464.000000,
-1094222464.000000, -1094222208.000000, 0.000000, 0.000000,
0.000000, 0.000000, 0.000000, 0.000000 }
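
For reading values like the ones above, here is a minimal sketch of how the two arrays and the status word could be decoded. It assumes (going by my reading of the OES_query_matrix extension) that each mantissa is a 16.16 fixed-point GLfixed, that component i is roughly mantissa[i] * 2^exponent[i], and that bit i of the status word flags component i as invalid. The helper names decode_component and component_invalid are made up for illustration and are not part of any GL API.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Decode one matrix component.
   Assumption: mantissa is 16.16 fixed point and the real value is
   mantissa * 2^exponent. */
static double decode_component(int32_t mantissa, int32_t exponent)
{
    return ldexp((double)mantissa / 65536.0, exponent);
}

/* Returns 1 if component i (0..15) is flagged invalid in the status word.
   Assumption: bit i of status corresponds to component i. */
static int component_invalid(uint32_t status, int i)
{
    assert(i >= 0 && i < 16);
    return (int)((status >> i) & 1u);
}
```

Under those assumptions, a status of 54 (binary 110110) would flag components 1, 2, 4 and 5, so a nonzero status is a hint that the mantissa values should not be trusted as they stand. Note also that GLfixed is an integer type, so printing it with a floating-point format shows the raw integer rather than the decoded value.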

I want to know how to use the GPU as a general-purpose computing unit, the way I can use the CPU. Should I keep trying with OpenGL ES? Any suggestions would be useful!


I very much doubt that OpenGL ES uses the GPU to do math on the matrix stack. It’s almost certainly still being done on the CPU.

The GPU is for drawing pretty pictures. If you want it to do something else - you have to make it think that it’s drawing a pretty picture. Also, the overheads associated with setting up the GPU to do something and the complexities of reading the results back make it utterly useless for short, simple tasks like multiplying two matrices.

Now, suppose you had a million 3-vectors that you needed to cross-product with another million 3-vectors - to get a million 3-vectors as a result. THEN the GPU would be just the thing for the job.

(I’m going to dramatically over-simplify this explanation…the reality is much more complex…)

You’d load the first million vectors into the red, green and blue “colors” of one 1024x1024 texture map, and the second million vectors into another texture map in the same way. Then you’d tell the GPU to draw a polygon that covers a million pixels using those two textures, with a shader that takes the cross-product of the “color” at (x,y) in the first texture with the color at the same address in the second texture and writes the result out to a “render buffer” - which is (essentially) just a third texture. Finally, you could use a glReadPixels call to read the million results back into the CPU.
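
To make the data layout concrete, here is a rough CPU-side model of that pass in plain C. It is not GPU code: cross_texel stands in for what the shader would compute at one pixel, and run_pass stands in for "draw one polygon covering width x height pixels". The Texel type and both function names are invented for this sketch. On real hardware the per-pixel function would be a fragment shader, which needs programmable shading (OpenGL ES 2.0 or later; ES 1.x is fixed-function).

```c
#include <assert.h>
#include <stddef.h>

/* One 3-vector stored as the "color" of one pixel. */
typedef struct { float r, g, b; } Texel;

/* What the shader would compute for a single pixel: a x b. */
static Texel cross_texel(Texel a, Texel b)
{
    Texel out;
    out.r = a.g * b.b - a.b * b.g;
    out.g = a.b * b.r - a.r * b.b;
    out.b = a.r * b.g - a.g * b.r;
    return out;
}

/* CPU stand-in for one draw call: every pixel (x, y) of the target gets
   the cross-product of the texels at the same address in the two inputs. */
static void run_pass(const Texel *tex_a, const Texel *tex_b, Texel *target,
                     size_t width, size_t height)
{
    assert(tex_a != NULL && tex_b != NULL && target != NULL);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            size_t i = y * width + x;
            target[i] = cross_texel(tex_a[i], tex_b[i]);
        }
    }
}
```

The point of the model is that each output pixel depends only on the two input texels at the same address, which is what lets the GPU process all of them in parallel.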

The operation of doing all million cross-products would be lightning fast - and the overhead for all of the GPU setup, drawing commands, etc would be worth it.

Sadly, reading the results back into the CPU is going to be slower than you’d hope. So ideally, you need to do long chains of calculations, feeding the result “texture” from one calculation in as the input texture of the next…and in that way, you can make big savings.
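
A chain like that is commonly arranged as two buffers that swap roles on every pass, since a pass generally cannot read from the texture it is writing to. The sketch below models the idea on the CPU with plain float arrays. PassFn, run_chain and double_pass are hypothetical names for illustration only; on a GPU each pass would be a draw call into a render target, with a single readback at the very end.

```c
#include <assert.h>
#include <stddef.h>

/* One calculation step: reads n values from src, writes n values to dst. */
typedef void (*PassFn)(const float *src, float *dst, size_t n);

/* Runs `passes` steps, alternating between two buffers so the output of
   one step becomes the input of the next. Returns the buffer that holds
   the final result (buf_a itself if passes is 0). */
static float *run_chain(float *buf_a, float *buf_b, size_t n,
                        PassFn pass, int passes)
{
    float *src = buf_a;
    float *dst = buf_b;
    assert(buf_a != NULL && buf_b != NULL && passes >= 0);
    for (int p = 0; p < passes; ++p) {
        pass(src, dst, n);
        float *tmp = src;   /* swap roles for the next step */
        src = dst;
        dst = tmp;
    }
    return src;             /* the buffer written most recently */
}

/* Example step: double every value. */
static void double_pass(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = 2.0f * src[i];
    }
}
```

Only the buffer returned at the end would need to be read back, so the slow transfer is paid once per chain instead of once per step.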

But it takes some skill and often some lateral thinking to get it done.

You might also want to investigate OpenCL or nVidia’s “CUDA” for doing GPU math - they let you write more “normal” looking code and hide the ickiness from you. But I doubt that those run on cellphone hardware…although I could easily be wrong.
