Multiply 2 big matrices using OpenGL

hadi_hadizadeh · July 2, 2006, 12:32pm

Hi, is it possible to multiply 2 big matrices as fast as possible using OpenGL? In other words, I want to do this on the GPU. Thanks.

ZbuffeR · July 2, 2006, 12:42pm

It is not the intended use, especially if you want precise results.
Anyway, here is a good site about “General-Purpose computation on GPUs”:
http://www.gpgpu.org/
http://www.gpgpu.org/wiki/FAQ

hadi_hadizadeh · July 2, 2006, 1:02pm

Why do you think it is not the intended use? In fact, I want to multiply 2 double-precision matrices faster than simple, ordinary code would. Someone said that OpenGL can be used as an interface to the GPU. Is that true?

ZbuffeR · July 2, 2006, 1:16pm

Of course it is possible, and for GPGPU computations OpenGL is a good API to use. See the FAQ link I posted for more details.

What I mean is that graphics accelerators are not intended to be used as high-performance scientific clusters. Basically, you will have to trade off precision and flexibility to get cheap processing power.

EDIT: i.e., it is already hard to get float precision; for double it will be even harder.

hadi_hadizadeh · July 2, 2006, 1:32pm

OK. By using OpenGL as an interface for GPGPU programming, I think there is no need to deal with the GPU directly. I mean that we write our code in OpenGL, and it is OpenGL that runs the code on the GPU. Is that true? If so, could you please give me an example showing how I can multiply 2 matrices in OpenGL? Thank you very much!

ZbuffeR · July 2, 2006, 1:48pm

Read The Links.

Jan · July 2, 2006, 3:48pm

1. RTFM
2. What is your exact definition of “big”? Don’t tell me it’s 4x4.

Mars_999 · July 2, 2006, 6:39pm

Originally posted by Jan:
1. RTFM
2. What is your exact definition of “big”? Don’t tell me it’s 4x4.

1. OUCH

Flavious · July 2, 2006, 8:52pm

The links that Zbuffer provided are quite good.

Anyway, to my knowledge there is no native support for matrix multiplication beyond 4x4. However, matrix multiplication is inherently vectorizable, meaning it is possible to compute the product of matrices as the product of sub-matrices or vectors.

For instance, a matrix-vector product can be vectorized as 4 vector scales and 4 vector adds. A matrix-matrix multiply can likewise be coded as 4 matrix-vector multiplies. An NxM by MxQ multiply can perhaps be optimally coded as a combination of products of 4x4 sub-matrices. For starters, I would try to decompose each input matrix into a blocked matrix consisting of 4x4 sub-matrix entries. For example, given 2 8x8 matrices P and Q,

PQ = ( A B )(E F) = (AE+BG AF+BH)
     ( C D )(G H)   (CE+DG CF+DH)

where A, B, C, D, E, F, G and H are each 4x4 sub-matrices of P and Q.

If your matrix dimensions are not already multiples of 4, consider padding them with zeros as necessary.
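
A minimal sketch of the blocking and zero-padding idea described above, written in plain Python for readability rather than speed. The names `pad_to_multiple` and `blocked_matmul` are hypothetical, the 4x4 tile size is taken from the post, and this is only one way the decomposition could look, not a GPU implementation.

```python
BLOCK = 4  # tile size suggested in the post; an assumption, not a requirement


def pad_to_multiple(m, block=BLOCK):
    """Return a zero-padded copy of m whose dimensions are multiples of block."""
    rows, cols = len(m), len(m[0])
    padded_rows = -(-rows // block) * block  # round up to the next multiple
    padded_cols = -(-cols // block) * block
    out = [[0.0] * padded_cols for _ in range(padded_rows)]
    for i in range(rows):
        for j in range(cols):
            out[i][j] = m[i][j]
    return out


def blocked_matmul(p, q, block=BLOCK):
    """Multiply p (n x m) by q (m x k) one block x block tile at a time."""
    n, m, k = len(p), len(q), len(q[0])
    if len(p[0]) != m:
        raise ValueError("inner dimensions do not match")
    pp, qq = pad_to_multiple(p, block), pad_to_multiple(q, block)
    pn, pm, pk = len(pp), len(qq), len(qq[0])
    result = [[0.0] * pk for _ in range(pn)]
    # Result tile (bi, bj) accumulates the product of P tile (bi, bk)
    # and Q tile (bk, bj), exactly like AE + BG in the 8x8 example.
    for bi in range(0, pn, block):
        for bj in range(0, pk, block):
            for bk in range(0, pm, block):
                for i in range(bi, bi + block):
                    for j in range(bj, bj + block):
                        acc = 0.0
                        for t in range(bk, bk + block):
                            acc += pp[i][t] * qq[t][j]
                        result[i][j] += acc
    # Drop the rows and columns that exist only because of the padding.
    return [row[:k] for row in result[:n]]
```

The padding adds only zeros to each sum, so the trimmed result equals the ordinary product. On real SIMD or GPU hardware each innermost 4x4 tile product would map to vector instructions instead of Python loops.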

At any rate, the trick to computational speed with GPUs is vectorizing your operations. Bear in mind this speed has to be weighed against the time needed to transfer the data to and from the GPU, assuming that you need the results in your app.

Alternatively, SSE or 3DNOW! intrinsics could be leveraged in place of GPU processing, scenario permitting. You may find an introduction to these intrinsics helpful in understanding the general concepts in SIMD architectures.

I hope this helps.

hadi_hadizadeh · July 3, 2006, 5:40am

I want to multiply a 50000x9 matrix by a 9x9 matrix. Is it possible to do this on the GPU? If so, is it possible to do it using OpenGL?

hadi_hadizadeh · July 3, 2006, 5:47am

Also, what do RTFM and OUCH mean?!

Zengar · July 3, 2006, 7:06am

Look here: http://www.gaarde.org/acronyms/

OUCH means just ouch

It is possible to do it on the GPU (potentially), why not? But it would be very difficult to get good precision, because GPUs usually operate at less than 32-bit precision, so you will need some special methods (no idea what, I always skipped my numerics lectures :-o ) if you want double precision.

Besides, you’ve got your links, haven’t you? I am not sure that anyone here has ever done something similar; it is too much effort. In the end, it is up to you.

I think that it is not worth the effort. You’ll spend weeks trying to get good precision, and in the end your software will probably be overly complicated, unstable and extremely hard to debug. I doubt that you will get a significant performance increase this way. Stick to SSE on the CPU; this is my advice.

BTW, why do you need such huge matrices oO? If it’s not a secret…

imported_UrbanLegend · July 3, 2006, 9:08am

RTFM = Read the Flipping Manual ( the F can be replaced with other words )

I don’t think you will get any benefit out of using the GPU here.

SSE is the way to go IMO. Here at my current company we have our own SSE implementations that are significantly faster than C++ or Intel’s fast math library.

hadi_hadizadeh · July 3, 2006, 9:26am

Many thanks for your replies. I am working on a real-time image processing task, and I need a fast matrix multiplication procedure because it is the bottleneck of my algorithm. My programming language is Delphi. Do you have any idea, or know of any way, to use SSE in Delphi?

Zengar · July 3, 2006, 2:08pm

From Delphi 2005 on, SSE instructions are supported by the assembler (they are there in 2006 for sure). You can also use Free Pascal. However, all SSE code then has to be written by hand. If I am not mistaken, AMD has a free math library utilising 3DNow! and SSE instructions; look for it on their developer site. Unfortunately, I know of no tutorials on SSE, so I can’t help you there. If you are new to assembly, you will have a rough time though. It is still better than using the GPU, IMHO. Just google or ask on the Intel/AMD developer forums; you will get some good advice, I guess.
And, BTW, don’t expect good performance. Even with very good optimisations, this will probably take hours to compute. Your best option would be using some specialised scientific hardware, or a Cell-like CPU.

Overmind · July 3, 2006, 2:49pm

“RTFM = Read the Flipping Manual”
I prefer “Read the Fine Manual”.

“Even with very good optimisations, this will probably take hours to compute.”
Huh? 50000 * 9 * 9 == 4050000.

4M multiplications should run in under a second, and that’s without even the most basic optimisations.
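
To make that count concrete, here is a small Python sketch that computes the number of scalar multiplications for an n x m by m x k product and times a textbook triple loop. The helper names `multiply_count`, `matmul` and `time_matmul` are made up for illustration. Pure Python is far slower than compiled Delphi or C code, so any timing it reports is only a loose upper bound for a given machine.

```python
import time


def multiply_count(n, m, k):
    """Number of scalar multiplications in an (n x m) by (m x k) product."""
    return n * m * k


def matmul(a, b):
    """Textbook triple-loop matrix product on lists of lists."""
    n, m, k = len(a), len(b), len(b[0])
    c = [[0.0] * k for _ in range(n)]
    for i in range(n):
        row_a, row_c = a[i], c[i]
        for t in range(m):
            a_it, row_b = row_a[t], b[t]
            for j in range(k):
                row_c[j] += a_it * row_b[j]
    return c


def time_matmul(n=50000, m=9, k=9):
    """Return the wall-clock seconds taken by one n x m by m x k product."""
    a = [[float((i + t) % 10) for t in range(m)] for i in range(n)]
    b = [[float((t * j) % 10) for j in range(k)] for t in range(m)]
    start = time.perf_counter()
    matmul(a, b)
    return time.perf_counter() - start


if __name__ == "__main__":
    print("multiplications:", multiply_count(50000, 9, 9))
    print("seconds (pure Python):", time_matmul())
```

The operation count grows linearly with the number of rows here, because the inner dimensions stay fixed at 9.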

As for gaining real-time performance, I think the best bet is to invest in parallel execution (multiple CPUs or even a cluster). The standard matrix multiplication algorithm is inherently parallel: every component of the result is computed separately.
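
The independence of the result components can be sketched as follows: the rows of the left matrix are split into chunks, each chunk is multiplied on its own, and the partial results are concatenated. The names `multiply_rows` and `parallel_matmul` are hypothetical. The sketch uses Python threads only to show the structure; because of CPython's global interpreter lock, threads will not speed up pure-Python arithmetic, and a real implementation would use processes, native threads or several machines.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(rows, b):
    """Multiply a slice of rows of A by B; no other rows of A are needed."""
    inner, k = len(b), len(b[0])
    return [
        [sum(row[t] * b[t][j] for t in range(inner)) for j in range(k)]
        for row in rows
    ]


def parallel_matmul(a, b, workers=4):
    """Split A into row chunks, multiply each chunk independently, then join."""
    chunk = max(1, -(-len(a) // workers))  # rows per worker, rounded up
    chunks = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the chunks in order, so the rows come back in order.
        parts = list(pool.map(lambda rows: multiply_rows(rows, b), chunks))
    return [row for part in parts for row in part]
```

For the 50000x9 by 9x9 case every worker needs the whole 9x9 matrix but only its own slice of the 50000 rows, so the work divides cleanly.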

Sure, SSE will help a lot, too, but I’m not sure if it will be enough…

dom_unido · July 3, 2006, 3:05pm

I answered on GPGPU.org in case someone is interested.

http://www.gpgpu.org/forums/viewtopic.php?p=12283#12283

The AMD/Intel math libs are really, really slow compared to ATLAS or GotoBLAS, btw.

Bob · July 3, 2006, 5:54pm

Originally posted by Overmind:

“Even with very good optimisations, this will probably take hours to compute.”
Huh? 50000 * 9 * 9 == 4050000.

4M multiplications should run in under a second, and that’s without even the most basic optimisations.

I think someone is really underestimating the power of today’s computers. Timing a matrix multiplication of that size in MATLAB tells me it’s done in about 10 milliseconds. I’ve got an AMD64 3500+, so it’s fairly modern, but still nowhere near “hours”.

Zengar · July 3, 2006, 10:04pm

Wow, I thought that the computational overhead would be much higher oO
I guess it is because I have an inexplicable fear of large computations. Somehow I always think that the number of operations will go up exponentially (embarrassed)
Well, nvm

hadi_hadizadeh · July 3, 2006, 11:39pm

Zengar said that Delphi 2005 supports SSE internally. If that is true, then I think I can compile my code in Delphi 2005, so there is no need to deal with the SSE instructions myself. Do you agree? In my current code, a matrix multiplication of that size takes about 100 ms on my Pentium 4 (Celeron, 2.8 GHz, 512 MB), and I am wondering how MATLAB can do it in about 10 ms! As you know, MATLAB is very slow compared with code written in a native programming language, since it is a utility! Would you suggest AMD or Intel processors for this purpose?