fast vector math and SIMD

I’m using g++ on Linux and I’m trying to find a way to compute cross products of vectors very fast. I understand that using Intel’s SIMD instructions from assembly code will be much faster. The problem is that gcc expects AT&T assembly syntax, and all I know for SIMD is Intel syntax. Can I use SIMD on Linux? Can gcc somehow take Intel-syntax assembly? Are there any other ways of doing vector math this fast? Any help is appreciated. Incus

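The difference between AT&T and Intel asm is just the notation: the instructions are the same, but AT&T puts the source operand first, prefixes registers with % and immediates with $, and writes memory operands as (reg).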


[This message has been edited by opla (edited 03-07-2002).]

I’ve tried to use intel2gas, but it doesn’t understand the SIMD instructions. The code that I want to run in my program is below (v1 and v2 are vectors):

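mov esi,v1              ; esi points to v1
mov edi,v2              ; edi points to v2

movaps xmm0,[esi]       ; xmm0 = v1 as (x,y,z,w), needs 16-byte alignment
movaps xmm1,[edi]       ; xmm1 = v2
movaps xmm2,xmm0        ; xmm2 = copy of v1
movaps xmm3,xmm1        ; xmm3 = copy of v2

shufps xmm0,xmm0,0xc9   ; xmm0 = v1 as (y,z,x,w)
shufps xmm1,xmm1,0xd2   ; xmm1 = v2 as (z,x,y,w)
mulps xmm0,xmm1         ; xmm0 = (y1*z2, z1*x2, x1*y2, w1*w2)

shufps xmm2,xmm2,0xd2   ; xmm2 = v1 as (z,x,y,w)
shufps xmm3,xmm3,0xc9   ; xmm3 = v2 as (y,z,x,w)
mulps xmm2,xmm3         ; xmm2 = (z1*y2, x1*z2, y1*x2, w1*w2)

subps xmm0,xmm2         ; xmm0 = v1 x v2, with w = 0
mov esi,this            ; esi points to the destination vector
movaps [esi],xmm0       ; store the result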

This is the Intel version, and the code should compute cross products very quickly. Can I use this code on Linux? Incus
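One way to get the same shuffle/multiply/subtract sequence without writing any assembly is to use the SSE intrinsics from <xmmintrin.h>, which g++ supports. The following is only a sketch: the function name cross_sse is made up for illustration, it assumes both inputs and the output are 16-byte-aligned arrays of four floats, and on 32-bit x86 it assumes the code is compiled with -msse.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Hypothetical helper: out = a x b for vectors stored as (x, y, z, w).
// a, b and out must each point to 16-byte-aligned storage of 4 floats.
inline void cross_sse(const float* a, const float* b, float* out)
{
    __m128 va = _mm_load_ps(a);   // aligned load, like movaps
    __m128 vb = _mm_load_ps(b);

    // _MM_SHUFFLE(3, 0, 2, 1) == 0xc9 selects (y, z, x, w)
    // _MM_SHUFFLE(3, 1, 0, 2) == 0xd2 selects (z, x, y, w)
    __m128 a_yzx = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 b_zxy = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 1, 0, 2));
    __m128 a_zxy = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 1, 0, 2));
    __m128 b_yzx = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 0, 2, 1));

    // (y1*z2 - z1*y2, z1*x2 - x1*z2, x1*y2 - y1*x2, 0)
    __m128 result = _mm_sub_ps(_mm_mul_ps(a_yzx, b_zxy),
                               _mm_mul_ps(a_zxy, b_yzx));
    _mm_store_ps(out, result);    // aligned store, like movaps
}
```

The compiler picks the registers itself here, so there is no Intel versus AT&T syntax question at all.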

have a look to

Thanks, but that’s not quite what I’m looking for. I now know that I can use SIMD, but I’m not sure how to convert my Intel assembly to AT&T assembly. I don’t know what the AT&T forms of the SIMD instructions, such as movaps, are. Incus
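For what it’s worth, the SSE mnemonics (movaps, shufps, mulps, subps) are the same in AT&T syntax; only the operand notation changes. Below is a rough translation of the Intel listing above into g++ extended inline asm. It is a sketch under assumptions: the wrapper name cross_asm and its pointer parameters are made up for illustration, all three pointers are assumed to be 16-byte aligned, and the doubled %% on register names is only needed because this is extended inline asm (a standalone .s file would use a single %).

```cpp
// Hypothetical wrapper: out = v1 x v2, each a 16-byte-aligned float[4].
inline void cross_asm(const float* v1, const float* v2, float* out)
{
    __asm__ __volatile__(
        "movaps (%1), %%xmm0\n\t"          // movaps xmm0,[esi]
        "movaps (%2), %%xmm1\n\t"          // movaps xmm1,[edi]
        "movaps %%xmm0, %%xmm2\n\t"        // movaps xmm2,xmm0
        "movaps %%xmm1, %%xmm3\n\t"        // movaps xmm3,xmm1
        "shufps $0xc9, %%xmm0, %%xmm0\n\t" // shufps xmm0,xmm0,0xc9
        "shufps $0xd2, %%xmm1, %%xmm1\n\t" // shufps xmm1,xmm1,0xd2
        "mulps  %%xmm1, %%xmm0\n\t"        // mulps xmm0,xmm1
        "shufps $0xd2, %%xmm2, %%xmm2\n\t" // shufps xmm2,xmm2,0xd2
        "shufps $0xc9, %%xmm3, %%xmm3\n\t" // shufps xmm3,xmm3,0xc9
        "mulps  %%xmm3, %%xmm2\n\t"        // mulps xmm2,xmm3
        "subps  %%xmm2, %%xmm0\n\t"        // subps xmm0,xmm2
        "movaps %%xmm0, (%0)\n\t"          // movaps [esi],xmm0
        : /* no register outputs; the result is written through %0 */
        : "r"(out), "r"(v1), "r"(v2)
        : "xmm0", "xmm1", "xmm2", "xmm3", "memory");
}
```

The pattern is mechanical: source and destination swap places, immediates get a $ prefix, and [reg] becomes (reg).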

Can’t you just use NASM and link the resulting object files?

Or the Intel Compiler for Linux ?
(commercial or non-commercial)

i don’t know let me try.

Yes, the NASM assembler will work. Just assemble with the flag -f elf and this will create the correct object file for you to link with.

Neil Witcomb

I’m a little confused. I got NASM and set it up. I have two files: a .cpp file and a .h file that the .cpp file includes. The .h file contains the vector class, which has the assembly code. I use

g++ -c -S filename.cpp
(produces a .s version of the .cpp)
nasm -f elf filename.s
(When I run nasm I get lots of errors, though.) Am I doing this the wrong way? Any help is appreciated. Incus

You do know that NASM is an Assembler right?

Ummm, yeah, why? Isn’t a .s file an assembly file? I have assembly code, so I need an assembler. I convert the .cpp to .s and then try to use NASM. It was my understanding that the -S flag with g++ produces an assembly file. I know it doesn’t have the .asm extension, but the .s would assemble fine with the as assembler. Am I wrong, and if so, where?

[This message has been edited by incus (edited 03-07-2002).]

anyone have any advice???

NASM uses a different syntax than GAS, so you can’t use it as a replacement: the .s file that g++ -S emits is in GAS (AT&T) syntax, which NASM can’t parse. Instead, you have to write some functions entirely in NASM, assemble them from .asm to .o using nasm, and then call them from your C++, which you compile from .cpp to .o using gcc.
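On the C++ side, calling a routine that lives in a separate assembly object file mostly comes down to declaring it with C linkage so the name is not mangled. The sketch below assumes a routine exported as cross_product (a made-up name) that takes three float pointers. So that the example links on its own, it also includes a plain C++ stand-in for the body; in a real build that definition would be removed and the symbol would come from the NASM object file instead.

```cpp
// extern "C" disables C++ name mangling, so the linker looks for the plain
// symbol "cross_product" (hypothetical name), which is what a
// "global cross_product" line in the .asm file would export on Linux/ELF.
extern "C" void cross_product(const float* a, const float* b, float* out);

// Stand-in definition for illustration only. In the setup described above,
// delete this and link the object file produced by nasm -f elf instead.
extern "C" void cross_product(const float* a, const float* b, float* out)
{
    // Compute into temporaries first so out may alias a or b.
    const float x = a[1] * b[2] - a[2] * b[1];
    const float y = a[2] * b[0] - a[0] * b[2];
    const float z = a[0] * b[1] - a[1] * b[0];
    out[0] = x;
    out[1] = y;
    out[2] = z;
}
```

The calling code does not change when the stand-in is swapped for the assembly version, as long as the assembly routine follows the platform’s C calling convention.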

If all you’re doing is a single cross product, SIMD won’t be very helpful. SIMD helps a lot when you’re ripping through large quantities of something-or-other, because you can get computational parallelism, and because you can do better cache management and avoid pollution or spurious write allocation.
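To make the "large quantities" case concrete, here is a rough sketch of a batched cross product using SSE intrinsics. It assumes the vectors are stored as separate x, y and z arrays (a structure-of-arrays layout, which is my assumption and not something from the original code), so that four cross products are computed per loop iteration with no shuffles at all. The function name cross_batch is made up for illustration, and on 32-bit x86 the code assumes -msse.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)
#include <cstddef>

// Hypothetical batch routine: (cx, cy, cz)[i] = a[i] x b[i] for i in [0, n).
// Unaligned loads/stores are used so the arrays need no special alignment.
void cross_batch(const float* ax, const float* ay, const float* az,
                 const float* bx, const float* by, const float* bz,
                 float* cx, float* cy, float* cz, std::size_t n)
{
    std::size_t i = 0;

    // Main loop: four independent cross products per iteration.
    for (; i + 4 <= n; i += 4) {
        __m128 x1 = _mm_loadu_ps(ax + i), x2 = _mm_loadu_ps(bx + i);
        __m128 y1 = _mm_loadu_ps(ay + i), y2 = _mm_loadu_ps(by + i);
        __m128 z1 = _mm_loadu_ps(az + i), z2 = _mm_loadu_ps(bz + i);

        _mm_storeu_ps(cx + i, _mm_sub_ps(_mm_mul_ps(y1, z2), _mm_mul_ps(z1, y2)));
        _mm_storeu_ps(cy + i, _mm_sub_ps(_mm_mul_ps(z1, x2), _mm_mul_ps(x1, z2)));
        _mm_storeu_ps(cz + i, _mm_sub_ps(_mm_mul_ps(x1, y2), _mm_mul_ps(y1, x2)));
    }

    // Scalar tail for the last n % 4 vectors.
    for (; i < n; ++i) {
        cx[i] = ay[i] * bz[i] - az[i] * by[i];
        cy[i] = az[i] * bx[i] - ax[i] * bz[i];
        cz[i] = ax[i] * by[i] - ay[i] * bx[i];
    }
}
```

With this layout every SSE lane does useful work, whereas the single-vector version spends half its instructions on shuffles and wastes the w lane.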

I don’t want to be rude but you may find other forums where people will be able to help you better (i.e. this is an OpenGL forum, not a SIMD one…).



OK, I can show you what I have done. I added a new rule to my makefile as follows:

ASM_SRCS = shade.asm
ASM_OBJS = shade.o

# make only applies suffix rules to suffixes it knows about
.SUFFIXES: .asm .o

.asm.o:
	rm -f $@
	$(ASM) -f elf $*.asm

Then I just link all the objects to create the application as normal with all my object files as follows:

$(CC) $(CFLAGS) $(ALL_OBJS) $(LIBS) -o $(PROG)

Now, my source code is very similar to the normal assembler you would create.

segment .text
global shade
… (You fill in the code)

You may find that makefiles will help out the linking process. If you would prefer some simple command line stuff, then here goes that as well.

nasm -f elf shade.asm
gcc -c main.c
gcc main.o shade.o -o render
Note that NASM conventionally uses the .asm extension rather than .S, although I find it hard to believe that would cause any problems.
I don’t know what else you would need to know; if you need more, you should probably be looking at some of the links that were posted. One more comment, which was also mentioned above: you will not see any improvement in performance for single operations like a simple cross product. The time to use SSE and custom assembler is when you will be doing many operations, in which case you should write the entire routine in assembler and not just some simple math operations.

You may find it much easier just to use the Intel compiler, which can optimize the code for MMX/SSE/SSE2 depending on flags. I have found this to be a nice and simple approach.

Hope this helps,
Neil Witcomb