maximizing performance


I have a GTX570M (notebook).

Profiling reports the following specs:
336 CUDA cores
7 SMs (streaming multiprocessors)

Since every SM has the same number of CUDA cores, I calculated the number of cores per SM as 336 / 7 = 48.

However, the documentation I have found suggests that the Fermi architecture has 32 CUDA cores per SM. Am I calculating incorrectly? Why do I get 48 instead of the expected 32?


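Your calculation is correct. The GTX 570M uses the GF114 chip, a later Fermi variant (compute capability 2.1) with 48 CUDA cores per SM. The figure of 32 cores per SM applies to the GF100 and GF110 chips (compute capability 2.0), which is what most Fermi documentation describes.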
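
To make the arithmetic concrete, here is a minimal sketch of the lookup in Python. The table and the function name `total_cuda_cores` are my own illustration, not part of any NVIDIA API, and the table only covers the two Fermi compute capabilities discussed here. On a real device you would take the compute capability and SM count from the CUDA device properties query rather than hard-coding them.

```python
# Cores per SM for Fermi-generation GPUs, keyed by compute capability
# (major, minor). Illustrative table: only the two Fermi variants
# discussed above are listed.
FERMI_CORES_PER_SM = {
    (2, 0): 32,  # GF100 / GF110
    (2, 1): 48,  # GF114 and other compute capability 2.1 chips
}


def total_cuda_cores(major, minor, sm_count):
    """Return sm_count times the cores per SM for a Fermi compute capability."""
    try:
        cores_per_sm = FERMI_CORES_PER_SM[(major, minor)]
    except KeyError:
        raise ValueError(
            f"compute capability {major}.{minor} is not covered by this sketch"
        )
    return sm_count * cores_per_sm


# GTX 570M: compute capability 2.1 with 7 SMs.
print(total_cuda_cores(2, 1, 7))  # 336
```

With 7 SMs, a compute capability 2.0 part would have only 7 * 32 = 224 cores, so the reported 336 is consistent with 48 cores per SM.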