Is it correct to say that the running time of a CUDA program is determined by its longest-running thread?
For example, there are two algorithms A and B that complete the same task.
A needs fewer threads running in parallel, but each of its threads takes longer than a thread in B. Under the assumption above, I should therefore choose B.
Am I right?
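
To make the assumption concrete, here is a toy model of what I have in mind, written in Python only for illustration. It is a rough sketch, not a description of real GPU scheduling: the function name `estimated_kernel_time`, the capacity of 2048 concurrent threads, and all thread counts and timings are made-up values. It assumes every thread of an algorithm takes the same time and that the device runs threads in "waves" of at most a fixed number at once.

```python
import math


def estimated_kernel_time(num_threads, time_per_thread, max_concurrent_threads):
    """Toy estimate of total time (hypothetical model, not real GPU behavior).

    Threads are assumed to run in waves of at most max_concurrent_threads,
    and every thread is assumed to take exactly time_per_thread.
    """
    if num_threads <= 0:
        return 0.0
    waves = math.ceil(num_threads / max_concurrent_threads)
    return waves * time_per_thread


# Hypothetical numbers, chosen only to illustrate the question.
CAPACITY = 2048  # assumed number of threads the device can run at once

# Algorithm A: fewer threads, each one slower.
time_a = estimated_kernel_time(1024, 4.0, CAPACITY)

# Algorithm B: more threads, each one faster.
time_b = estimated_kernel_time(8192, 1.5, CAPACITY)

print("A:", time_a)  # 1 wave  * 4.0 = 4.0
print("B:", time_b)  # 4 waves * 1.5 = 6.0
```

In this model, the "longest thread" rule only holds when all threads fit on the device at once (a single wave). With the numbers above, B has the shorter threads but needs four waves, so the model predicts it is slower than A overall. Whether real hardware behaves anything like this is part of what I am asking.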