Setting aside the question of transposing the matrix entirely: which is supposed to be faster, multiplying with the vector on the left of the matrix (v * M) or with the vector on the right (M * v)?

I know this is something of an implementation detail, and scalar-based shader architectures (G80 and later) generally don’t care. But what’s the answer for vector-based hardware?
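To make the two cases concrete, here is a minimal sketch in Python of how each product might be lowered on a hypothetical vec4 ISA with `DP4`, `MUL` and `MAD` instructions, assuming the matrix is held in four row registers. Under that assumption, M * v becomes four dot products, while v * M becomes one multiply followed by three multiply-adds. Storing the matrix as columns instead swaps the two patterns. The instruction set and the row storage are illustrative assumptions, not a description of any particular GPU.

```python
# Hypothetical vec4 ISA model: each helper stands for ONE 4-wide instruction.

def dp4(a, b):
    """DP4: 4-wide dot product, producing a scalar."""
    return sum(ai * bi for ai, bi in zip(a, b))

def mul(a, s):
    """MUL: scale every lane of a by the scalar s."""
    return tuple(ai * s for ai in a)

def mad(a, s, c):
    """MAD: a * s + c, lane by lane."""
    return tuple(ai * s + ci for ai, ci in zip(a, c))

def mat_times_vec(rows, v):
    """M * v (vector on the right), matrix stored as rows: 4 x DP4."""
    return tuple(dp4(row, v) for row in rows)

def vec_times_mat(v, rows):
    """v * M (vector on the left), same row storage: 1 x MUL + 3 x MAD."""
    r = mul(rows[0], v[0])
    for i in range(1, 4):
        r = mad(rows[i], v[i], r)
    return r

rows = ((1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16))
v = (1, 0, 2, 1)
print(mat_times_vec(rows, v))  # (11, 27, 43, 59)
print(vec_times_mat(v, rows))  # (32, 36, 40, 44)
```

In this model both forms cost four vector instructions. Which one a real compiler emits then depends on how the matrix is laid out in constant registers, and that is exactly where the transposition question comes back in.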