24 scalar mul/madd instructions?
There’s supposed to be some kind of specialized mechanism using swizzles that makes vector/quaternion transforms take only 2 opcodes (in a vertex shader).

That takes more than 2 opcodes. Quaternions sometimes look very cheap, but that is not true, because each quaternion multiplication needs 16 scalar multiplications.

I wrote 24 mul/madd instructions because a quaternion multiplication can be represented as a 4x4 matrix * vector multiplication. An optimization is to remove all terms that are multiplied by zero or that sum to zero. After removing those unneeded values, what remains is a 3x4 and a 4x3 matrix. Both could be multiplied together into a 3x3 matrix (quaternion-to-matrix conversion), but (matrix * matrix) * vector is slower than matrix * (matrix * vector).
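To make the counting concrete, here is a minimal sketch of the full quaternion product and the q * v * conj(q) rotation. The names (`quat_mul`, `quat_conj`, `rotate`) and the (w, x, y, z) component order are my own choices for illustration, not anything from a particular API. A general product costs 16 multiplies; if I count right, treating the vector as a quaternion with w = 0 and skipping the w of the final result leaves 12 + 12 = 24, which matches the figure above.

```python
def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z): 16 multiplies."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conj(q):
    """Conjugate; for a unit quaternion this is also the inverse."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q as q * (0, v) * conj(q)."""
    p = (0.0, v[0], v[1], v[2])  # the w = 0 terms are the ones that can be dropped
    _, x, y, z = quat_mul(quat_mul(q, p), quat_conj(q))
    return (x, y, z)
```

For example, with `q = (cos 45°, 0, 0, sin 45°)` (a 90° turn about z), `rotate(q, (1, 0, 0))` comes out as the y-axis, up to rounding.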

That takes more than 2 opcodes. Quaternions sometimes look very cheap, but that is not true, because each quaternion multiplication needs 16 scalar multiplications.
Never mind; I was thinking about the vector cross-product 2-opcode version.
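For reference, the 2-opcode cross product is the usual swizzle trick: one mul and one mad on rotated components. A rough sketch of the arithmetic, with `yzx` and `zxy` as hypothetical stand-ins for the shader swizzles:

```python
def yzx(v):
    """Stand-in for the .yzx swizzle."""
    return (v[1], v[2], v[0])


def zxy(v):
    """Stand-in for the .zxy swizzle."""
    return (v[2], v[0], v[1])


def cross_swizzled(a, b):
    """Cross product as two component-wise vector operations."""
    # Opcode 1 (mul): t = a.zxy * b.yzx
    t = tuple(p * q for p, q in zip(zxy(a), yzx(b)))
    # Opcode 2 (mad with a negated operand): a.yzx * b.zxy - t
    return tuple(p * q - r for p, q, r in zip(yzx(a), zxy(b), t))
```

A full vector/quaternion rotation needs two of these plus some extra multiply-adds, so it cannot fit into 2 opcodes.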

Quaternions used for rotations are basically directions.
Quaternions are not directions; they are orientations. A quaternion represents a unique orientation of an object.

They can represent orientations relative to the identity quaternion as well.

A quaternion is no more a direction than an angle/axis rotation or a rotation matrix. It only becomes a direction when you ask the question, “Where would this vector go if you rotated it by this quaternion?” And that question can already be asked with vector/quaternion rotation.

OK, bad description. It’s no problem to convert a quaternion to a matrix. Sometimes the full matrix doesn’t need to be calculated, because only the 3rd vector (the normal) is needed. Calculating this vector directly is faster than using the rotation function.

Converting a vector to a quaternion is not possible without a second vector, because a quaternion describes a rotation around a vector. (One value is missing).

Sometimes the full matrix doesn’t need to be calculated, because only the 3rd vector (the normal) is needed.
Forgetting the question of when these times are, the “third vector” of a matrix does not have a name. You can give it a name, but only because your specific application gives that particular vector meaning.

As such, this is not a very generalized use of quaternions. Furthermore, you can get this value easily enough by rotating the unit z-axis by the quaternion.
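Rotating the unit z-axis collapses to a few terms, since it is just the third column of the equivalent rotation matrix. A small sketch, assuming a unit quaternion stored as (w, x, y, z); the function name is made up for illustration:

```python
def rotated_z_axis(q):
    """Image of (0, 0, 1) under unit quaternion q = (w, x, y, z).

    This equals the third column of the rotation matrix built from q.
    """
    w, x, y, z = q
    return (
        2.0 * (x * z + w * y),
        2.0 * (y * z - w * x),
        1.0 - 2.0 * (x * x + y * y),
    )
```

That is 6 multiplies plus the scaling by 2, so it is cheaper than the general rotation, but it only helps when the z-axis is the vector you care about.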

I saw a paper once that suggested using quaternions instead of normals for per-pixel lighting. That is, the vertex shader outputs a quaternion instead of a vector, and the pixel shader turns the interpolated quaternion back into a vector. They claimed this results in better image quality (if I got it right; I only skimmed it).

That is, the vertex shader outputs a quaternion instead of a vector, and the pixel shader turns the interpolated quaternion back into a vector.
For pure per-pixel lighting, that might work. However, bump mapping requires a matrix, as the normal, binormal, and tangent are not necessarily orthogonal to one another.

I’m not really sure what you’d get from using a quat, though. The only approximation that per-pixel lighting makes is linearly interpolating the normal between the 3 vertices. A quaternion could be SLERP-ed between the 3 verts, but that would require a specialized interpolator. Otherwise, I don’t see what this buys you in terms of image quality.
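To show the difference between what a linear interpolator followed by a renormalize would give and what SLERP gives, here is a rough two-endpoint sketch. The function names and the 0.9995 fallback threshold are my own assumptions, and a real triangle would need a three-way version.

```python
import math


def normalize(q):
    """Scale q to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def nlerp(a, b, t):
    """Linear interpolation followed by a renormalize."""
    return normalize(tuple((1.0 - t) * p + t * r for p, r in zip(a, b)))


def slerp(a, b, t):
    """Spherical linear interpolation between unit quaternions a and b."""
    d = sum(p * r for p, r in zip(a, b))
    if d < 0.0:
        # Flip one input so the interpolation takes the shorter arc.
        b = tuple(-c for c in b)
        d = -d
    if d > 0.9995:
        # Nearly parallel inputs: sin(theta) is tiny, so fall back to nlerp.
        return nlerp(a, b, t)
    theta = math.acos(d)
    s = math.sin(theta)
    wa = math.sin((1.0 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return tuple(wa * p + wb * r for p, r in zip(a, b))
```

The two agree at the endpoints and at the midpoint; in between, nlerp moves at a slightly uneven angular speed, which is the only thing SLERP would fix.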

In many cases SLERP is overkill. Examples where quaternions are used meaningfully are BRDFs combined with bump mapping, or deferred rendering with BRDFs.
The problem with BRDFs or other advanced lighting models is that some vectors have to be projected into texture space. In all those cases the required 3x3 rotation matrix has some disadvantages:
For deferred shading, the 6 components of a matrix (3 can be dropped) are much more bandwidth-intensive than 4.
For bump mapping with BRDFs, the normal map alone wouldn’t be enough; an additional tangent map would be required to build a full texture space. A quaternion map is much easier to handle.
Another gain is that quaternion-based bone systems need fewer uniform variables than matrix-based systems, and the quaternion multiplications are cheaper than the (sometimes hidden) matrix multiplications.

Actually, I find it much cheaper to project the normal into world space. Or even better, quit using normal maps; they are a hack anyway. I like perturbation maps for displacing the normals.

For deferred shading, the 6 components of a matrix (3 can be dropped) are much more bandwidth-intensive than 4.
As I pointed out, the NBT matrix for transformation into texture space is not necessarily orthonormal. As such, it cannot be expressed as a quaternion.
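A quick way to see whether a given NBT basis could be stored as a quaternion at all is to test it for orthonormality. A minimal sketch, with a hypothetical helper name and tolerance:

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))


def is_orthonormal(t, b, n, eps=1e-6):
    """True if the three axes are unit length and mutually perpendicular."""
    axes = (t, b, n)
    for i in range(3):
        for j in range(3):
            expected = 1.0 if i == j else 0.0
            if abs(dot(axes[i], axes[j]) - expected) > eps:
                return False
    return True
```

A basis derived from sheared or stretched texture coordinates fails this test, and a unit quaternion cannot reproduce it, because a quaternion only ever encodes a pure rotation.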

Another gain is that quaternion-based bone systems need fewer uniform variables than matrix-based systems, and the quaternion multiplications are cheaper than the (sometimes hidden) matrix multiplications.
I’m not against using quaternions at all. I was specifically talking about what it means for a quaternion to replace a normal.

Or even better, quit using normal maps; they are a hack anyway.
A “hack”? No more than “perturbation maps”. Indeed, normal maps are less of a hack than that; at least they’re modelling something real.

Actually, I find it much cheaper to project the normal into world space. Or even better, quit using normal maps; they are a hack anyway. I like perturbation maps for displacing the normals.
I find that it’s cheaper to do calculations in tangent space, because you do more work in the VS and pass varyings to the FS.
It balances out better.

How is a perturbation map better?

They are all hacks.
The true solution is to use a high detail model. Then your depth values in your z-buffer will be more correct and even your physics will be more accurate if you use the same high detail model.