At one time, I heard NVIDIA claim that GeForce 2 and up support signed shorts, in addition to floats, as long as the start of each tuple sits on a 4-byte boundary. Thus you could pack a normal into three shorts, but you’d have to waste a fourth short, because otherwise the next tuple wouldn’t be properly aligned. A U,V pair packs cleanly into two shorts, and two U,V pairs pack into four contiguous shorts (I seem to recall).
This could actually be tested by putting data into an NV_vertex_array_range, setting up the pointers, and querying whether the array range is valid. We did that way back when, and I seem to remember the results were as reported above.
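A sketch of that probe, assuming a current GL context and that the NV_vertex_array_range entry points (wglAllocateMemoryNV, glVertexArrayRangeNV) are already loaded; the sizes and offsets here are illustrative guesses at a packed layout, not documented limits:

```c
/* Non-runnable fragment: assumes a live GL context and loaded
 * NV_vertex_array_range entry points. Offsets are illustrative. */
GLsizei stride = 32;
void *vram = wglAllocateMemoryNV(65536, 0.0f, 0.0f, 1.0f); /* AGP/video memory */
glVertexArrayRangeNV(65536, vram);
glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

/* Point the arrays at the candidate packed layout... */
glVertexPointer(3, GL_FLOAT, stride, (char *)vram + 0);
glNormalPointer(GL_SHORT, stride, (char *)vram + 12);      /* 3 shorts + 1 pad */
glTexCoordPointer(2, GL_SHORT, stride, (char *)vram + 20);
glColorPointer(4, GL_UNSIGNED_BYTE, stride, (char *)vram + 28);

/* ...then ask the driver whether it accepted the range/format. */
GLboolean valid = GL_FALSE;
glGetBooleanv(GL_VERTEX_ARRAY_RANGE_VALID_NV, &valid);
```

If `valid` comes back GL_FALSE, the driver fell off the fast path for that combination of types, strides, and alignments.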
Additionally, NVIDIA supports unsigned bytes (still 4-byte aligned) for colors; in fact, that’s the preferred representation, unless you need floating-point precision.