I don’t hang out at stackoverflow.
The first question is "are you comparing apples with apples?" That is, are the images in the KTX and DDS files compressed the same way? I suspect the answer is no, and that you are comparing a DDS file containing images compressed with one of the DXTC variants against a KTX file containing images compressed with ETC. I have been told by the NVIDIA OpenGL driver team that the Quadro 4000 supports DXTC in hardware but not ETC. This means the OpenGL driver decompresses the ETC-compressed images in software before loading them into GPU memory, while the DXTC-compressed images are simply loaded into GPU memory as they are. I believe that is the source of the performance difference you are observing.
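One way to check whether the two files use the same compression is to look at their headers directly. The following is a minimal sketch, not code from libktx: it assumes a KTX 1 file (glInternalFormat at byte offset 28) and a legacy DDS header (FourCC at byte offset 84), and the function names are my own, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value stored little-endian / big-endian in a byte buffer. */
static uint32_t read_u32_le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint32_t read_u32_be(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: returns glInternalFormat from a KTX 1 header,
 * or 0 if the buffer does not start with the KTX 1 identifier. */
uint32_t ktx_internal_format(const unsigned char *buf, size_t len) {
    static const unsigned char id[12] = {
        0xAB, 'K', 'T', 'X', ' ', '1', '1', 0xBB, '\r', '\n', 0x1A, '\n'
    };
    if (len < 32 || memcmp(buf, id, sizeof id) != 0) return 0;
    /* The endianness field holds 0x04030201 in the file's own byte order. */
    int little = (read_u32_le(buf + 12) == 0x04030201u);
    return little ? read_u32_le(buf + 28) : read_u32_be(buf + 28);
}

/* Hypothetical helper: copies the DDS FourCC (e.g. "DXT5") into out.
 * Returns 1 on success, 0 if this is not a DDS file or has no FourCC. */
int dds_fourcc(const unsigned char *buf, size_t len, char out[5]) {
    if (len < 88 || memcmp(buf, "DDS ", 4) != 0) return 0;
    /* Pixel-format flags live at offset 80; 0x4 is DDPF_FOURCC. */
    if (!(read_u32_le(buf + 80) & 0x4u)) return 0;
    memcpy(out, buf + 84, 4);
    out[4] = '\0';
    return 1;
}

/* Map a GL internal format to a compression family name. */
const char *gl_format_family(uint32_t fmt) {
    if (fmt >= 0x83F0u && fmt <= 0x83F3u) return "S3TC/DXTC"; /* DXT1..DXT5 */
    if (fmt == 0x8D64u) return "ETC1";                        /* GL_ETC1_RGB8_OES */
    if (fmt >= 0x9274u && fmt <= 0x9279u) return "ETC2";      /* ETC2 RGB/RGBA */
    return "other";
}
```

Read the first 128 bytes of each file into a buffer and pass them in. If the KTX file reports ETC1 or ETC2 while the DDS file reports DXT1, DXT3, or DXT5, the two files are not compressed the same way and the timing comparison is not like for like.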
To truly compare the performance of the DDS and KTX file formats, you should create a KTX file containing DXTC-compressed images or a DDS file containing ETC-compressed images. I do not know if the latter is possible. Unfortunately, toktx does not support converting DDS to KTX at this time. However, the source is available, and the underlying ktxWriteKTXF function in libktx accepts data in any format known to OpenGL, so it would not be difficult to add the feature.
In the interest of making ETC ubiquitous, the working groups made a conscious decision to let older hardware provide support through software decompression, with the understanding that it could lead to poor comparisons like this.
Where is the needless branching in the libktx code that you stripped off?