This claims POTT still makes sense performance-wise (in OpenGL):
Oh, that’s interesting, even if I don’t see why this issue is especially related to float number interpolation:
“Because interpolation of float numbers can be done very quickly with power-of-two textures, these textures will render faster than ones that are not a power of two. […]”.
Maybe I’m just too tired to understand, or the writer was, or he/she just meant the interpolation itself (min/magFilter != nearest) rather than the value type that is interpolated.
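One commonly cited reason why power-of-two sizes can be cheaper to sample is texel addressing rather than the float math itself: wrapping a coordinate into a power-of-two texture needs only a bit mask, while an arbitrary size needs a division/modulo. The sketch below illustrates just that idea; the function names are made up for this post, and whether a particular GPU or driver really takes such a shortcut is an assumption, not something the quoted article confirms.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not taken from any driver or from the Intel sample.

// True if n is a power of two (1, 2, 4, 8, ...).
inline bool is_power_of_two(std::uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Repeat-style wrap for an arbitrary texture size: needs an integer modulo.
inline std::uint32_t wrap_generic(std::int32_t texel, std::uint32_t size) {
    const std::int32_t s = static_cast<std::int32_t>(size);
    return static_cast<std::uint32_t>(((texel % s) + s) % s);
}

// The same wrap, valid ONLY when size is a power of two: a single bitwise AND.
inline std::uint32_t wrap_pot(std::int32_t texel, std::uint32_t size) {
    return static_cast<std::uint32_t>(texel) & (size - 1);
}
```

For a 1024-wide texture, wrap_pot(1030, 1024) and wrap_generic(1030, 1024) both give 6, but only the generic version works for a width like 640.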
Anyhow, good to know. And it’s one more reason to measure and compare performance, even if it’s mentioned that this is more related to older Intel GPUs (which surely are still widely used).
Maybe I’m wrong, but I always had the feeling that Intel GPUs and their drivers are sometimes something “very special” ;), compared to the market leaders AMD and NVIDIA. Not to mention even more exotic mobile GPUs, where every transistor counts so a mobile phone won’t turn into a radiator. Probably that’s just the nature of things (and money).
Following the Intel link and using the code sample, I’m surprised that even on my desktop GPU there is a difference of ~2% in performance:
OpenGL renderer string: GeForce GTX 1050 Ti/PCIe/SSE2
OpenGL version string: 4.5.0 NVIDIA 376.33
This lesson compares the read performance between using Power-of-Two textures and Non-Power-of-Two textures.
Press <esc> to exit; <space bar> to switch between texture sizes …
*** Non-Power-of-Two Texture – 640 x 426
frames rendered = 11166, uS = 2000089, fps = 5582.751568, milliseconds-per-frame = 0.179123
frames rendered = 11206, uS = 2000035, fps = 5602.901949, milliseconds-per-frame = 0.178479
*** Power-of-Two Texture – 1024 x 1024
frames rendered = 10964, uS = 2000122, fps = 5481.665618, milliseconds-per-frame = 0.182426
frames rendered = 10969, uS = 2000070, fps = 5484.308049, milliseconds-per-frame = 0.182338
The big question is whether, and if so how much, of the performance loss is caused by the OpenGL driver. Luckily, there is a new API with an extremely low driver overhead that can be used to find this out.
Another issue (which I mixed up in my last post) is the difference between rendering to such (“framebuffer”/“renderable”) textures, sampling/fetching from them, and, last but not least, resolving them in case MSAA is used. Not to mention the case where mip-mapping comes into play. But the latter is, fortunately, not related to my topic and IMHO not recommended at all.
So, many assumptions have been made up to this point, with little really reliable measurement data or even rules of thumb for real-life cases. As soon as I find out more, I’ll let you know (I need to install the Android dev tools for VS and port my code to Android in order to tell you at least how a Snapdragon/Adreno GPU behaves when dealing with non-POTT). Of course, I would be happy if you have any further info on this topic to share.