POT & NPOT textures


Hey guys,

I’m having difficulty understanding the difference between POT and NPOT textures. I’ve gotten very confused on this matter, and it’s quite frustrating, which is why I’m posting here. First of all, I’m confused about where a texture is loaded. Some say the “video memory” (is that the same as the RAM?), others say the GPU.
I’m fairly new to OpenGL, so the questions might seem stupid to you, and I might end up mixing things up.

1.) A POT texture, as I understand it, is an image with dimensions of 2^n x 2^m, which means 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and so on. This is because it has to fit the hardware structure - but why?? How is this image loaded, and where is it loaded: the RAM, the CPU or the GPU? I am a bit confused here.

2.) NPOT:
So what does NPOT mean? Is it any image that can serve as a texture, no matter its width and height? Or how? Does a 200x200 (NPOT) texture image yield lower performance than a 256x256 pixel texture image? Why is that?
If I read a 200x200 texture into the CPU, will it allocate 256x256 worth of space on the GPU, which means I am wasting space? Or how is this understood correctly?

I am probably mixing things up here…
help is much appreciated!

  1. In the normal case you load it into RAM first, and then using glTexSubImagexx you put a copy in the VRAM of the GPU. Depending on how many textures you load, they may not all fit into VRAM, and then the driver will manage that for you. But ideally they should be in VRAM. Most new HW will work with NPOT (Non-Power-of-Two) textures, i.e. textures whose dimensions are not powers of two.

  2. Your description is correct. As to what each driver does: that’s driver dependent, but most are fairly adept at making good use of your VRAM.

As for performance: case by case, driver by driver, manufacturer by manufacturer, but generally it’s “safer” to stick with NPOT if you are unsure, or do some benchmarking.

  1. VRAM is Video RAM, which is the memory on the GPU board. With glTex[Sub]Imagexx you send the image data to the GL driver, which will in turn send it to the VRAM.
    Note also that the driver often (always?) keeps a copy in RAM, in case the VRAM is filled and this texture is swapped out. Then, when the texture is needed again, the driver can transparently send it to VRAM again. The transfer means some performance drop, of course.

To answer the “why” part of the question, this is due to the original mipmapping technique presented in the paper “Pyramidal Parametrics” by Lance Williams, Computer Graphics, ACM, Vol. 17, Number 3, 1983.

page 3:
“The concept behind this memory organization is that corresponding points in different prefiltered maps can be addressed simply by a binary shift of an input U, V coordinate pair. Since the filtering and sampling are performed at scales which are powers of two, indexing the maps is possible with inexpensive binary scaling. In a hardware implementation, the addresses in all the corresponding maps (now separate memories) would be instantly and simultaneously available.”

Note that if you target some old OpenGL implementation, the glu library provides the function gluScaleImage() to convert your NPOT image to a POT image, ready for old OpenGL implementations.

ref: http://www.opengl.org/documentation/specs/glu/glu1_3.pdf

If you target an OpenGL implementation >= 2.0, or an OpenGL implementation exposing the extension GL_ARB_texture_non_power_of_two, you don’t need to perform any kind of size conversion.

ref: Spec 3.1, appendix H, section H.3.29 Non-Power-Of-Two Textures:
“The name string for non-power-of-two textures is GL_ARB_texture_non_power_of_two. It was promoted to a core feature in OpenGL 2.”

ref: http://www.opengl.org/registry/doc/glspec31.20090324.pdf
ref: http://www.opengl.org/registry/specs/ARB/texture_non_power_of_two.txt


I just noticed I suggested sticking with NPOT in my post. I did of course mean POT.

And of course, as ZbufferR correctly (and subtly) pointed out, there is glTexImagexx and glTexSubImagexx: the former for uploading a full texture, the latter for subsections of a texture. Sorry for my “typos”, it was a long day yesterday! Duh!


This is gold! Thanks guys! I understand it much better now.

As for mipmapping, which I understand is a filtering technique used for minification, I am not sure I totally understand. Let’s take a simple example:
A texture that’s glued onto some sort of polygon, which moves further and further away from the viewer until it becomes so small that it can be represented by just one screen pixel. It then becomes necessary to read all texels and combine their values correctly to determine the fragment color (fragment = screen pixel, right?). This is an expensive operation for the GPU & RAM, and that’s why mipmapping is enabled. But as described in the abovementioned link:

Mip mapping supplements bilinear interpolation of pixel values in the texture map

  • are these meant as texels? interpolation of texels?

(which may be used to smoothly translate and magnify the texture) with interpolation between prefiltered versions of the map (which may be used to compress many pixels into a small space)

So the farther away the texture is, it is replaced by smaller interpolated versions of the same texture? Or how is this understood?

thanks again for the help!


search for “Multiple Levels of Detail” here, and read it:

quick summary:

  • using mipmapping means you precompute a chain of smaller images based on the original image. Each level is 2 times smaller in width and height, meaning it has 4 times fewer texels. Level 0 is the base image. All levels down to a single-texel image need to be defined. That is why it is simpler for POT, as explained by Overlay.
  • then at runtime, the card selects the adequate texture level to sample: for a 512x512 texel texture mapped to a 256x256 pixel quad, it samples level 1.

What happens when the ideal level is between two others? In old-school bilinear filtering (GL_LINEAR_MIPMAP_NEAREST), it takes the nearest level. The transitions are somewhat jumpy. With trilinear filtering (GL_LINEAR_MIPMAP_LINEAR), it samples both levels and averages the result, with a higher weight for the closer level, so the transitions are smooth.



So the farther away the texture is, it is replaced by smaller interpolated versions of the same texture? Or how is this understood?

You’ve got it. For the basic “why use MIPmaps” concept, please don’t think there’s something super subtle in here. What you said before is correct. As an object gets further and further away, there are more texels in the base map behind each screen pixel that need to be added up and averaged (i.e. integrated), to the point where you’ve got the entire bloody texture behind one pixel and you have to add every texel up and compute an average.

But with MIPmaps, you can use some “pre-averaged” (i.e. pre-integrated) smaller-res versions of the texture to save a lot of that averaging work. To the point where if the entire texture is exactly behind a single pixel, you can just use the 1x1 MIPmap level for your answer and be done with it.

I’m glossing over some subtleties but in a nutshell that’s it!

Think of it as a “texels per pixel” question. When rendering:

  • When you have 1 base map texel per pixel, you (the GPU) use the base map (MIPmap level 0).
  • When you have 2 base map texels per pixel, you use MIPmap level 1 (which for MIPmap 1 would be 1 texel per pixel).
  • When you have 4 base map texels per pixel, use MIPmap level 2 (which for MIPmap 2 would be 1 texel per pixel). etc.

Seeing a pattern here? The job of the GPU when using MIPmaps is to filter out an approximately 1 texel/pixel result from your texture.

When you completely gel this, I’m sure you’ll think of other questions. But just keep the above in mind as a mental model (a jumping-off point). Follow-on topics you might be interested in: texturing surfaces viewed edge-on (anisotropic texture filtering), and vendor filtering cheats.