Why do texture HEIGHTs have to be powers of 2?

I was wondering this. A general guideline seems to be to create textures whose width and height are powers of 2, since some 3D cards like them that way. But why does the height need to be a power of 2? A texture is read like this:

(assuming 1 << BPPShift == BytesPerPixel and 1 << WidthShift == TextureWidth)

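DWORD ReadTexture ( const BYTE *Data, DWORD X, DWORD Y )
{
/* byte offset = (Y * TextureWidth + X) * BytesPerPixel; fetches a 32-bit texel */
return *(const DWORD *)(Data + (Y << (WidthShift+BPPShift)) + (X << BPPShift));
}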

The height of the texture is not needed at all, so why should it have to be a power of 2?

Because that’s the way the OpenGL specification is written. It isn’t a “rule of thumb” or anything. If a video card accepts a texture whose width or height, not counting the optional border, is not a power of two, it is not compliant with the OpenGL specification.

And reading a texture with a bitshift is one of the fastest ways, but not the only way. You never know how an implementation will do it.

If for some strange reason a graphics card stored textures in vertical instead of horizontal scanlines, this scheme would not work. (Not that I can think of any reason to do this, but you never know.)
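A minimal sketch of that point, assuming byte addressing and the same shift names as the question (the function names are hypothetical): with horizontal scanlines only the width shift appears in the address, but with vertical scanlines a height shift takes its place, so an implementation free to pick either layout would want both dimensions to be powers of two.

```c
#include <stdint.h>

/* Horizontal scanlines (row-major): offset = (y * Width + x) * BytesPerPixel.
 * Needs 1 << widthShift == Width; the height never appears. */
static uint32_t RowMajorOffset(uint32_t x, uint32_t y,
                               unsigned widthShift, unsigned bppShift)
{
    return (y << (widthShift + bppShift)) + (x << bppShift);
}

/* Vertical scanlines (column-major): offset = (x * Height + y) * BytesPerPixel.
 * Now 1 << heightShift == Height must hold for the shift trick to work. */
static uint32_t ColumnMajorOffset(uint32_t x, uint32_t y,
                                  unsigned heightShift, unsigned bppShift)
{
    return (x << (heightShift + bppShift)) + (y << bppShift);
}
```

For a 64x32 texture with 4 bytes per texel, texel (3, 2) sits at byte 524 in the first layout and byte 392 in the second.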

I’m inclined to think that making textures required to be powers of 2 in both dimensions is just to be consistent. If they made it so that heights can be anything, people would be asking why the width was restricted to powers of two.

j

Probably to make it easier for hardware implementations. If there were no restrictions, the hardware would be harder to build and more expensive.

But (not sure if you’re interested) it is only the allocated texture buffer that needs to be a power of 2 in each dimension. The part of the buffer you actually use doesn’t have to be restricted to powers of 2 at all (although restricting it might improve performance).

glTexSubImageXD can take any region: I use a 1024x1024 texture to download 720x576 video frames, and the unused space can be used for loading other images. You really don’t want to waste texture memory.

/ Patric
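The arithmetic behind that trick, as a small sketch: allocate at the next power of two, upload the frame into the corner (for example with glTexSubImage2D at offset 0,0), and stop the texture coordinates at used/allocated instead of 1.0. NextPow2 and UsedFraction are hypothetical helper names, not GL calls.

```c
#include <stdint.h>

/* Smallest power of two >= n, for 1 <= n <= 2^31: the size to allocate at. */
static uint32_t NextPow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Fraction of the allocated texture covered by the used sub-image along one
 * axis: the s or t value for the far edge of the quad, in place of 1.0. */
static float UsedFraction(uint32_t used, uint32_t allocated)
{
    return (float)used / (float)allocated;
}
```

For a 720x576 frame both dimensions round up to 1024, the quad’s far corner gets (0.703125, 0.5625), and 1024*1024 - 720*576 = 633,856 texels (about 60% of the allocation) are left free for other images.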

Have any of you ever written your own perspective-correct texture mapping renderer? Back in the olden days you didn’t have this fancy OpenGL thing… or at least it wasn’t fast enough, because there was no such thing as a consumer-grade 3D graphics accelerator…
What does that have to do with your question? Simple… the math for perspective-correct texture mapping is A LOT easier if you restrict both the width and the height to powers of two.
To be perfectly honest I’ve never written my own perspective-correct texture mapping renderer either, but I did read many texts on the subject.
Have a look at any of the ZILLIONS of 3D graphics programming books (not the ones that talk about DirectX or OpenGL). They’ll tell you the whys and what-fors. But I do know that the math is simpler with powers of two.

Suggestions:
Michael Abrash’s Graphics Programming Black Book.
The Graphics Gems series.
Computer Graphics: Principles and Practice (a.k.a. ‘The Graphics Programmer’s Bible’).
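For what it’s worth, a sketch of where the height can come in, assuming an 8-bit texture with repeat wrapping (SamplePerspective and its parameters are hypothetical): the perspective divide itself does not care about the texture size, but the final wrap-and-address step is two masks and one shift only when both the width and the height are powers of two.

```c
#include <stdint.h>

/* One perspective-correct texel fetch. u/z, v/z and 1/z are interpolated
 * linearly across the screen; dividing by 1/z recovers texel-space u and v.
 * Texture is (1 << widthShift) x (1 << heightShift), 1 byte per texel. */
static uint8_t SamplePerspective(const uint8_t *tex,
                                 unsigned widthShift, unsigned heightShift,
                                 float uOverZ, float vOverZ, float oneOverZ)
{
    float z = 1.0f / oneOverZ;
    /* (int) truncates toward zero; a real mapper would use floor or fixed point */
    int u = (int)(uOverZ * z);
    int v = (int)(vOverZ * z);
    /* Repeat wrapping as a mask: only valid for power-of-two sizes */
    unsigned uw = (unsigned)u & ((1u << widthShift) - 1u);
    unsigned vw = (unsigned)v & ((1u << heightShift) - 1u); /* the height mask */
    return tex[(vw << widthShift) + uw];
}
```

Without a power-of-two height, the `vw` line would need a modulo by the height on every texel.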

Well, I have written a perspective-correct texture mapping function. It was a couple of years ago, but I can’t remember reading anywhere that the height needs to be 2^n (to gain speed). Please explain, I would really like to know.

I saw a very fast software texture mapper. It did something like 1 texel every 2 clock cycles. It only worked with textures of size 256x256; some nifty tricks with 8-bit registers and addressing were the reason.
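A guess at the kind of trick described, not the actual mapper: with a 256x256 8-bit texture each texel coordinate fits in one byte, so an 8.8 fixed-point step wraps at 256 for free on overflow, and the texel offset is simply the two bytes side by side. This only works because the height is also 256. The function name, the affine span and the 8.8 format are all assumptions.

```c
#include <stdint.h>

/* Draw one affine span from a 256x256, 1-byte-per-texel texture.
 * u, v, du, dv are 8.8 fixed point; the high byte is the texel coordinate. */
static void DrawSpan256(const uint8_t tex[256 * 256], uint8_t *dst, int count,
                        uint16_t u, uint16_t v, uint16_t du, uint16_t dv)
{
    for (int i = 0; i < count; i++) {
        uint8_t tu = (uint8_t)(u >> 8);
        uint8_t tv = (uint8_t)(v >> 8);
        dst[i] = tex[((unsigned)tv << 8) | tu]; /* offset = v:u as two bytes */
        u = (uint16_t)(u + du);                 /* 16-bit overflow == wrap at 256 */
        v = (uint16_t)(v + dv);
    }
}
```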

If a dimension of a texture is a power of 2, it is much easier and faster to compute offsets into it, using shifts rather than multiplies. For example, to get to the beginning of scanline 30 of an 8-bit texture that’s 64 pixels wide, you can do:

offset = 30 << 6;

rather than offset = 30 * 64;

  • Nutty.

I can think of at least 5 or 10 good reasons to require that textures be powers of two in HW.

  • Matt

Originally posted by mcraighead:
I can think of at least 5 or 10 good reasons to require that textures be powers of two in HW.

Matt, you are always such a tease. You know that, right?

Heh, we’re getting off the subject. I was talking specifically about the height; in my example I did use bit-shifting for the width. I’ve programmed a software renderer, but it uses affine texture mapping, so I didn’t know perspective-correct mappers needed to multiply/shift by the height of the texture as well as the width.

Actually, I’m constantly worried that I’m posting too much information.

Here’s one of the reasons: texture wrapping (clamp, repeat, clamp to edge, etc.) computations are far easier when you know that the texture is a power of two. No nasty modulo hardware, just nice bit arithmetic.

  • Matt
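A sketch of that difference for GL_REPEAT on an integer texel index (function names hypothetical): with a power-of-two size the wrap is a single AND, while an arbitrary size needs a genuine remainder, which in hardware means a divider, plus a fix-up for negative indices.

```c
/* Wrap a signed texel index i into [0, size) for repeat-style addressing. */

/* Power-of-two size: one AND. Going through unsigned makes negative i wrap
 * correctly (e.g. -1 becomes size - 1) without signed bit tricks. */
static int RepeatPow2(int i, int size)
{
    return (int)((unsigned)i & (unsigned)(size - 1));
}

/* Arbitrary size: a real remainder, plus a correction because C's %
 * keeps the sign of the dividend. */
static int RepeatAnySize(int i, int size)
{
    int r = i % size;
    return r < 0 ? r + size : r;
}
```

Both give the same answer for power-of-two sizes; only the first is "nice bit arithmetic".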

Originally posted by mcraighead:
Actually, I’m constantly worried that I’m posting too much information.

Here’s one of the reasons: texture wrapping (clamp, repeat, clamp to edge, etc.) computations are far easier when you know that the texture is a power of two. No nasty modulo hardware, just nice bit arithmetic.

  • Matt

IMHO, there is no such thing as too much information, especially where GL is concerned.

Originally posted by mcraighead:
Actually, I’m constantly worried that I’m posting too much information.

Here’s one of the reasons: texture wrapping (clamp, repeat, clamp to edge, etc.) computations are far easier when you know that the texture is a power of two. No nasty modulo hardware, just nice bit arithmetic.

  • Matt

But aren’t the texture coords interpolated using floating point math? And isn’t the clamp/repeat stuff done before multiplying by the width of the texture?
When you get a texture coord of, say, 1.7253 with GL_REPEAT, you’d only need to cut off the integer part to get 0.7253 (I don’t know how expensive that is, but it should be way cheaper than a modulo) and then multiply by the width.
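The approach described, sketched in C (function names hypothetical). Taking the fractional part has to round toward negative infinity, not toward zero, or negative coordinates wrap the wrong way; this version avoids libm by truncating and then correcting.

```c
/* Repeat on a normalized float coordinate: keep only the fractional part.
 * (int) truncates toward zero, so negative inputs need +1.0 afterwards.
 * Assumes |s| fits in an int. */
static float RepeatCoord(float s)
{
    float f = s - (float)(int)s;
    return f < 0.0f ? f + 1.0f : f;
}

/* Then scale by the width to get a texel index. The clamp guards against
 * the fraction rounding up to exactly 1.0 for tiny negative inputs. */
static int TexelIndex(float s, int width)
{
    int i = (int)(RepeatCoord(s) * (float)width);
    return i < width ? i : width - 1;
}
```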

The texture coords may or may not be interpolated in floating point. It doesn’t matter.

Sure, wrap is easy (take the fractional part), but how about clamp to edge? You need to clamp the coordinate to [1/(2dim), 1 - 1/(2dim)]. Very nasty if dim is not a power of two.

If you are in fixed point, wrap, clamp, and clamp to edge are all easy for power of 2 textures.

If you are in floating point, you would first convert to fixed point and then perform the fixed point calculations. (You need to reduce to fixed point to do address calculation and filtering, so you might as well do it at this point rather than doing nasty wrap calculations in FP.)

  • Matt
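A minimal sketch of what that looks like, assuming a 16.16 fixed-point coordinate and a texture size of 1 << sizeShift along the axis (the format and all names are assumptions for illustration, not how any particular chip does it): repeat, clamp and clamp to edge all reduce to masks, shifts and compares, because 1/(2*dim) is itself just a shifted constant.

```c
#include <stdint.h>

#define FRAC_BITS 16
#define ONE (1 << FRAC_BITS)   /* 1.0 in the assumed 16.16 format */

/* Float coordinate to fixed point (truncates; assumes |s| < 32768). */
static int32_t FloatToFixed(float s) { return (int32_t)(s * (float)ONE); }

/* Repeat: keep the fraction bits. The unsigned mask also wraps negatives. */
static int32_t WrapRepeat(int32_t s)
{
    return (int32_t)((uint32_t)s & (uint32_t)(ONE - 1));
}

/* Clamp: clamp to [0, 1]. */
static int32_t WrapClamp(int32_t s) { return s < 0 ? 0 : (s > ONE ? ONE : s); }

/* Clamp to edge: [1/(2*dim), 1 - 1/(2*dim)]. With dim == 1 << sizeShift,
 * 1/(2*dim) is exactly ONE >> (sizeShift + 1), so no divider is needed. */
static int32_t WrapClampToEdge(int32_t s, unsigned sizeShift)
{
    int32_t half = ONE >> (sizeShift + 1);
    return s < half ? half : (s > ONE - half ? ONE - half : s);
}

/* Nearest-texel index from a wrapped coordinate: also just a shift
 * (sizeShift < FRAC_BITS, s >= 0). */
static int32_t TexelFromFixed(int32_t s, unsigned sizeShift)
{
    return s >> (FRAC_BITS - sizeShift);
}
```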

> But aren’t the texture coords interpolated using floating point math? And isn’t the clamp/repeat stuff done before multiplying by the width of the texture?

It needs to behave “as if” it did this.

Hint: converting to fixed point earlier allows you to build cheaper and/or faster hardware at any given transistor count.

Also, if a texture was power-of-two sideways but not heightways, what would happen if you used your u/v mapping to rotate the texture? Either you force it both ways, or no way. GL opted for the performance-inducing choice.

Originally posted by mcraighead:
Sure, wrap is easy (take the fractional part), but how about clamp to edge? You need to clamp the coordinate to [1/(2dim), 1 - 1/(2dim)]. Very nasty if dim is not a power of two.

What about a 2048-entry lookup table with all the values of 1/(2*dim)?
I don’t know how much it would cost in transistor count, but it should be possible. It might even be possible to use a smaller one and interpolate linearly.
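A sketch of the proposed table, assuming a hypothetical 16.16 fixed-point coordinate format (names are illustrative only): 2048 entries holding 1/(2*dim) for dim = 1..2048, after which clamp to edge is a table read plus two compares. One catch shows up immediately: for most non-power-of-two sizes 1/(2*dim) is not exactly representable, so the stored value is rounded.

```c
#include <stdint.h>

#define MAX_DIM 2048
#define FRAC_BITS 16
#define ONE (1 << FRAC_BITS)   /* 1.0 in the assumed 16.16 format */

/* halfTexel[dim - 1] = 1/(2*dim) in 16.16 fixed point, rounded down.
 * In hardware this would be a ROM; here it is filled once at startup. */
static int32_t halfTexel[MAX_DIM];

static void InitHalfTexelTable(void)
{
    for (int dim = 1; dim <= MAX_DIM; dim++)
        halfTexel[dim - 1] = ONE / (2 * dim);
}

/* Clamp to edge for any 1 <= dim <= MAX_DIM: one table read, two compares. */
static int32_t ClampToEdgeLUT(int32_t s, int dim)
{
    int32_t half = halfTexel[dim - 1];
    return s < half ? half : (s > ONE - half ? ONE - half : s);
}
```

For dim = 720 the exact value is 65536/1440, about 45.5, but the table stores 45.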

Originally posted by Humus:
What about a 2048-entry lookup table with all the values of 1/(2*dim)?
I don’t know how much it would cost in transistor count, but it should be possible. It might even be possible to use a smaller one and interpolate linearly.

Yes, but two things. First of all, a simple bit shift is going to be faster than a lookup table. Second of all, Matt said he could think of 5 to 10 reasons (and I’m sure there are many reasons he doesn’t even realize). He posted one, and you are proposing a workaround for this issue. When he posts the next reason, are you going to find another workaround for that one, and then the next, and so on?

You see, when you try to “hard code” a solution to one problem, by the time you are done you often realize that you “hard coded” half the stuff, and the general solution would actually have been simpler, easier to maintain, and sometimes even faster. I think this is one of those cases: once you get to the meat of the problem, you will find the general solution is all of the above (the general solution being to just put some restrictions on the textures).

Originally posted by LordKronos:
Yes, but two things. First of all, a simple bit shift is going to be faster than a lookup table. Second of all, Matt said he could think of 5 to 10 reasons (and I’m sure there are many reasons he doesn’t even realize). He posted one, and you are proposing a workaround for this issue. When he posts the next reason, are you going to find another workaround for that one, and then the next, and so on?

You see, when you try to “hard code” a solution to one problem, by the time you are done you often realize that you “hard coded” half the stuff, and the general solution would actually have been simpler, easier to maintain, and sometimes even faster. I think this is one of those cases: once you get to the meat of the problem, you will find the general solution is all of the above (the general solution being to just put some restrictions on the textures).

Exactly. I would also note that it would NOT be cheap to use a reciprocal lookup ROM. On a GF2, you need 16 of them – 4 pixels, 2 textures, 2 texture coordinates. Each table must be big enough to handle [1024,2047] at minimum. If you start thinking about future generations of chips, this is not a scalable solution. All four numbers I mentioned (pixels, textures, texture coordinates, and table size) could grow in the future.

Graphics HW is all about cutting corners to make powerful rendering technology cheaper than it “should be”. That’s what separates it from the CPUs of the world – a more constrained problem domain, and therefore a more efficient solution.

  • Matt

Yes, I understand there are many reasons to restrict the sizes to powers of two, and I personally haven’t met any situation where other sizes would be needed or motivated.
But since many cards today can use textures of other sizes, does that mean there are restrictions on those, such as not being able to use clamp_to_edge with them?

Originally posted by Humus:
But since many cards today can use textures of other sizes

I’m not disputing your claim, since I haven’t used any non-NVIDIA cards for OpenGL, but which cards are you talking about? I wasn’t aware that any allowed this.

Possibilities: Perhaps those cards take a performance hit for allowing you to do so. Or maybe the driver resamples the image to a power of 2 when you upload.

[This message has been edited by LordKronos (edited 02-08-2001).]

I should, for the sake of accuracy, say that we do support non-power-of-two textures in HW, and they do support clamp to edge. However, they don’t support wrap (or a few other things).

This feature will be exposed in our next major driver release.

  • Matt