Pointer to Framebuffer

Thanks, Matt!

So, where if not here are you planning to get into DrawPixels?

Well, in the end, I really won’t be using OpenGL for my applications, and the reaction and endless negative posts have sealed it for me.

Although my Assembly language programming of late is mainly reserved for speed-critical areas, I have indeed done the optimizations you mentioned. And I disagree vehemently with the idea that optimizing Assembly is

thoughtless

…thoughtless work. I have actually optimized the BTC opcode to be twice as fast as Intel’s hardware implementation. I’d love to see you do it using your ‘rules’.

I never made any mention of where I would use my optimized assembly, and yet you seem to presume where I will. As the program does the calculations at the same time as storing them to video RAM, your entire theory of using the parallel nature of the card is moot. Why, then, would I need to wait until I was finished calculating and filling my buffer to swap it to the screen? Maybe those things happen instantly.

[sarcasm]You know, ET3D, you’re right: just because I can’t afford a video card, I can’t complain. What was I thinking? Only those with good hardware setups have that right.

Yeah, those are some good comparisons, John. A printer and a hard disk. Yeah, be a dear and get me pointers for those too while you’re at it.

VESA was unfortunately not as widely adopted as it should have been, but yes, I had learned it and hoped it would be adopted. Thank god we have advanced to the point where hardware determines what we need. Now the people selling the cards get to determine our fate.

Shared resources are nothing new, and yet still…some applications DO mess up other programs. You’d think with all the restrictions in place, nobody would have gotten through, so the obvious solution is more restrictions.[/sarcasm]

I am over assembly and only used it as an example of a program that utilized software means to provide the speed, and yes Assembly is dead.

[MORE sarcasm]Gee, I’ll bet you all can’t wait till they make CISC chips that process C commands and do away with Machine code/Assembly altogether. Of course, C would then become just as unused, because it would be too hard and too unstable.[/MORE sarcasm]

And lastly, this is a forum for posting suggestions for future implementations of OpenGL; however, it seems to me more a bashing of ideas presented by others. One thing I have noticed about this board as a whole: are there nice people on it? Lots, and I have gotten a lot of great help from users here, yet there seems to be an arrogance and smugness that emanates throughout. Why no posts in here supporting my opinion? Who would argue against this kind of resistance? I know I’m through.

I’m finished with the model editor now anyways, and not a moment too soon.

I think it’s more about design ideology.

If you want a pointer to the frame buffer, fine. I’m sure there is a way to get one. There may even be libraries to help you do so. But, by the design ideology of OpenGL, it should not provide one. Doing so kills any platform/hardware/implementation independence that OpenGL provides.

I’m not arguing whether it is right or wrong for you to want or have that pointer. I’m just saying that, by the design ideology of OpenGL, it has no business providing one. OpenGL is supposed to hide these things.

Wanting an extension for OpenGL that provides direct video memory access is like trying to use a hammer to screw in a screw. Screws and nails are similar, but a hammer makes a poor screwdriver.

If you really want a pointer to the framebuffer, you can use another API. Since you are stuck with the software implementation of OpenGL, you might as well use another software API that is specifically designed for software rendering. OpenGL is known to be a slow software renderer because it’s supposed to give you visual quality over performance.

Second, yes, this is a forum where you can post new ideas. But we argue because we can’t just toss anything into OpenGL. In this discussion, you posted the idea of a way to retrieve a pointer to the framebuffer. We disagree because we don’t think it fits into what OpenGL is supposed to be.

Sheepie, I’ll refrain from arguing the more meaningless points (such as the thought put into low-level optimisation), and just try to explain again the speed benefit of not writing directly to the frame buffer. You seem not to have understood the idea of doing things in parallel, so I’ll give you an example.

Suppose you’re using your extremely efficient code that “does the calculations at the same time as storing it to video ram”. Suppose it takes your code 20ms to do these things. Now you want to draw some 3D shapes on that background. You make some 3D call, and the driver digests your data and passes it to the card to draw, and the card then takes 30ms to draw them. Total: 50ms (ignoring the driver work).

Now suppose that you do it another way. You make your OpenGL calls to draw the data. The card now starts its 30ms of work. While it does that, you draw your image into a buffer and call the driver, which copies it into AGP memory. All this is done within those 30ms while the card is working. Now the card still has to read your texture and draw it. Even if it did read all your data, it would still be faster than your direct writes to vidmem, since it will use AGP bursts. But it doesn’t even need to read all your data: if you’re using the image as a background, and there are already shapes drawn (done during those 30ms of work), AGP texturing can read just the data it needs to fill the places where the background will show. So this work will take a lot less than the 20ms it took you before, and you end up with a speed gain.
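To make the ordering concrete, here’s a rough sketch (draw_3d_scene(), generate_background() and draw_background_quad() are placeholders for your own code, not real API, and GL_BGRA needs GL 1.2 or EXT_bgra):

draw_3d_scene();                    /* GL calls return quickly; the card
                                       starts its ~30ms of drawing */
generate_background(image);         /* your ~20ms of CPU work, overlapped
                                       with the card's work */
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                GL_BGRA, GL_UNSIGNED_BYTE, image);  /* driver copies to AGP */
draw_background_quad();             /* screen-size quad at far depth, so only
                                       uncovered pixels are filled */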

A note about the attitudes in this forum: I believe that the reason nobody supports your opinion is that nobody agrees with it. This specific forum is not a support forum (unlike the others here at OpenGL.org). People mostly get bashed in this forum (including me) for a reason: it’s your job to make a convincing argument that a feature should be part of OpenGL. If you can show how the feature you’re suggesting would be useful to a lot of people (give examples of real applications), and address the concerns raised (for example the inefficiency, as I’ve explained in the previous paragraphs, and your inability to handle memory layouts that you haven’t encountered before), then you may be able to convince people. About the smugness, well, programmers are typically both arrogant and helpful. You’re extremely arrogant.

First off, I didn’t come in here bashing your examples. I provided examples. What would you have me do? Just say I want a pointer and not give examples?

I read every word of every reply in this topic and I understand your point. You, however, do not stop making it. I cannot use a texture. What would you have me do with a 256x256 texture to get it to full screen and still maintain the quality? The only way would be to use separate textures and tile the screen, which of course means determining the screen resolution, breaking that down into power-of-2 textures, and breaking the algorithm into sections (one for each tile). Sounds like a lot of effort, a lot of code, and a lot of speed lost.
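Just to spell out the kind of tiling I mean (a rough sketch; update_and_draw_tile() is a hypothetical stand-in for the per-tile work, which is where the real cost lies):

int tiles_x = (screen_w + 255) / 256;   /* e.g. 1024x768 -> 4 x 3 tiles */
int tiles_y = (screen_h + 255) / 256;
for (int ty = 0; ty < tiles_y; ty++)
    for (int tx = 0; tx < tiles_x; tx++)
        update_and_draw_tile(tx * 256, ty * 256);   /* hypothetical helper */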

You continue to maintain that the stall would be bad enough that using textures would be far faster. The example program that I mentioned has very high CPU/bus utilization. How would this affect the transfers that would inevitably need to be made to the graphics card? For a simple scene, it may not be an issue, but when the graphics card starts using system memory, there will be timesharing.

Nobody supports my opinion? A simple search for glDrawPixels yielded numerous posts, and I myself have seen many posts asking why glDrawPixels was so slow and whether there was a way around it, and invariably the answer always came back to textures. When it was pointed out that a full screen would require more than a single texture, the argument shifted towards a higher-end card (something that would utilize glDrawPixels better). But hey, it sounds good to say things that promote your side of the argument, even if they are incorrect.

Arrogance is making unwarrantable claims to superior performance or rights. My claims are warranted, as I have actually done the examples I listed. I do not claim that you or anyone else needs to use this pointer, merely that one could be used, and I give reasons why. The people who have posted here have made it quite clear that they have made up their minds, and have gone so far as to indicate that I am some dumbfounded person who needs pointers for, of all things… a printer. You (ET3D) have even referred to Assembly optimization as thoughtless. But I expected the reaction, as my post was heavily worded. I have no personal animosity towards any individuals on this board, but I think having an atmosphere where people are immediately put into a defensive posture will only serve to erode any sense of teamwork.

It is my responsibility to promote the idea of a pointer. I have given examples of where it could be a benefit, and reasons why textures would not be effective. I don’t really expect it will be implemented, as the vicious circle is already well underway: nobody uses glDrawPixels because it is too slow, and nobody optimizes glDrawPixels because nobody uses it. At least, that’s my opinion; I’ll leave you to yours.

Okay, Sheepie, sorry. I apologize about the “thoughtless” thing. I was just retorting to the generalisation that programmers who put importance on other things are lazy (and that’s arrogance, BTW). I’m really not here to attack you personally, but to try to understand your point of view, and to make you understand mine.

First of all, most chips, and I believe the software implementation too, support textures of at least 1024x1024 (correct me if I’m wrong; you can query that). The main exceptions are the Voodoo family (3 and lower), which is indeed an important market, but for our purposes does not matter.
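The query, for reference:

GLint max_size;
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_size);   /* 1024 or more on most chips */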

As I mentioned before, even if your pointer feature does get adopted, it will only be implemented in newer chips, and most likely won’t be implemented for the Voodoo family (since 3dfx is dead). So you’ll still have to use other means for these cards, and the same goes for other limited cards. You will pretty much be able to take for granted that you have a powerful card.

I can’t give you a good answer about how a program with high CPU/bus utilization would affect performance, but that’s something that you may want to check. It highly depends on the scene and what you’re doing. My guess would be that parallelism would still work (even though your guess may be the opposite). In any case, you at least acknowledge that in some cases this method will be faster.

BTW, I was a little surprised to read that your program has a high bus utilisation. But I guess that you’re doing enough pre-fetching to offset the high latency of reading from main memory.

Yes, people do find DrawPixels and ReadPixels slow, but it’s possible to want them faster and still be against a pointer. If you read back my posts, I always say “yes, a faster way to read/write would be better, but I don’t think a pointer is good”. So I’m not arguing. Matt did post which formats are best to read from GeForce hardware, and NVIDIA docs say which texture formats are fastest to transfer. This is not a complete solution, but it’s better than nothing. Other people suggested ideas other than a pointer to the frame buffer. A frame buffer pointer is not the only way.

About examples: the only one you gave, AFAICR, is your screen saver. That’s not a good enough argument for me. I just don’t have high regard for screen savers. Do you have other examples where your idea may be helpful?

BTW, you’ll probably love programming a GeForce3. That should put your talent to good use optimising vertex programs and pixel shaders.

Dang, lost my post.

Ok, let’s try this again.

As far as I can see, the reason Sheepie wants a pointer to the framebuffer is because glDrawPixels is considered too slow for what he wants to do, and he doesn’t want to go to the trouble of using textures and/or they would be slow on his software implementation.

So, why not a compromise? Add a function to OpenGL called glFastDrawPixels or something like that. The user would specify the pixel location of the rectangle to be drawn, the height and width of the data, a data format, and a pointer to the data. Basically a 2D blit. This should be faster and easier to optimize than DrawPixels because there would be no transformation, fogging, texturing, scaling, or RGBA color manipulation, all of which can happen under DrawPixels.
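A possible prototype, just to make the idea concrete (glFastDrawPixels is hypothetical, of course):

void glFastDrawPixels(GLint x, GLint y,            /* destination in window coords */
                      GLsizei width, GLsizei height,
                      GLenum format, GLenum type,  /* e.g. GL_BGRA, GL_UNSIGNED_BYTE */
                      const GLvoid *pixels);       /* source image data */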

And if this isn’t fast enough, there could be an extension similar to VAR whereby the pixel data can be put into AGP or video memory and the card could pull it through DMA.

In Sheepie’s case, he could put the results of his calculation straight into an AGP buffer, and the video card could pull it from there. Or if he still wanted to, he could copy it straight into video memory and have only one transfer of data, which is what he wanted. Although I have a feeling the AGP would be faster.
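As a sketch of how that might look, borrowing the allocator from NV_vertex_array_range purely for illustration (using its memory as a pixel source is hypothetical, as is compute_frame()):

void *agp = wglAllocateMemoryNV(w * h * 4,         /* size in bytes */
                                0.0f, 0.0f, 0.5f); /* no CPU reads; priority 0.5
                                                      requests AGP memory */
compute_frame(agp);                                /* write results straight in */
glFastDrawPixels(0, 0, w, h, GL_BGRA, GL_UNSIGNED_BYTE, agp);
wglFreeMemoryNV(agp);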

The advantages of this as far as I can see are:

  1. Abstraction. You wouldn’t need to know anything specific about the card; you just tell it what you want it to do instead of doing its work for it.

  2. Speed. In some cases, this may be faster than a pointer to the framebuffer. By having the video card DMA the data it needs, the CPU can work on something else at the same time. Not to mention, writes to AGP are usually faster than writes to video memory.

Disadvantages:

  1. Speed. In the case of a software renderer where the pixel format of the framebuffer is known, a pointer would be faster.

Well, any comments?

j

Yes, I agree. That’s what I wanted to suggest initially, then reconsidered. I still have some reservations, but, reading again the description of glDrawPixels, I think that it may be worth having glFastDrawPixels (more correctly, having glDrawPixels, while the original should be called glDrawFragments).

My reservations are:

  1. It’s not necessarily faster than an optimised glDrawPixels. It’s not just a ‘2D blit’: the frame buffer can always be in a different format than the image you’re drawing, and conversion will be needed. You can disable all fragment operations before glDrawPixels, including depth buffering, and an optimised code path would probably work just as well as glFastDrawPixels.

  2. It’s difficult to define this well. The simplest definition would be a simple copy to the frame buffer, which is fine in Sheepie’s case. But what if you want to have your image in the foreground? You’ll have to define this somehow, which means that you’ll need to either define a depth or use alpha testing. You’d probably want a depth value in any case, so that your image has some relation to the rest of the render. Which kind of brings you back to being like glDrawPixels.

Still, a “fast” form of glDrawPixels has an advantage not so much in speed but in simplicity, since I’d imagine that most people don’t need all the extra features that glDrawPixels provides. glFastDrawPixels would save the need to turn off all the fragment operations and then turn them back on. There would also be some speed benefit in saving those state changes. The definition problem is not a serious one. I imagine that implementations would not write to the display directly, but would use the image as a texture (direct from AGP, probably) for a screen-size quad, thereby adding the Z and alpha testing. Still, I’m sure that people will want alpha blending too, so it may be a little difficult to make a clean cut here.

Automatically disabling fragment ops really does no good. You can just disable them yourself, and we can notice that. In fact, if you want optimum DrawPixels DEPTH_COMPONENT performance, for example, you need to set up the proper fragment ops. Specifically, you need depth test on, depth func ALWAYS, and color writes off (i.e. ColorMask FALSE/FALSE/FALSE/FALSE, or DrawBuffer NONE). In short, you need to set things up so that you are really just writing depth into the depth buffer and doing nothing else in the operation. You also need to pass in the right formats. I might as well list the depth-component DrawPixels formats for best efficiency:

16-bit Z: UNSIGNED_SHORT/DEPTH_COMPONENT. (Requires fragment op setup.)
32-bit Z and stencil: UNSIGNED_INT_24_8_NV/DEPTH_STENCIL_NV. (Note that you don’t need to set up the fragment ops for this one. Read the spec [which I wrote, BTW].)
32-bit Z only: UNSIGNED_INT/DEPTH_COMPONENT. (Requires fragment op setup.)
32-bit stencil only: UNSIGNED_INT/STENCIL_INDEX. UNSIGNED_BYTE/STENCIL_INDEX isn’t bad, either. (Again, no fragment op setup required. Stencil DrawPixels works differently. Read the OpenGL spec.)
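For example, the 16-bit Z setup in code (a sketch; assumes the raster position is already set):

glEnable(GL_DEPTH_TEST);                              /* depth test on... */
glDepthFunc(GL_ALWAYS);                               /* ...but every fragment passes */
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);  /* no color writes */
glDrawPixels(w, h, GL_DEPTH_COMPONENT, GL_UNSIGNED_SHORT, depth_data);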

As for DMA, yes, that would help. I have nothing to say on that front at the present time.

  • Matt

Thanks, Matt! This thread is accumulating some helpful info from you. Any chance this will be in the next revision of the GeForce performance FAQ? BTW, are these tips also true for the TNT family?

I hope you won’t mind sharing a little more. Let’s go back to Sheepie’s problem. Suppose he wants to use a certain image as a background to some 3D rendering. I’d assume that the following would work quickly, but I’d like you to nudge me in the right direction if I’m wrong:

Clear Z
Disable Z, texturing, blending, … All except ColorMask(true,true,true,true)
Use DrawPixels with BYTE/BGRA or 5_6_5/RGB, depending on 32-bit or 16-bit mode
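Or, spelled out (a sketch of the 32-bit case; assumes a projection that maps (0,0) to the window corner):

glClear(GL_DEPTH_BUFFER_BIT);
glDisable(GL_DEPTH_TEST);
glDisable(GL_TEXTURE_2D);
glDisable(GL_BLEND);
glDisable(GL_FOG);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glRasterPos2i(0, 0);
glDrawPixels(w, h, GL_BGRA, GL_UNSIGNED_BYTE, background);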

Many of these things will not be true for TNT; not all of them are accelerated.

I would suggest sticking to GL_BGRA/GL_UNSIGNED_BYTE for DrawPixels whether you are in 16-bit or 32-bit, for the moment. In the future more options may be available. (Other formats will still certainly work and run “reasonably” quickly, but they will not be optimal in performance.)

  • Matt

Well, I apologize for being rather inflammatory as well. I tend to feel very passionate about issues and do take things personally at times (something I need to work on). I just didn’t see what people were saying as helpful, but rather as negative.

I’d be more than willing to give up the pointer idea if there was some chance that a blit function (glFastDrawPixels) might appear on the scene, as j suggested. I really don’t understand why the current glDrawPixels is so darn slow, except for the reasons I mentioned above.

The reason I use the screen saver as my example is that it uses fairly high resources, and I am considering adding 3D to the next release. This has unfortunately thrown me into the problem of getting that data to the screen fast enough. I also work on other projects (a model editor and an upcoming game) that could use this feature, but as the model editor doesn’t require 2D and the game is in the drawing-board stages, I thought those would make poor examples.

Thank you Matt for providing some information on making glDrawPixels faster. I try to make the formats as similar as possible.

Maybe if someone could provide a Fast Draw Pixels (comparable to a BitBlt), that would be a great start. Maybe it could later be enhanced to provide some degree of masking. Many effects could be achieved if the BitBlt could be used for both reading and writing.

I do indeed realize that all of the wonderful arguments the 3D guys were making are mostly truthful, but a pointer to the frame buffer is the only way to maintain proper bus speed for high-speed data transfers. If, for instance, you were moving 240 megabytes per second across the PCI bus, you could directly target an AGP buffer (GL??) and use a maximum of 240 megabytes per second. If, on the other hand, you first hit a target in system memory and then let GL move the buffer onto the card, you are now making your memory do 480 megabytes per second worth of work (240 on the PCI side and 240 on the AGP side). That means that should you want to do anything fun with the video data, you have less time to do it, and you are really pushing your RAM speeds, which slows the rest of the system down. I currently have applications that require unhindered pipes of around 400 megabytes per second to the AGP card, something I would loathe to push through system memory.

To make a long story short, high-end realtime 3D systems and HD/film-res video systems are very difficult to build without direct targeting (i.e. going from PCI to AGP without a hop through memory). This has so far kept many companies like mine working with proprietary APIs, which give us speed without all the robustness of GL.

I would be very willing to do much of the work on the API myself, and even show how to get direct buffer access if needed (I am an engineer who has done this), if it would allow me to use GL.

Matt hinted that it may be possible to use an AGP buffer in the future.

Your argument would only hold if information was flowing one way, i.e. you were doing something like:

for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
        *ptr++ = generator();   /* write-only: nothing is read back */

where generator() is a self-contained function (i.e. it doesn’t reference any external data).

As soon as you start relying on other data (a convolution, for example, that sucks up an array and turns neighbouring values into a new value), your argument begins to lose credibility. If you did this, for example:

for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
        (*ptr++)++;             /* read-modify-write: reads video memory back */

then you’re reading from a slow resource.

So, your argument will only hold if the CPU can generate cool stuff without state. How useful is this? Furthermore, the above example can’t use burst writes, since you’re not providing a pointer to a buffer and getting DMA to suck it straight through.

In summary:

Your argument will only hold so long as:

  • your code doesn’t use state from its previous calculations
  • writing single values at a time is not slower than block writes.

If these don’t hold, then it won’t take MUCH computation before writing to a faster resource and then DMAing to the card starts winning.

IMHO, of course.

cheers,
John

Heh heh. I came to this forum in order to post a suggestion of having a pointer to a frame buffer, and it looks like someone else beat me to it. =)

Now, the reason I want one is this:

I create a 640x100 interface for a game. I want to put it on the screen just like it is: no lighting, no fog, no multiple polys, no scaling. If I had a pointer to a frame buffer, I could just blit it into the buffer after all my 3D draws were done, and life would be good. It would look EXACTLY as it did in the paint program.

What I am using now is an Ortho view, which is essentially overkill, because I also have to disable effects and all that JUST to get it on the screen the way it looked in my paint program. I also have to split the graphic up into smaller textures just to get a non-standard-size graphic of 640x100. That is a pain in the butt, IMHO. Yes, getting a 640x100 graphic drawn can be done in OpenGL, if you want to jump through a bunch of hoops.
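For reference, the hoop-jumping looks roughly like this (a sketch; the per-tile texture draws are omitted):

glMatrixMode(GL_PROJECTION);
glPushMatrix();
glLoadIdentity();
glOrtho(0, screen_w, 0, screen_h, -1, 1);   /* 1:1 pixel-to-unit mapping */
glMatrixMode(GL_MODELVIEW);
glPushMatrix();
glLoadIdentity();
glDisable(GL_LIGHTING);
glDisable(GL_FOG);
glDisable(GL_DEPTH_TEST);
/* ...draw each power-of-two tile of the 640x100 image as a
   textured quad, then pop both matrices to restore state... */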

I am not suggesting a pointer to the video card’s memory (although it might be nice to have around anyway). I am suggesting a pointer to a RAM buffer. That is, if OpenGL draws to a RAM buffer and then blits it to the hardware, a pointer to that buffer would be a very nice, and easy-to-implement, feature for us developers to make our lives a little bit easier.

Now, since I am not entirely sure how OpenGL draws to the hardware: if a RAM buffer is not used in any way, then having a pointer to the video RAM would still be a welcome change. It may be slow to access, but heck, if a developer is willing to use it at the risk of speed in his application, then let him. =) Nothing wrong with that.

Anyways, just my 2 cents or more like 5 bux.

OpenGLRox, I would say that this is a good example of when you don’t want to draw the image as is. Drawing that 640x100 image as is means that you’re limiting the player to a specific resolution (presumably 640x480). This is considered a bit low today even for pseudo-3D games. If you draw using textures, you can scale the interface to fit higher resolutions (even if it won’t look exactly like in the paint program).

If that’s what you want to do, you should either use DrawPixels or a texture. (A texture has big advantages, as noted, since it can scale with nice filtering.)

  • Matt

ET3D, the 640x100 was simply an example of a non-standard-size piece of art that someone might want to draw. I’ll change it to 1280x100 then. =)

The reason I might want to draw a graphic at actual size is that I may not want the art to get lossy from scaling down, and I may not want it to get pixelated from scaling up. I may not want lights on it, or fog. I just want it on the screen exactly as it is.

The point is, limiting a developer in any way is a bad thing. Take a look at Internet Explorer vs. Netscape. Have you ever tried to design a nice-looking web page with lots of goodies? In IE, it’s easy, because they give developers a lot of stuff to play with. In Netscape, however, they limit the developer to the W3 standard and refuse to budge. So, when I have done web pages for people, I have had to say, “Well, if you want a really nice web page, IE people are going to see it. We MAY be able to do a FEW nice things in Netscape, but we have a bunch of hoops to jump through, and more code to write to make it happen.” One person asked me to put a marquee on their web page. No prob in IE: <marquee>some text</marquee>

In Netscape, though, get out the JavaScript manual. There are hoops to jump through. As a result, I try to steer people away from Netscape now, which isn’t hard considering you can have some nice ActiveX stuff in IE.

I think OpenGL is awesome, but tying a developer’s hands in any way is not a good thing, and playing the parent by saying, “We’re not going to let you have that; you might hurt yourself,” is also most likely not going to make Joe Developer, who has been coding for 30 years, very happy.