Seems really cool. Now that the NDA is lifted, does anyone who might have one of these things already know whether the glReadPixels performance is improved?


looks like great work done this time by nvidia, congrats in advance. now their web department is to blame. i can not download anything interesting from nzone at all… :frowning:

won, why is that so important? they have pixel buffers for async readback, but more important, you should let the data stay on the card if you’re aiming for speed…

Mazy –

What you say is true and obvious, and in a perfect world I would do just that. However, sometimes you need to get data off the video card, and for my particular application I need to do it quickly. Suffice it to say, I don’t intend to use GPUs in only traditional ways. Anyway, this feature is important to many other people besides me.

3Dlabs cards, while they don’t support the async stuff (yet), do support fast AGP transfers in both directions. On a Wildcat VP, a glReadPixels call can get 700MB/sec over AGP 4x, which is reasonably close to the AGP 4x limit of roughly 1GB/sec. On a GeForce, a glReadPixels call can get 240MB/sec, which is reasonably close to the 64-bit PCI limit of 266MB/sec. There’s an order of magnitude (or more) that I’d like to get back.
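
For reference, the theoretical peaks behind those figures can be worked out from bus clock, bus width, and transfers per clock. This is only a back-of-the-envelope sketch: the helper names and the 1024x768 RGBA frame size are illustrative assumptions, not anything measured in this thread. The 700 and 240 MB/sec numbers are the ones quoted above.

```python
# Back-of-the-envelope bus arithmetic (1 MB = 10**6 bytes).

def peak_mb_per_s(clock_mhz, bus_bytes, transfers_per_clock=1):
    """Theoretical peak = clock (MHz) * bus width (bytes) * transfers per clock."""
    return clock_mhz * bus_bytes * transfers_per_clock

AGP_CLOCK_MHZ = 66.66   # AGP base clock
PCI_CLOCK_MHZ = 33.33   # conventional PCI clock

agp_4x = peak_mb_per_s(AGP_CLOCK_MHZ, 4, 4)   # 32-bit bus, 4 transfers per clock
pci_64 = peak_mb_per_s(PCI_CLOCK_MHZ, 8)      # 64-bit bus, 1 transfer per clock

# Fraction of the theoretical peak reached by the rates quoted in the post.
wildcat_eff = 700 / agp_4x
geforce_eff = 240 / pci_64

# Hypothetical frame: 1024x768 pixels, 4 bytes per pixel (RGBA8).
FRAME_BYTES = 1024 * 768 * 4

def max_readbacks_per_s(mb_per_s, frame_bytes=FRAME_BYTES):
    """Upper bound on full-frame readbacks per second at a given transfer rate."""
    return mb_per_s * 1e6 / frame_bytes

if __name__ == "__main__":
    print(f"AGP 4x peak:     {agp_4x:7.1f} MB/s")
    print(f"64-bit PCI peak: {pci_64:7.1f} MB/s")
    print(f"Wildcat VP at 700 MB/s: {wildcat_eff:.0%} of AGP 4x peak")
    print(f"GeForce at 240 MB/s:    {geforce_eff:.0%} of 64-bit PCI peak")
    for rate in (240, 700):
        print(f"{rate} MB/s -> at most {max_readbacks_per_s(rate):.0f} frames/s read back")
```

Under those assumptions, 240MB/sec caps full-frame readback at well under 100 frames per second before any rendering time is counted, which is why the raw rate matters for readback-heavy work.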


Jep, this chip is really cool! It has some very nice features. I hope the chips with 8 or 12 pipes will be available soon too!

The problem is that, for really valid numbers on readback improvements, Intel needs to lift their NDA as well (on PCI Express chip sets).

Even if you get the PCI Express version of this card, NVIDIA will initially have an AGP->PCI Express bus bridge on this thing, so it’s not going to be native PCI Express out of the gate, although TH says it is “AGP 16x” on the card(?). Either way, readback will only come into its own with PCI Express, although it’s anyone’s guess right now how the readback implementation will perform over the bridge. Reads vs writes are still going to be hopelessly asymmetric on bandwidth until you eliminate the AGP bus.

Having said that, the performance is stunning. They’ve aimed very high with this thing. It’s big and power hungry but that’s intentional, they need it to win. Once again I’m amazed by what I can buy for $500 at Besy Buy.

This doesn’t mean that their AGP implementation can’t have fast AGP transfers in both directions; it only means they can’t occur at the same exact time.

I’m guessing the PCX bridge is going to have a fairly minimal performance impact. There might be issues with cost/heat/reliability, but unless you’re doing small, frequent bidirectional transfers, the internal AGP “16x” can probably handle the bandwidth. Since the PCX bridge is essentially soldered directly to the GPU, there’s very little line capacitance or whatever other electrical limitations typically come from, among other things, going across an edge connector.


One of the guys down at Beyond3d.com calculated that, clock for clock and pipe for pipe, the NV40’s pixel shading power is 1.4x that of the R300. I’m not entirely sure how accurate that is, but it’s certain that NVIDIA has come out with a chip that is fundamentally more powerful and not just bigger.

Whoops. I knew I should’ve been more specific with the thread title. I was specifically asking about glReadPixels performance.

Having said that:

Clock for clock comparisons are basically meaningless. I remember Anandtech did a clk/clk comparison of various CPUs like the Pentium, Pentium MMX, Pentium Pro, Cyrix 686, AMD K5/6 or whatever the contemporaries were. It found that the Cyrix chip was fastest clock for clock. Who cares? It could only reach a fraction of the clock speed of other microarchitectures, and clock speed scalability matters, too. It isn’t an independent concern.


Won, PCIX isn’t PCI Express. PCI-X is a different standard. The bus bridge is a chip; the point is that there is protocol translation going on, as well as GART aperture remapping etc. At the bare minimum, the limitations of the original bus apply.

Ostol, I know it’s not just bigger. My point was more about their commitment to winning: they’ve gone about as big and hot as they could bear to without getting crazy (or maybe it’s borderline crazy). It’s a monster, but that’s a good thing IMHO.

Originally posted by Won:
Seems really cool. Now that the NDA is lifted, does anyone who might have one of these things already know whether the glReadPixels performance is improved?

Try to use PDR or PBO. It should be fast enough…


Some of the new features of the card look REALLY cool, like the Vertex Frequency Stream Divider. Wonder if it will be exposed in OpenGL first, or DX?

Pretty soon, you will need a separate PSU to drive these things. (Not that this is a bad thing, mind you.)

Originally posted by yooyo:

Try to use PDR or PBO. It should be fast enough…

PDR does little to improve raw performance. We need an order of magnitude improvement to start doing some really interesting stuff with ReadPixels.
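
To make that distinction concrete, here is a toy timing model. It is a sketch under loud assumptions, not a measurement: it supposes an asynchronous path (PDR or PBO) overlaps the transfer perfectly with rendering, and the 5 ms render time and 1024x768 RGBA frame are made-up illustrative values. The function name `frame_time_ms` is hypothetical.

```python
# Toy model: asynchronous readback hides transfer time behind rendering,
# but the transfer itself still runs at the raw bus/driver rate.

FRAME_BYTES = 1024 * 768 * 4   # hypothetical RGBA8 frame
RENDER_MS = 5.0                # hypothetical per-frame render time

def frame_time_ms(render_ms, frame_bytes, readback_mb_per_s, overlapped):
    """Per-frame time when every frame is read back.

    overlapped=False: render, then block on the readback (sum of both).
    overlapped=True:  assume perfect overlap, so the slower of the two dominates.
    """
    read_ms = frame_bytes / (readback_mb_per_s * 1e6) * 1e3
    if overlapped:
        return max(render_ms, read_ms)
    return render_ms + read_ms

if __name__ == "__main__":
    for rate in (240, 700):
        sync = frame_time_ms(RENDER_MS, FRAME_BYTES, rate, overlapped=False)
        async_ = frame_time_ms(RENDER_MS, FRAME_BYTES, rate, overlapped=True)
        print(f"{rate} MB/s: blocking {sync:.1f} ms/frame, overlapped {async_:.1f} ms/frame")
```

In this model, overlap at 240MB/sec still leaves the frame time pinned at the transfer time of about 13 ms, while a higher raw rate brings it down to the render time. That is the sense in which an async path alone does not substitute for raw readback bandwidth.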

Dorbie –

You misunderstand.

PCX is the name of the bridge (should’ve mentioned that). AGP has no INHERENT limitation on read bandwidth; it is simply not implemented on NVIDIA GPUs that I’m familiar with. 3Dlabs, for example, currently has fast AGP in both directions. Details in a previous post.


I can only get 60 MB/s on my Radeon 9200, so consider yourself lucky :slight_smile:

Re the NV40, I’m amazed. Looks like a great engineering achievement, flawless, save for the R300ishness in the anisotropic texture filter department. And whether or not that’s a flaw is highly debatable, I suppose.

To spell it out further:

The PCX bridge is probably going to be very efficient at translating AGP commands to and from PCI-Express commands, and it will do so in both directions. I believe this because NVIDIA also plans on using PCX to bridge their PCI-Express native chips to AGP. This means that AGP transfers must go fast in both directions; otherwise, when they flip it, you’re going to have slow texture uploads. Same with the PCI-Express side.

If this is true, that means fast ReadPixels depends only on the GPU/driver, not the bridge.


Won, thanks for the explanation. I misread your PCX statement, sorry. I’m reluctant to infer too much about its performance in this context from its intended use in another, but you make good points and I’ve learned something new.

dorbie: “Once again I’m amazed by what I can buy for $500 at Besy Buy.”

Wow. Your Besy Buy must be much better than the Best Buy I go to in San Carlos. Here they don’t have them yet, and won’t for at least one, if not two more months. Where’s yours? :slight_smile:

I can’t believe that, with all the power and functionality that NV40 promises, the best thing you can think of to discuss is glReadPixels performance.

It’s a graphics chip. Draw some pictures with it.

Re the NV40, I’m amazed. Looks like a great engineering achievement, flawless
Flawless? We don’t know that yet. I still want to know the actual performance characteristics of fragment programs with both looping and 32-bit floats, as well as just how fast/slow “texture” accesses in the vertex shader are.