GL 3 & D3D: The War Is Over

Seriously now, there are a bunch of custom non-polygon 3D pipelines being built on current GPUs, many of which are real-time. Here are just a few quick examples (I’d post more, but I’m posting while on vacation in Yellowstone):

Raycasting for medical imaging: http://medvis.vrvis.at/287.html

Sparse voxel octree, demoed by Jon Olick: http://s08.idav.ucdavis.edu/olick-current-and-next-generation-parallelism-in-games.pdf

Hierarchical point-cloud-based rendering with framebuffer recirculation, a GPU-side tree structure, GPU occlusion, a “bone” per parent node, etc … one of my current projects: http://www.farrarfocus.com/atom/080709.htm

Hierarchical impostor-based rendering, also by me:
http://www.farrarfocus.com/atom/070817.htm

Also the OTOY stuff, which I don’t have an easy link for.

Etc, etc…

I’m not yet convinced that Larrabee is going to be any faster for non-triangle-based rendering of quality comparable to current rasterization. For one, even on Larrabee, I would almost never use the coherent caching or fine-grain synchronization. When you work with object counts in the hundreds of millions (ie verts, triangles, points, voxels, fragments, search steps, etc) you must be very minimal in what is done per object…
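As a rough back-of-the-envelope illustration of why (the element count, frame rate, bandwidth and FLOP figures below are assumed round numbers, not any particular GPU’s specs):

```cpp
#include <cstdio>

// Per-element budget when touching ~100M elements per frame.
// All figures are assumed round numbers for the sake of the arithmetic.
int main()
{
    const double elements  = 100e6;  // verts / points / voxels / fragments per frame
    const double fps       = 30.0;
    const double bandwidth = 100e9;  // bytes/s of external memory bandwidth
    const double flops     = 1e12;   // peak shader FLOPS

    const double visitsPerSecond = elements * fps;  // 3e9 element visits per second
    std::printf("bytes per element per frame: %.1f\n", bandwidth / visitsPerSecond); // ~33
    std::printf("flops per element per frame: %.1f\n", flops / visitsPerSecond);     // ~333
    // At ~33 bytes and ~333 flops per element there is essentially no room
    // for per-element synchronization or extra cache traffic.
    return 0;
}
```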

Besides, this entire “rasterization is going to die” thing is silly. I think there is a clear path to hardware Reyes now, and DX and GL will probably go that way within the DX12 time frame. If anyone has any algorithm which can top rasterization at the same given quality level with lower bandwidth usage for a fully parallelized version, I’m all ears…

I second Rob’s thoughts on Larrabee. I’ve had enough disgust making 3D software rasterizers on such gimped-down CPU cores (asm code everywhere, tuned to 95% of theoretical performance T_T…). Only the large amount of on-board cache and the dedicated instructions for texture filtering are of any help. And Intel has never been bright about software developers’ needs for code optimization. SSE1-4, anyone?

Anyway, let’s say the claim that Gears of War runs decently on a 25-core Larrabee is true, while the high-end card will have 32 cores. Can you bet that that high-end card will cost $80 - as much as a GPU that plays the same game with the same performance? Hint: Intel’s high-end products always cost an arm and a leg. So they’ll try to make some nice faked (as in non-interactive, useless for gaming) tech demos to lure users. That is my guess.

Rumour is that Intel is trying to do exactly that. They’ve been in talks with Microsoft to build a 48-core Larrabee processor into the next Xbox.

Yes, that’s exactly what console development needs: to become more expensive…

*sigh* I was hoping Microsoft actually learned the lessons Nintendo is teaching them.

To be clear, I simply pointed out that the data presented so far are based on a certain combination of variables (core count and clock speed). I am not even offering an opinion as to whether the final product might fall short of those numbers, or go well beyond them.

Reading it again I can see how what I wrote could be interpreted as either positive or negative spin - it was honestly meant to be spin free because I have no data to make an assessment on.

more simply
if (high clock rate) good_news(); else bad_news(); …

I don’t see why we can’t have both.
Tim Sweeney is correct in a way: as GPUs become ever more generalized there will come a day when you might just as well be doing the rendering through OpenCL or CUDA, but I don’t see why we can’t layer OpenGL on top of that, or DX for that matter.
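For illustration only, here is roughly the per-pixel work such a compute-based path would do itself instead of leaning on the fixed-function rasterizer. Everything here (the Tri struct, the framebuffer layout) is invented for the sketch; it is written as a plain C++ loop, where a CUDA/OpenCL version would run the inner body once per pixel instead:

```cpp
// Minimal edge-function triangle rasterizer, the core of a "render through
// compute" path. Types and layout are illustrative, not any real API.
struct Tri { float x0, y0, x1, y1, x2, y2; unsigned color; };

// Signed area of (a, b, p); positive when p lies to the left of edge a->b.
static float edge(float ax, float ay, float bx, float by, float px, float py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

void rasterize(const Tri& t, unsigned* fb, int width, int height)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float px = x + 0.5f, py = y + 0.5f;
            float w0 = edge(t.x0, t.y0, t.x1, t.y1, px, py);
            float w1 = edge(t.x1, t.y1, t.x2, t.y2, px, py);
            float w2 = edge(t.x2, t.y2, t.x0, t.y0, px, py);
            if (w0 >= 0 && w1 >= 0 && w2 >= 0)   // inside a counter-clockwise triangle
                fb[y * width + x] = t.color;     // no depth/blend for brevity
        }
}
```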

This is not the end for OpenGL; that article is deeply flawed and biased. Just because DX does some retooling doesn’t mean it “wins”; in fact it only extends its life, because if they didn’t retool, they would surely lose.
So OpenGL is not down for the count, and with a little bit of API refactoring it could become one of the better graphics tools out there, for now.

Do these now and openGL could be #1 again

  • Fix the texturing model; introduce texture shaders.
  • Add a programmable pipeline, even if not currently possible.
  • Add the blend shader. Any day now, we are waiting for it; try adding it before Microsoft does (a stop-gap emulation is sketched below).
  • Continue making FBOs more flexible.
  • “Object” shader: the shader equivalent of display lists.
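On the blend shader point: with no such stage today, the usual stop-gap is to ping-pong between two FBO color attachments and do the “blend” in an ordinary fragment shader that samples the previous result. A minimal sketch, assuming a GL 3.0-style context and some hypothetical pieces (drawFullscreenQuad(), a blendProgram whose u_dst sampler reads the previous buffer):

```cpp
#include <GL/gl.h>   // plus the usual extension loader for post-1.1 entry points

void drawFullscreenQuad();   // hypothetical helper

// Emulating a "blend shader" with render-to-texture ping-pong.
// fbo[i] has colorTex[i] attached; blendProgram samples the previous color
// buffer through its u_dst sampler and writes the blended result to the other.
void blendPass(GLuint fbo[2], GLuint colorTex[2], GLuint blendProgram, int& cur)
{
    int prev = cur;
    cur = 1 - cur;

    glBindFramebuffer(GL_FRAMEBUFFER, fbo[cur]);     // render into the other buffer

    glUseProgram(blendProgram);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, colorTex[prev]);    // previous contents as a texture
    glUniform1i(glGetUniformLocation(blendProgram, "u_dst"), 0);

    drawFullscreenQuad();                            // the fragment shader does the blend

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}
```

The catch, and the reason people keep asking for the real thing, is that sampling the attachment you are currently rendering to is undefined, so this has to be done per pass or per batch rather than per primitive.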

Do these now and openGL could be #1 again

So your solution is basically, “Make up a bunch of features that IHVs won’t implement until GPUs become many-core CPUs”?

Features are only useful if they are usable. Features not implemented in hardware are not usable, and therefore not useful. And increasing driver complexity like that does nothing for OpenGL’s big problem with implementation quality.

3DLabs tried this, like 5 years ago, and they failed. NVIDIA and AMD(?) and some ISVs (Blizzard, id) tried this recently as well, and they failed, again.

Once, it is a coincidence. Twice, it is a statistic. Thrice, it is a tradition.

[quote=“zeoverlord”]

Do these now and openGL could be #1 again

  • Fix the texturing model; introduce texture shaders.
  • Add a programmable pipeline, even if not currently possible.
  • Add the blend shader. Any day now, we are waiting for it; try adding it before Microsoft does.
  • Continue making FBOs more flexible.
  • “Object” shader: the shader equivalent of display lists.
[/quote]

Well, we would have to see what hardware will look like in the future. We already have a pretty good idea what DX11 hw is going to look like, and one of the last items from that list is going to be there in some variant. So now would be the time to make the necessary changes to GL to accommodate those DX11 hw features.

  1. Do you have a better idea?
  2. Tell me one that absolutely can’t be done in some way on the current or next-gen GPU architecture.

The thing is that on the current trajectory we won’t have “hardware” features anymore (save for some more optimal texture fetch); OpenGL will be more of a software layer than a hardware-acceleration API, software that could run on basically any hardware, be it CPU, GPU or whatever comes in the future.

I mean, wasn’t this the original purpose of OpenGL 3.0: to remake it for the future and make it fit the current hardware layout? But in 5 years that would still be obsolete; any API, any software in fact, continually has to grow, change and adapt or be left out in the cold.
I am just suggesting that it’s better to get ahead of the curve a bit, especially with the long turnaround times the ARB has.

No, we already have a pretty good idea of what DX11 software will look like; the hardware is another issue. We know what it must be able to do at a minimum, but how is another question.
I haven’t looked at the DX11 features yet, but I seem to recall that it would be almost like DX 10.1, and factoring in the current trends I have seen, most of that will come through relaxing some hardware restrictions and generalizing the hardware.

  1. Tell me one that absolutely can’t be done in some way on the current or next-gen GPU architecture.

Define “next-gen”. If you mean “DX11 hardware” then “the shader equivalent of display lists” is certainly not going to happen (DX11 will have tessellation shaders. That’s far from display lists). Neither are blend shaders.

The thing is that on the current trajectory we won’t have “hardware” features anymore (save for some more optimal texture fetch); OpenGL will be more of a software layer than a hardware-acceleration API, software that could run on basically any hardware, be it CPU, GPU or whatever comes in the future.

OpenGL itself will certainly not become a combined CPU/GPU abstraction layer. That’s what OpenCL is for. To use hardware like Larrabee, OpenGL will simply implement its well-defined rendering model in optimized code written by Intel or whomever.

The programming interface that Larrabee provides will not be OpenGL. OpenGL is and always will be a graphics library.

  1. Do you really think NVIDIA (or ATI for that matter) is going to bother making D3D-specific tessellation hardware?
    Do you really think they are going to waste silicon on something that might not be used that much in the beginning, rather than add more cores?

  2. In the GLSL spec there is some mention of blend-shader-like functionality (they rejected it because of some perceived performance issues for some unmentioned IHV). At that time the best hardware was the GF6xxx (or it could even have been the FX series); hardware has evolved a little bit since then.

  3. Why is it that DX gets to add “hardware” features and not OpenGL?

  4. I never said it will become a CPU/GPU abstraction layer, rather that it will eventually run on one; but to an extent it already does. Ever heard of MesaGL?

Why bother aiming for the skies right now, when the window is tightly shut? ATI has 35% of the market, but does not support much in OpenGL. Geez, it’s apparently too hard to find a couple of programmers to put on that project.
Where are transform feedback, geometry shaders (and their depth cubes), some ATI-specific <s>shader language</s> intermediate asm language produced with a compiler like cgc (so that we can do shader caching ourselves, as it’ll never come from the ARB; a rough sketch of the idea is below), etc., etc.?
There are a billion people staying on WinXP, and there DX9’s driver model is plain sh*t for performance.
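A sketch of that shader-caching idea, assuming an ARB-asm target and offline compilation with something like "cgc -oglsl -profile arbfp1" (the cache path and everything else here is made up for illustration):

```cpp
#include <GL/gl.h>
#include <GL/glext.h>   // ARB_fragment_program tokens; entry points via a loader
#include <fstream>
#include <sstream>
#include <string>

// Load a precompiled (cached) ARB fragment program instead of handing the
// driver GLSL source at runtime. The file (e.g. "shader_cache/skin.fp", an
// invented path) would contain ARB assembly text emitted offline.
GLuint loadCachedFragmentProgram(const char* path)
{
    std::ifstream file(path);
    std::stringstream ss;
    ss << file.rdbuf();
    const std::string asmText = ss.str();

    GLuint prog = 0;
    glGenProgramsARB(1, &prog);
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
    glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)asmText.size(), asmText.c_str());
    // On failure, glGetString(GL_PROGRAM_ERROR_STRING_ARB) reports the error.
    return prog;
}
```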

A feature to be added should be immensely useful, to get its own rendering-path in games.

  1. Why is it that DX gets to add “hardware” features and not OpenGL?

Once upon a time OpenGL was at the forefront of hw technology, thanks in part to the extension mechanism and Nvidia’s continued innovation. Heck, if you’re using NV hardware, you enjoy most if not all of the latest hw features, nonstandard though they may be. Standardization timeliness, or lack thereof, is part and parcel of the cross-platform trade-off, IMHO.

ATI hardware currently has a subset of DX11 tessellation in hardware, and it is really not being used.

I’d bet that NVidia also provides hardware dedicated to tessellation in the DX11 time frame. Tessellation somewhat leads developers towards something more like REYES, assuming the hardware parallelizes triangle setup. So it might not be a bad thing for graphics. I’m not sure yet.

Do you really think NVIDIA (or ATI for that matter) is going to bother making D3D-specific tessellation hardware?
Do you really think they are going to waste silicon on something that might not be used that much in the beginning, rather than add more cores?

“Adding more cores” won’t get them the performance needed for tessellation. Maybe they’ll implement it using straight software, or maybe it’ll be a partial software/specialized hardware thing. Who knows? DX is giving the IHVs the freedom to decide how to go.

In the GLSL spec there is some mention of blend-shader-like functionality (they rejected it because of some perceived performance issues for some unmentioned IHV). At that time the best hardware was the GF6xxx (or it could even have been the FX series); hardware has evolved a little bit since then.

Yeah, and they still don’t exist. As in, there is no hardware that can do that. Nor is there any in the near future.

Why is it that DX gets to add “hardware” features and not OpenGL?

Because they make API releases in a timely manner? Because they release an API that is widely used? Because they get with the IHVs and find out what the hardware is going to be able to do in the near future, and thus implement it in their API? Because their position as market leader allows them to dictate, to an extent, what IHVs will put into their hardware? Possibly all of the above.

OpenGL is not in a position to do these things. It probably never will be.

Programmable blenders have been difficult in the past in many architectures because GPUs have very deep pipelines, and one natural design is to tie blenders closely to the memory interface.

People have also asked to be able to perform raster operations in the shader program, but this is also tricky to do at full speed with 1) overlapping in-flight fragments and 2) multisample antialiasing.

These aren’t insurmountable problems, but they’re difficult at least, and for unclear benefit.

Tilers do offer a much more naturally extensible back end because they put the pixels in the cache, right next to the shader. Very low latency, very high bandwidth access to pixels has some really attractive properties.
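To illustrate what that buys you (a conceptual sketch only; Pixel, Fragment, TILE and shadeAndBlend are all invented names): once the tile lives in on-chip memory, a “programmable blender” is nothing more exotic than ordinary code doing read-modify-write on that local array, with a single burst write-back at the end.

```cpp
// Conceptual sketch of a tiler back end with programmable blending.
struct Pixel    { float r, g, b, a; };
struct Fragment { int x, y; Pixel color; };

const int TILE = 32;   // assumed tile dimension (pixels)

// A blend that fixed-function units can't express: weight driven by destination alpha.
Pixel shadeAndBlend(const Fragment& f, const Pixel& dst)
{
    float w = dst.a;   // reading the framebuffer value is cheap: it's in cache
    return { f.color.r * (1 - w) + dst.r * w,
             f.color.g * (1 - w) + dst.g * w,
             f.color.b * (1 - w) + dst.b * w,
             1.0f };
}

void processTile(Pixel tile[TILE * TILE], const Fragment* frags, int fragCount)
{
    for (int i = 0; i < fragCount; ++i) {
        Pixel& dst = tile[frags[i].y * TILE + frags[i].x];
        dst = shadeAndBlend(frags[i], dst);   // arbitrary read-modify-write per pixel
    }
    // ...then one burst write of the finished tile out to external memory.
}
```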

Tilers have traditionally been unattractive because they place a heavier burden on the host CPU to bin and replay the command stream.

The interesting reason that tilers may be on the comeback is that a) a many-core GPU can do its own binning and b) a smart GPU can traverse a spatial data structure directly rather than making the CPU serialize what the GPU then needs to de-serialize.

If tilers are successful (which seems likely for any software-centric renderer), they will ease many aspects of programmability in the pipeline - like a reconfigurable pipeline, tessellation support, programmable blenders, alternative renderers like REYES or image order. The one exception for the near term is probably texture fetch, though I could imagine it becoming a separate (but distinct) coprocessor before too long.

The funny thing to me about all this is that graphics seems to be in the back seat on the future of GPUs. The graphics APIs have become relatively staid, and we’re all looking with enthusiasm at what the compute APIs might do for us.

If tilers are successful

Just because it’s not on the desktop running Crysis at 60fps doesn’t mean it’s not successful :wink:

In desktop discrete and high-end consoles (users that care about 3D graphics at all), I think you can argue pretty convincingly that tilers have not been successful to date. In the low end and embedded markets, the value proposition is a lot more favorable to them than in desktop discrete.

Traditionally, anyway. Like I said though, all the current winds of change are much more favorable to a tiler than they have been in the past. Reconfigurable pipelines, programmable blending, hw accelerated non-traditional renderers, gpgpu,… these make a many-autonomous-core-with-big-cache device attractive. And tiling is a no-brainer decision on such an architecture.

Dominance can be self-affirming. Immediate mode renderers were able to outpace tilers in the past due to different market and technology conditions. As a result, applications were written in such a way that tilers had difficulty with them.

The current innovation path with immediate mode renderers has pretty much stalled at DX9 capabilities, and the architectures themselves are not driving fundamental changes to the APIs. Changes in the technology landscape that Moore’s law dictates are erasing many of the traditional benefits of an immediate mode pipeline tuned to maximize memory bandwidth efficiency and hide huge latencies.

The companies that recognize this trend are the ones that will be dominant in graphics 5 years from now. It may be the same companies that are there today, but it may not be.

Yeah, but the long-term future of many-core might require a fundamental change in order to ensure good scalability. Core-to-core communication and latency easily become a bottleneck, and a single monolithic coherent cache simply doesn’t scale. We would effectively get a bunch of cores with separate memory, to which we would need to stream assets (ie slowly, like texture streaming, based on the resources required by each core for display traversal), and an application which broadcasts a relatively small per-frame data stream to each core (ie positions of camera, objects, etc). Then cores which do their own display traversal and update their tiles of the screen. Tiles get sent to something which builds the final output for display (or displays). Perhaps even overlapping tiles, to blend out the artifacts of asset streaming being “just too late”. Effectively the distributed raytracing model.

We are bound to end up in an extremely non-uniform memory access model (ie effectively a network between cores) as computation continues to scale. This is already starting to happen, for example AMD sticking to dual chips for the high end, with only 5 GiB/s per direction of cross-communication and another 5 GiB/s per direction of communication with the host. Seems to me that there is going to be a limit in the not-too-distant future that NVidia will hit which makes a single monolithic chip not cost effective.

As for the short term, I don’t see current GPUs being all that different from tilers (in fact, with a pre-z pass they become somewhat like deferred tile-based renderers). It’s just that we have serialized setup and semi-parallel binning happening in hardware, where the 2x2 fragment quads get binned in a fixed distribution across cores. Effectively each independent core gets a sequence of smaller tiles to process. The output merger (a cache) works in small tiles, accumulating 2x2 fragment quads into tiles and then doing per-tile global memory transactions. Seems to me that the primary difference between Larrabee and a GPU is tile size/granularity, the details of swapping tiles in and out of cache, and binning. GPUs might also have an advantage here in framebuffer output latency, in that they can begin processing right away in draw order and swap more tiles in and out of cache.
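For reference, the pre-z pass mentioned above is just the standard two-pass depth pre-pass; a minimal GL sketch, where drawScene() stands in for whatever issues the draw calls:

```cpp
#include <GL/gl.h>

void drawScene();   // hypothetical: issues the same draw calls in both passes

// Depth pre-pass: lay down depth first, then shade only the visible fragments.
void renderFrameWithPreZ()
{
    // Pass 1: depth only. No color writes, so fragment shading cost is minimal.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    drawScene();

    // Pass 2: full shading. The depth test now rejects every hidden fragment.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);
    glDepthFunc(GL_LEQUAL);   // GL_EQUAL also works when both passes match exactly
    drawScene();
}
```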

As for programmable blending and the near future of GPU hardware: if the CUDA PTX docs are any sign of NVidia’s future, there is the .surf memory space, which is accessible via surface instructions with R/W access and is shareable per context. It’s just that the .surf memory space hasn’t been implemented yet. If this is in the designs for future hardware, then we get a high-latency (with respect to Larrabee) coherent cache on NVidia hardware as well, perhaps with free type conversion and blending.

IMO, in 2009 we get an early view of how everything will unfold: if we still have NVidia’s single-chip monsters and triangle setup is parallelized in hardware (ie scaling a little better with bandwidth and computation), then our current GL/DX graphics APIs should remain quite useful for a while…

IMO, in 2009 we get an early view of how everything will unfold: if we still have NVidia’s single-chip monsters and triangle setup is parallelized in hardware (ie scaling a little better with bandwidth and computation), then our current GL/DX graphics APIs should remain quite useful for a while…

To me, the real test is this: is Larrabee any good? Despite its flexibility and so forth, if it can’t produce performance equal to nVidia or ATi hardware without forcing you to write your own renderer or conform to a totally unique memory representation of your data, then it’s just not going to fly. The standard model of high-performance graphics will last longer if Larrabee is essentially a failure.

The other thing that would change things is if Intel persuades Microsoft/Sony/Nintendo to use a Larrabee-esque GPU in their consoles. Honestly, I don’t know how many console game developers would be willing to basically write and support a software rasterizer that has to be optimized for a many-core scenario, and I have no idea how many of them could actually get better/faster results than with a more standard chip. But if they are able to do something substantial with it, it could start bleeding into PC products.

Well, after reading those articles about both APIs’ changes and advances… I’m pretty happy and not disappointed at all about OpenGL 3.0. The power of OpenGL is the glBegin/glEnd pair!