Talk about your applications.

Hello frequent posters…

I’d be interested to learn details about the kinds of OpenGL applications you work on. The motivation is to gain more insight into the features you need in OpenGL, which can have a big effect on scheduling and priorities for upcoming releases.

Kicking things off, here are the applications whose development I’m involved with at Blizzard - followed by some comments about features we need in OpenGL to further improve them.

World of Warcraft (supports OpenGL on Windows and MacOS X)

StarCraft II (OpenGL on OS X)

Diablo III (OpenGL on OS X)

Probably one of our highest-priority needs for improvement in OpenGL is streamlining the interface for communicating with GLSL shaders, specifically the rate at which uniform values can be set en masse in situations involving many individually named uniforms that are not suitable for packing into an array, such as could be driven by the bindable-uniform extension.

On the latter two titles, the engine can run in either ARB assembler mode or GLSL mode. The ARB path has a high-performance API for updating uniforms in batches; this isn’t yet the case for GLSL. Since we’d like to use the richer semantics of GLSL and still run fast, this issue is a high priority for our games and is not currently addressed by OpenGL 3.0.
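
To illustrate the gap: with individually named uniforms the app makes one driver call per value, while a batched path hands the driver one contiguous block in a single call. A minimal C++ sketch of the batched side (the struct layout and names are hypothetical, not our engine’s actual code):

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical per-frame uniform set. With per-name updates, each field
// costs one glUniform*-style call; with a batched path, the whole struct
// is one contiguous upload.
struct FrameUniforms {
    float modelView[16];
    float fogColor[4];
    float time;
};

// Batched hand-off: pack everything into one block suitable for a single
// buffer upload, instead of N individual driver calls.
std::vector<unsigned char> packUniforms(const FrameUniforms& u) {
    std::vector<unsigned char> block(sizeof(FrameUniforms));
    std::memcpy(block.data(), &u, sizeof u);  // one upload's worth of data
    return block;
}
```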

Another one not currently covered by GL3 is a compiled shader caching capability to help improve load times.

GL3 and the new ARB extensions for 2.x do bring an improved feature set for games such as MapBufferRange, the enhanced ARB_framebuffer_object spec, instanced rendering, explicit support for one and two channel textures, and a few other things that we are looking forward to using as the implementations come along.

(more items and details as I think of them)

Looking forward to learning more about the apps the rest of you are involved with, and what we could do in post GL3 releases to help you improve them.

Rob, if you’re content to use float4’s for all input parameters, why wouldn’t bindable-uniform be sufficient?

Your program could just include a #define to convert the name you want to use in your shader to the element of the bindable array.

This is what I’ve been planning to do, and I’m much happier with that approach than to try to negotiate tons of named variables with the GL driver.

The app will know the offset into the buffer object for a named variable, and updates to the named variable from app code just turn into glBufferSubData() calls. The shader references a hardcoded location in the bindable thanks to the define, but the code itself still looks like it’s using a well-named variable.

I don’t see the hand-off getting much more efficient than that between the app and the driver/hardware.
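
That scheme can be sketched roughly like this (all names, and the 64-element array size, are illustrative; the GLSL side appears only in the comment):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Sketch of the #define approach. The shader declares one bindable array
// and maps friendly names onto its elements, e.g. in GLSL:
//
//   bindable uniform vec4 params[64];
//   #define u_fogColor params[3]
//
// The app keeps the matching element indices, so every named update turns
// into a glBufferSubData-style (offset, size, data) call.
struct NamedOffsets {
    std::map<std::string, std::size_t> slot;   // name -> element index
    static const std::size_t kVec4Bytes = 16;  // sizeof(vec4) in the buffer

    // Byte offset to pass to the buffer update for a named variable.
    std::size_t byteOffset(const std::string& name) const {
        return slot.at(name) * kVec4Bytes;
    }
};
```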

I will post here later, when I get a free day. Busy working… :frowning:

Speaking of bindables…

Is there any hope for a user specified packing when the extension is made core?

Failing that, I like your solution, Cass.

Dammit, I need to finish grad school so I can get an actual job programming in OpenGL. It’s not fair to listen to all you guys talk about the bad@ss work you get to do :frowning:

Don’t forget, when you post in the thread, you should mention what apps you are working on (or maybe would like to work on).

I’m working on a 3D modeler aimed at games/realtime-vis (unlike how gamedevs always have to make conversion+preview+stuff tools for their artists). Drawing is all shader-based, using precompiled Cg or GLSL 1.2 (the cgc.exe way), shader model 4. The artist creates/edits the shaders inside the modeler, and there is C++ code on the application side for each shader (it takes care of setting uniforms, e.g. a GetTickCount()&1023 controlling a UV-animated surface). The C++ code has a call-table of entries into the rendering engine, and is recompiled+linked with GCC/whatever into a DLL for use in the modeler, later simply included into a game project.

  • Semantics (defining which resource a uniform/attrib/varying uses) are a must.
  • Uploading of a whole range of uniforms is a must.
  • Bindable uniforms are increasingly useful.
  • Texture arrays are used.
  • gl_Modelview** and other predefined uniforms are avoided.
  • Instancing via an instance-specific vertex attribute value is used.

I don’t like GLSL 1.3’s in/out stuff; “varying” wasn’t broken. (Well, I can always make preprocessor macros to define how declarations look.) I hope ATI lets us use ASM shaders for their upcoming GS support; I don’t like GLSL 1.3 in its current state. (And I hate letting compilers/linkers decide where I should be sending data.)

VAO would be really nice to have.

P.S. That modeler will be free or open source. Blender sucks, and all worthy modelers have too much useless-for-RT fluff and cost an arm and a leg for it.

I’m currently working on a game which makes extensive use of PhysX for animation, with lots of softbodies and cloth, in a very large outdoor world.
MAP_UNSYNCHRONIZED_BIT has been very handy for streaming the results of the PhysX simulation into a VBO circular buffer; however, our implementation using GL_NV_fence is more efficient and requires a smaller VBO than the one without, so:
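
The region-reuse logic behind the fenced ring buffer can be modeled without GL calls; here fences are approximated by frame counters (a sketch, with an assumed region count of 4, not our actual code):

```cpp
#include <cassert>
#include <cstdint>

// Model of a fenced circular VBO: the buffer is split into N regions, a
// fence is set after the draw that reads each region, and the writer may
// reuse a region only once its fence has signaled. With
// MAP_UNSYNCHRONIZED_BIT alone you must size the buffer conservatively;
// fences let you reuse regions sooner, hence the smaller VBO.
struct FencedRing {
    static const int kRegions = 4;
    std::uint64_t fenceFrame[kRegions] = {0, 0, 0, 0};  // frame each region was last written
    int cursor = 0;

    // Returns the region to stream into this frame, or -1 if the GPU
    // (modeled by gpuFrame, the last frame known complete) hasn't
    // finished reading it yet.
    int acquire(std::uint64_t cpuFrame, std::uint64_t gpuFrame) {
        int r = cursor;
        if (fenceFrame[r] != 0 && fenceFrame[r] > gpuFrame)
            return -1;  // fence not signaled: reuse would stall or corrupt
        fenceFrame[r] = cpuFrame;
        cursor = (cursor + 1) % kRegions;
        return r;
    }
};
```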

#1. Promote GL_NV_fence to core.

But what would be even better (when NVIDIA adds GPU processing to PhysX) would be to avoid the GPU-CPU-GPU copy and have PhysX store its output in an OpenCL-style buffer on the GPU that can then be used directly as input to the OpenGL tessellator.

#2. Shareable PhysX/OpenCL/OpenGL buffers.
#3. Tessellation !!! (with distance-related silhouette edge LOD)

I agree with you on:
#4. Compiled shader caching

The biggest single problem I have at the moment is the lack of control over the rendering of secondary tasks like shadow maps or environment cube maps.
I can always complete the rendering to the frame buffer before the VSync; however, when I add the shadow map and cube map rendering, it occasionally skips a frame.
This is visually jarring, whereas updating the shadows or reflections every 2nd or 3rd frame is hardly noticeable.
So what I need is a second rendering stream (preferably in a separate thread/context) which acts like a lower-priority background task and renders to an FBO.
The GPU would render my main stream until it reaches the swapbuffers, at which point it switches to rendering the shadow and cube maps; then at the next VSync it swaps the buffers and continues with the main command stream.
Hence the main rendering occurs every frame, and the shadows & reflections are updated as often as possible.
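
Until something like a prioritised context exists, the every-2nd-or-3rd-frame cadence mentioned above can be sketched as simply as this (the intervals are illustrative, not tuned values):

```cpp
#include <cassert>

// Fallback available today: rather than a lower-priority background
// context (which GL lacks), update expensive secondary render targets on
// a fixed cadence so the main render never misses VSync.
bool updateShadowMap(unsigned frame) { return frame % 2 == 0; }  // every 2nd frame
bool updateCubeMap(unsigned frame)   { return frame % 3 == 0; }  // every 3rd frame
```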

#5. Prioritised render contexts.
#6. A query of whether the last swapbuffers had to skip a frame.

Primarily I work on a versatile 3D engine for games & applications. The focus is on rapid tooling/prototyping for games & serious work, hence luxinia mostly uses Lua (core engine written in C). So far Windows only, although Linux & Mac are intended as well.

For the game use of the engine the focus is mostly on “mass use”, where OpenGL clearly isn’t a winner considering how problematic drivers are, and how certain features (draw_instanced, half-float vertex, …) only arrive on new hardware, while under DX they have long been exposed on older hardware.

I’m mostly missing the parameter uploads & precompilation as well. I wish ATI had at least gone to SM3 for the ARB programs like NV did, and that Intel had PBO support… I don’t like GLSL, hence using Cg. Another wish would be IHVs submitting to “glview” and similar open databases, so we can see which limits/extensions are supported on which hardware.
I second the NV_fence-to-core suggestion. Something that allows non-busy waiting would be nice, although I am just beginning to explore the streaming stuff.

For research I work on virtual endoscopy (see you at IEEE Vis :wink: ) and other medical-related rendering stuff (PhD). Hence I want to look further into CUDA for the raycasting, but so far I am fine with Cg & FBO. For rendering of vessel trees I will look into new approaches as well (thanks for the Reyes pipeline hint, Mr. Farrar). The goal is that the rendering of a single effect doesn’t saturate the high-end cards, but leaves room for implementation within a greater app. Daily clinic routine vs “academic visionary” :wink: I can pretty much ignore non-NVIDIA hardware here, so I am glad that up-to-date solutions for most features exist.

in short:

“env” parameters for GLSL (in general GLSL seemed like a step back from certain low-level features of the ARB programs)


drivers also exposing appropriate features on “older” generations, which they do support under DX9 (wishful thinking)


ecosystem wish: some sort of “official” glview-style database to which IHVs commit the latest driver limits, so that it’s much more complete than the “user”-based submissions.

First of all, I am still a beginner when it comes to programming OpenGL, so you might ignore this post if you think it is rubbish ;).

Applications I work on:

  • scientific: fluid simulations and other physics/AI calculations that can be implemented on the GPU.

  • games: for the moment, this is purely hobby, but depending on success developing them it might become a job.

Currently I am starting a new project for a game, basically from scratch. I have been looking into depth peeling, and I think it would be great to be able to check fragments against multiple depth buffers, so that the second-closest fragment can easily be found. I am not sure whether this is a hardware limitation or an API limitation, but it would be great if one were able to write/check multiple (not necessarily limited to two) depth buffers. As I said, I’m still a beginner in this field, so there might already be very reasonable alternatives for this. But so far I haven’t found them.
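
What that second depth test would compute can be modeled on the CPU. Given all fragment depths landing on one pixel, classic depth testing keeps only the minimum; depth peeling’s second pass keeps the smallest depth strictly greater than the first pass’s result (a sketch of the idea, not a GPU implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// First pass of depth peeling: ordinary depth test, keep the nearest depth.
float nearestDepth(const std::vector<float>& depths) {
    return *std::min_element(depths.begin(), depths.end());
}

// Second pass: peel away the nearest layer by keeping the smallest depth
// strictly behind the first pass's result.
float secondNearestDepth(const std::vector<float>& depths) {
    float first = nearestDepth(depths);
    float second = std::numeric_limits<float>::infinity();
    for (float d : depths)
        if (d > first) second = std::min(second, d);
    return second;
}
```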

A second wish, which probably doesn’t even have to be mentioned: put geometry shaders in the core, so we can rely on them being implemented in vendors’ drivers.

I would like to see an easy way to create multiple offscreen contexts, one per GPU in a multi-GPU system.

I need to render a large number of frames (~300,000) of a dataset to disk or memory, and then preview this data as a 3D visualization. Both rendering time of the frames and resolution of the preview would benefit from independent control of GPUs. SLI is no help because what I really need is to split my dataset and readbacks across GPUs.

NVIDIA has their WGL_NV_gpu_affinity for Quadro cards, and there are tricks with sticking windows on different monitors, but there doesn’t seem to be a consistent way to do this for consumer hardware without a lot of fiddling and luck. It’d be nice to have an alternative to SLI for programs that could be multi-gpu aware (physics, procedural content, large datasets, …)
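
The desired split is plain range partitioning of the workload across devices; a sketch (the frame and GPU counts are illustrative):

```cpp
#include <cassert>
#include <utility>

// Give each GPU a contiguous slice of the frames so rendering and
// readback proceed independently per device. Returns [begin, end) for
// the given gpu index; slices differ in size by at most one frame.
std::pair<int, int> frameRange(int totalFrames, int gpuCount, int gpu) {
    int base  = totalFrames / gpuCount;
    int extra = totalFrames % gpuCount;  // first `extra` GPUs get one more
    int begin = gpu * base + (gpu < extra ? gpu : extra);
    int count = base + (gpu < extra ? 1 : 0);
    return {begin, begin + count};
}
```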

Generally engineering visualisation, some volume rendering, lots of streaming stuff (geometry LODs, modified geoclipmaps/megatexture-type stuff). So MapBufferRange is a good addition. I haven’t bothered with the instancing stuff, as engineering data rarely retains the instancing information, even though lots of it is effectively instanced.
I’ve just moved to deferred shading, so the binary blob thing isn’t really an issue any more.
I don’t use immediate mode or display lists any more. I don’t use any of the FF states, like light or fog etc.
Basically, I want a clean API with exactly one route for the application to communicate buffers to the driver, and therefore stable and reliable drivers. Everything else is just putting earrings on a pig.

I’ve asked this before but never got an answer:
why can’t NVIDIA/AMD etc. release OpenGL ES drivers?

OK, you could argue it fractures the driver, but IMO the benefits are worth it:

A/ fewer driver bugs
B/ potentially better performance

I’m sure this would please people like knackered and myself.

Zed do you have a chance to post details of your app ? It may be common knowledge to some of the regulars here, but I am not familiar with your work yet.

Knackered, you can see the message in GL 3.0 with respect to data flow: VBO is the way to go, and that’s been made pretty clear by virtue of the other paths which have been deprecated (marked for removal or relocation in future versions).

Well, VBOs seem to be pretty difficult for drivers to manage efficiently. We’re still waiting for the index offset to make VBOs a little more practical in terms of performance. I offset indices depending on where in the VBO cache their vertices are, to avoid having to rebind all the attributes every time I switch geometry (which kills performance). This is cumbersome and slow, especially when you’re streaming geometry.
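
That offsetting workaround amounts to a CPU-side index rewrite; a sketch (names illustrative, and a core base-vertex draw parameter would make this rewrite unnecessary):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Meshes share one big VBO; instead of rebinding attributes per mesh,
// each mesh's indices are rewritten relative to its base vertex within
// the shared buffer, so one attribute binding serves many draws.
std::vector<std::uint32_t> offsetIndices(const std::vector<std::uint32_t>& indices,
                                         std::uint32_t baseVertex) {
    std::vector<std::uint32_t> out(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i)
        out[i] = indices[i] + baseVertex;  // index into the shared VBO
    return out;
}
```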

You still haven’t answered zed’s question.

{edit relating to Rob’s next post} Yes Rob, I suppose I understand why you want to keep this thread as a survey rather than a discussion. These things do tend to escalate.

I think that would be a great topic for another thread. Is that OK by you ?

(i.e. referring to the desire for “ES on the desktop”)

I work on applications which mix simple 3D rendering with 2D operations done in software. I draw some triangles, then glReadPixels, then do stuff in software, creating textures from the results and other imagery, eventually composing what becomes a visible frame. It would simplify things if OpenGL had more direct, unencumbered access to the frame buffer than glReadPixels and glDrawPixels: functions to query the native pixel format and use that, instead of allowing you to specify all sorts of pixel transfer modes and states. glDrawPixels has a big diagram in the spec to describe all its doings, and actually I don’t use it, having made my own version of it which textures a single quad. glReadPixels doesn’t seem quite as bad; I could see no way to avoid it, and I’m not confident that it’s as fast as possible.

One thing to keep in mind is that, even if a direct mapping could be provided so your application could read from the frame buffer, the pixels might not be in a convenient ordering for your code to deal with (i.e. simple raster order); in many cases the pixels in the FB are tiled/swizzled in an undocumented way.

On World of Warcraft we implemented an integrated video capture feature and it uses glReadPixels, but it uses multiple PBOs as destinations for the reads, and round-robin scheduling to enable pixel readbacks to happen without blocking the drawing thread. If you schedule things correctly, when the time comes to map the PBO and actually access the pixel data, it should not block. The trick is to map the one with the oldest pending read in flight each time. (This is all on OS X, so YMMV.)
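
The round-robin bookkeeping can be modeled without GL calls (a sketch; the three-buffer depth is an assumption, not necessarily what WoW uses):

```cpp
#include <cassert>
#include <cstdint>

// Each frame, issue glReadPixels into PBO (frame % N) and map the PBO
// whose read was issued N-1 frames ago; that read is the oldest in
// flight, so mapping it is the least likely to block.
struct ReadbackRing {
    static const int kBuffers = 3;  // illustrative pipeline depth

    // PBO to target with this frame's glReadPixels.
    int issueTarget(std::uint64_t frame) const {
        return static_cast<int>(frame % kBuffers);
    }

    // PBO to map this frame: the slot issued kBuffers-1 frames ago,
    // i.e. (frame - (kBuffers - 1)) % kBuffers == (frame + 1) % kBuffers.
    int mapTarget(std::uint64_t frame) const {
        return static_cast<int>((frame + 1) % kBuffers);
    }
};
```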

I am working on two commercial platform games (for ages under 12), planned for release in Q2-Q3 2009. I am also doing some parts of the support tools. One of the goals of the company where I work is to make the games available for all platforms (Windows/Linux/MacOS). Unfortunately, the biggest problems we have are on the cross-platform side, especially on Linux.

We are using Cg/CgFX for shaders.
We use “in house” formats for textures, characters and so on.

What I would like to see from the next OpenGL versions:

  • better buffer manipulation (I would honestly prefer a “clean” way to handle them)
  • C++ (yeah… I am dreaming here… I know that will probably never happen, or if it does, there will probably be a new API)
  • better cross-platform integration (I would love to see something like glMakeWindow(x,y,…) that creates the OpenGL window on Windows/Linux/MacOS, and some other things like that)
  • support tools (it is impossible for me not to compare with the DirectX SDK; also, DirectX has real support newsgroups, where developers actively respond to questions; everybody wins there)

I will probably miss the “immediate” mode in 3.0/3.1; I usually use it in debug tools, to draw some lines, and I like it, but that’s just me.

For the extensions part, we don’t use any “wicked” stuff. We plan for the games to be playable on medium-spec machines.

Anyway, I am confused about OpenGL’s future. I agree it is the only way to do cross-platform applications, but sometimes, for the remaining 10% of the market, I ask myself whether the amount of work put into it is really justified. Not everybody has Blizzard’s financial and human resources to put on the table. Right now, there are discussions about dropping OpenGL or keeping a hybrid.

I am also very curious about Khronos’s plans for OpenGL’s future.

I really appreciate Rob Barris’s interest, as an ARB member, in OpenGL developers. Please, people, stay on topic; it will be interesting to see how many others are using OpenGL for commercial applications, and to see their problems as well.

I am an IT student specializing in CG. In my spare time I am working on a game engine, but lately I have been working nearly exclusively on a project for university. The project is about taking existing data from the land-registry office, creating a 3D city model and visualizing it in real time in the best quality that we can. So the visualization work is very similar to what games do.

Setting uniforms is a major bottleneck. The possibility to query a shader for all occurring uniforms (as added in GL3) would be really great. Right now I implement something similar myself, but it takes a lot of extra code.

FBOs are always real fun. We use deferred shading, and it would be really helpful to finally be able to mix different formats without limitations. It would also be great if we could somehow query whether early-z and early-stencil tests are currently enabled by the hardware, so that if the query returns false, we can take a closer look and try to fix it. Right now it can break at any time, without notice, until you enable some effect and notice that it runs slowly.

Another thing to improve is 1- and 2-channel textures, but it seems that is all underway already.

Don’t get me started about state-management. But well, we know the ARB is incapable of fixing that problem…

Custom resolves for multisampling would be great with deferred shading; right now we have to live with some artifacts when using FSAA.

That’s all I can think of atm.