GLTF is too complicated

I’m on the final parts of my GLTF loader, after working on it over a year (not continuously). I’m noticing that many importers have problems with the format. Blender for example will fail to load this model’s textures, though my own loader and Windows 3D Viewer succeed:

This means the format is too complicated. Maybe not for me, but if common applications can’t support it reliably (after how many years?) it affects my ability to recommend the format for model IO.

These elements should be considered for removal or streamlining in version 3.0:

  • Data from multiple files, other than textures.
  • Skins (completely unnecessary for weighted vertices)
  • Default bone pose should be weighting pose.
  • Buffer views / accessors / buffers design is extremely over-engineered.
  • Get rid of Base64 data, just use text or binary.
  • Flipped faces when matrix determinant < 0
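
To make the last item concrete: glTF 2.0 ties triangle winding order to the sign of the determinant of a node's global transform, so a loader has to reverse the winding of meshes under a mirroring transform. The following is only a minimal sketch of that check; the function names `det3` and `fix_winding` are hypothetical and not taken from any loader mentioned here.

```python
def det3(m):
    """Determinant of the upper-left 3x3 of a 4x4 matrix given as nested lists.

    The determinant of a matrix equals that of its transpose, so it does not
    matter here whether the rows are really rows or glTF-style columns.
    """
    (a, b, c), (d, e, f), (g, h, i) = (row[:3] for row in m[:3])
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fix_winding(indices, world_matrix):
    """Return a triangle index list, reversing every triangle if the transform mirrors."""
    if det3(world_matrix) >= 0:
        return list(indices)
    flipped = []
    for t in range(0, len(indices), 3):
        i0, i1, i2 = indices[t:t + 3]
        flipped.extend((i0, i2, i1))  # swap two corners to reverse the winding
    return flipped


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
mirror_x = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

With these inputs, `fix_winding([0, 1, 2], identity)` leaves the triangle alone, while `fix_winding([0, 1, 2], mirror_x)` swaps its last two indices.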

I know somebody lovingly crafted the design of each of these items and thought they were being oh-so-clever, and that is the problem. A common file format should be as simple as possible while still supporting the requirements. For me the benefits were worth investing time, but I personally know people in charge of software products who won’t touch it for these reasons.

There is some point where complexity of the specification can be considered a bug, and effort should be made to fix that bug rather than trying to fix all the bugs in all the importers and exporters out there. I managed to wrangle the whole thing, but its complexity is holding back adoption and compromising the quality of its support.

Anyways, after working with the format extensively, that’s my $0.02.

I find this to not be particularly well reasoned. The specific piece of evidence you give for glTF being “too complicated” is that Blender failed to load the textures of a model correctly. But one of your suggested changes is to remove all external files except textures. So… how does that fix your problem?

It seems to me that a better set of reasoning would be to show specific areas that different glTF loaders are having problems with conformance, which would constitute reasonable motivation for removing them. But as it currently stands, your argument is that some loaders don’t quite work, so let’s make a bunch of changes that… don’t actually make those malfunctioning loaders work.

It would likely fix the problem, indirectly. There’s an upper limit to how much complexity people can handle. If the author of that import plugin was not putting a lot of effort into some of the things I mentioned, they would have more time to perfect the basic functionality.

I could compile a list of a lot of other issues in different software, but nobody is paying me to do that. The point is, on a macro level, the spec is too complicated for reliable implementation across the board in the real world. I like GLTF a lot, but if complexity can be reduced in the next major version it would result in better support.

You can’t go back in time and uncreate those features. Your suggestion is about what to do in the future, not the past. So we have to talk about a world where all of that effort has already happened (or failed to happen properly).

From a practical perspective, glTF v2 already exists and has a bunch of assets for it as is. Assets that people will expect something calling itself a glTF loader to be able to handle. So it’s not like tool writers can just drop compatibility with the older format for the foreseeable future. So whatever work needs to be done to implement those features in a glTF loader will still need to be done.

Look at how effective the switch from Python 2 to Python 3 has been. Or rather, how effective it has not been. There are automated tools that can successfully translate a significant amount of Python 2 code to v3, but there is still a bunch of Python 2 that keeps getting written and maintained, despite support for Python 2 being dropped this year.

glTF v2 is successful; there are a lot of loaders for it and a lot of assets for it. So, would people wholesale switch to your hypothetical glTF v3? If not, then how would such a change improve things? It could easily get us into a place where the two formats have to coexist, and therefore everyone needs to maintain code that can load both of them. That doesn’t actually solve the problem.

The best you might be able to do is create a “simplified” glTF v2 variant, which is just glTF v2 without certain features. Such assets could still be loaded into a proper glTF v2 system, while “simplified glTF loaders” could reject files that use those features. Essentially, Khronos would be sanctioning the production of loaders that don’t handle all of glTF.

That cuts both ways. You don’t have to provide evidence for your position. But we don’t have to accept a position that lacks evidence.

If you want to get this to actually happen, you need to be able to make a convincing argument for it. And that means you need to demonstrate that 1) there is a problem and 2) this is the proper solution to it. Compiling a slate of evidence for these positions is a great way to do that.

After all, my point wasn’t that your suggested changes are bad; it’s that they are not well founded, that you aren’t providing a convincing argument along with evidence for these changes needing to be made.

If you want to proceed with pushing for this idea, I would suggest visiting the glTF GitHub repo and filing an issue. However, if you wish to do so, it would be far more effective if you took the time and effort to substantiate your positions with solid evidence.

Please report all issues to Blender’s tracker. It sounds like glTF is way simpler than COLLADA (XML by itself is insanely complex) so I wouldn’t worry about this too much!

For the most part I agree with @Alfonse_Reinheart — glTF 2.0 has quite a lot of adoption for such a new format, and this seems like a good sign that we’ve (mostly) got the balance between simplicity and feature coverage right.

@JoshKlint, I’ll comment point-by-point on your items. Let me start with the parts where we may agree:

  • Skins (completely unnecessary for weighted vertices)
  • Default bone pose should be weighting pose.

I’m not sure I understand your argument on these exact points, but we have gotten more detailed feedback (most recently #1665 and #1784) that skinning could be simplified, and are considering options. As @Alfonse_Reinheart points out, we have to strive for backward-compatibility too. One proposed solution (#1747) is to define an extension that — unlike most extensions — restricts the format further, simplifying implementation. Then your engine could choose to only support models that use the strict form of skinning, and authoring tools and optimizing tools could convert models to the strict form for best compatibility. This potentially provides a smoother path to 3.0, someday.

It’s also possible that clarifications to the existing spec are necessary.

As for your specific comment about Blender, that is more complicated. Blender’s representation of skinning, and animation in general, is significantly different from what a runtime would store. It took us a while to get skinning export in Blender working correctly; a few bugs remain (please report them if you find them!), but far fewer than a year ago. I’m afraid that a stricter/simpler glTF skinning spec probably wouldn’t have helped much with this, because the skinning data exported from Blender has to be reorganized regardless. There is a lot of work involved in creating a free format and the tooling ecosystem that supports it; filing issues and opening pull requests are helpful to us: https://github.com/KhronosGroup/glTF-Blender-IO/

  • Data from multiple files, other than textures.
  • Get rid of Base64 data, just use text or binary.
  • Flipped faces when matrix determinant < 0

I’m not aware of existing complaints on these features. We’ve tried to make it very easy to package files as you need (with or without separate files, or Base64) by providing tools like glTF-Pipeline. Feedback on all of this is certainly welcome; comments on GitHub may get quicker responses for technical issues.
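
For readers unfamiliar with the packaging options discussed here, this is a minimal sketch of what a loader has to do to resolve one entry of the `buffers` array: the `uri` can be a Base64 data URI, a relative path to a separate file, or absent (in a `.glb`, where the first buffer refers to the binary chunk). The function `load_buffer` and its parameters are hypothetical, and the sketch ignores absolute URLs and non-Base64 data URIs.

```python
import base64
from pathlib import Path
from urllib.parse import unquote


def load_buffer(uri, base_dir=".", glb_chunk=None):
    """Return the raw bytes for one glTF buffer (illustrative sketch only)."""
    if uri is None:
        # GLB case: a buffer without a uri refers to the embedded binary chunk.
        if glb_chunk is None:
            raise ValueError("buffer has no uri and no GLB binary chunk was given")
        return glb_chunk
    if uri.startswith("data:"):
        header, _, payload = uri.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("only base64 data URIs are handled in this sketch")
        return base64.b64decode(payload)
    # Otherwise treat the uri as a percent-encoded path relative to the .gltf file.
    return (Path(base_dir) / unquote(uri)).read_bytes()


embedded = load_buffer("data:application/octet-stream;base64,AAECAw==")
```

Here `embedded` holds the four decoded bytes `00 01 02 03`; a separate `.bin` file next to the `.gltf` would go through the last branch instead.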

  • Buffer views / accessors / buffers design is extremely over-engineered.

I’m afraid I just disagree on this. :) It isn’t a structure you’ll find in exchange formats, for sure, but it has been very helpful for runtime use and incremental loading on the web.

Thanks for the reply. I like the format and this was meant to be constructive. It’s just that I have written loaders for perhaps a dozen different 3D model formats, some of them animated, and most took just a few days to implement. Anyways, my loader is finished, so it doesn’t matter that much to me at this point.

Good morning, everyone,

I want to assure you that I’m writing this with full respect to you all.

This is why I finally registered and am answering: so that nobody can say that only one person has complained. A lack of complaints does not mean there is no problem.

Deleting code is easier than writing it. OpenGL also took a huge step and removed the entire fixed-function pipeline, didn’t it?

This is the only thing I really dislike in the whole format, even for implementing it on the web. A “universal” format shouldn’t include every possible way of storing data. Buffer views, accessors, and buffers allow many rearrangements: for example, you can choose blockwise or interleaved vertex attributes (“pos1,pos2,pos3,…uv1,uv2,uv3,…” vs. “pos1,uv1,pos2,uv2,…”) with byteStride handling etc., which is overkill nowadays in my opinion. In most cases you need to convert all these formats into your application’s own format anyway, but instead of converting from one universal format, you convert from every one of these possible layouts into your own.
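
As an illustration of the two layouts named above, here is a minimal sketch of an accessor-style reader for 32-bit float attributes. The helper `read_floats` is hypothetical; it collapses the accessor and bufferView offsets into a single `byte_offset` and treats a stride of 0 as "tightly packed", which is a simplification of how the real JSON is organized.

```python
import struct


def read_floats(buffer, byte_offset, count, components, byte_stride=0):
    """Read `count` float32 vectors of `components` floats each from `buffer`."""
    element_size = 4 * components
    stride = byte_stride or element_size  # 0 means tightly packed elements
    fmt = "<%df" % components             # glTF buffers are little-endian
    return [struct.unpack_from(fmt, buffer, byte_offset + i * stride)
            for i in range(count)]


# Two vertices, each with a position (3 floats) and a UV (2 floats).
positions = [(0.0, 1.0, 2.0), (3.0, 4.0, 5.0)]
uvs = [(0.25, 0.5), (0.75, 1.0)]

# Blockwise: pos1,pos2,...,uv1,uv2,...
blockwise = (b"".join(struct.pack("<3f", *p) for p in positions)
             + b"".join(struct.pack("<2f", *t) for t in uvs))

# Interleaved: pos1,uv1,pos2,uv2,... (20 bytes per vertex)
interleaved = b"".join(struct.pack("<3f", *p) + struct.pack("<2f", *t)
                       for p, t in zip(positions, uvs))

pos_a = read_floats(blockwise, 0, 2, 3)
uv_a = read_floats(blockwise, 24, 2, 2)
pos_b = read_floats(interleaved, 0, 2, 3, byte_stride=20)
uv_b = read_floats(interleaved, 12, 2, 2, byte_stride=20)
```

Both layouts decode to the same positions and UVs; the only difference the reader sees is the offset and stride it is handed.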

If you reduce the options to only one of these (either interleaved or blockwise), only users who prefer the other option have a disadvantage. However if you stick to one single option, at least one side does not have the disadvantage.

For example, in a format containing strings that could be stored in several encodings, you would not store the encoding name alongside each string; you would simply choose one universal encoding such as Unicode.

Btw.: is interleaved buffer rendering still a thing nowadays?

Kind regards,
Daniel

Major breaking changes (like changing the accessor/bufferView/buffer layout) could be made in a possible future glTF 3.0 version, but will not be made in glTF 2.0 to avoid breaking the many tools that have already implemented the current specification. For that reason, feedback of this type can inform long-term roadmaps but not immediate changes — I wouldn’t expect a v3.0 of glTF to be created in the near future.

In most cases you need to convert all these formats into your application’s own format anyway…

Having written the implementation for https://threejs.org/, we upload these buffers directly — interleaved or otherwise. This is much more efficient than rearranging vertex by vertex, and is one of the things that makes glTF much faster to load at runtime than formats like FBX, for us.

Note that glTF is first and foremost a “runtime” transmission format. It must at least be possible to write data into glTF that is optimized for runtime. That is not to say that every possible glTF data layout is optimized on the happy path for every engine; we can’t possibly guarantee that. This is why there are tools like gltfpack that take a glTF model and optimize it in more opinionated ways, while still outputting a valid glTF model.

I do recognize that these layouts require additional work to implement. If something like interleaving really becomes obsolete then it should be removed in future versions. But in my experience converting an interleaved array to separate arrays takes maybe 50 LOC (and none if your engine supports interleaving), and supporting sparse arrays might be another 25 LOC. I don’t see this as a fatal amount of complexity in a format, for the benefit of being able to store data that is optimized for runtime. If that’s still more complexity than you want to implement yourself, I would recommend using one of the available SDKs like glTF-Transform or cgltf. See http://github.khronos.org/glTF-Project-Explorer/.
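
To give a sense of scale for the sparse case, the following is a minimal sketch of applying a sparse accessor to already-decoded VEC3 float data: the sparse block supplies `count` element indices plus replacement values, and everything else keeps its base value. The function `apply_sparse_vec3` is hypothetical and is not the three.js implementation; the numeric keys are the glTF `componentType` codes for unsigned byte, short, and int.

```python
import struct

# glTF componentType codes allowed for sparse indices -> struct format characters
INDEX_FORMATS = {5121: "B", 5123: "H", 5125: "I"}


def apply_sparse_vec3(base, count, index_bytes, index_component_type, value_bytes):
    """Return a copy of `base` with `count` VEC3 elements replaced by sparse values."""
    index_format = "<%d%s" % (count, INDEX_FORMATS[index_component_type])
    indices = struct.unpack_from(index_format, index_bytes, 0)
    result = list(base)
    for n, element_index in enumerate(indices):
        # Each replacement value is three tightly packed little-endian float32s.
        result[element_index] = struct.unpack_from("<3f", value_bytes, n * 12)
    return result


base = [(0.0, 0.0, 0.0)] * 4
index_bytes = struct.pack("<2H", 1, 3)
value_bytes = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
patched = apply_sparse_vec3(base, 2, index_bytes, 5123, value_bytes)
```

In this example elements 1 and 3 are overwritten and elements 0 and 2 keep their base values. A full loader would also have to handle an accessor with no base bufferView, where the base values are all zeros.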

Btw.: is interleaved buffer rendering still a thing nowadays?

I’ve never seen any claim that data locality is obsolete; it’s certainly still mentioned in OpenGL best practices. But I also haven’t done benchmarks here. Any reason to think it’s no longer important?

Good day,

Thank you for your answer. :)

I understand that and I wouldn’t expect anything else.

For rendering only, this is true: you can simply throw this into OpenGL. But sometimes you still need to access the data on the CPU side as well. For example, a physics engine would usually need only the plain vertex data and nothing more. Computing convex hulls, spatial subdivision trees, and similar structures likewise needs only vertex data and maybe normals. If you have interleaved buffers, a lot of unnecessary data gets loaded into the cache.

Yes, that might be true. However, you basically force everyone to do this; you could give at least one group of people an advantage simply by reducing the possibilities.

It was not meant to be a rhetorical or sarcastic question (people sometimes read my texts that way, which is why I’m clearing it up right now); I asked out of interest. For my part, I haven’t used interleaved buffers since the fixed-function pipeline was deprecated, and the fact that each vertex attribute in shaders basically means a different buffer has been a practical advantage for me since then (e.g. for geometry processing like the above).

Btw.: sorry I had to remove all the links from the quotes, otherwise I would not be allowed to post my reply.

Kind regards and have a nice weekend,
Daniel

Sure. glTF is optimized for GPU rendering. If you want to use it for something else, then massage glTF into the restricted format/subset that your specific app or library needs, if necessary (e.g. a subset where interleaved attributes are expanded back out, and all but your favorite 2 are thrown away). That’s easy enough to do. Considerably easier than with some other 3D model formats.
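
The kind of preprocess described above can be sketched in a few lines: split an interleaved vertex buffer into tightly packed per-attribute streams and keep only the attributes the application wants. The function `deinterleave` and the layout table are hypothetical, with byte offsets and sizes chosen for this example rather than read from a real file.

```python
import struct


def deinterleave(data, stride, count, attributes, keep):
    """Split interleaved vertex bytes into tightly packed per-attribute bytes.

    attributes maps a name to (byte offset inside one vertex, byte size).
    Only the names listed in `keep` are extracted; the rest are dropped.
    """
    streams = {}
    for name in keep:
        offset, size = attributes[name]
        packed = bytearray()
        for i in range(count):
            start = i * stride + offset
            packed += data[start:start + size]
        streams[name] = bytes(packed)
    return streams


# Assumed layout: 3 floats position, 3 floats normal, 2 floats UV = 32 bytes.
layout = {"POSITION": (0, 12), "NORMAL": (12, 12), "TEXCOORD_0": (24, 8)}
vertices = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0)),
    ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0)),
]
interleaved = b"".join(struct.pack("<8f", *p, *n, *t) for p, n, t in vertices)

streams = deinterleave(interleaved, 32, len(vertices), layout,
                       keep=("POSITION", "NORMAL"))
```

After this step `streams["POSITION"]` is a contiguous run of positions that a physics or geometry routine can walk without touching normals or UVs, and the UV stream is simply not produced.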

No sense in trying to get the rest of the world to limit their usage to your domain/app-specific format and subset. Just reformat it yourself as a preprocess.

In COLLADA everything is technically a link. The standard says you’re not meant to make the data accessor buffers link outside themselves, but they’re not restricted from using URLs, so I believe a proper implementation should support URLs. Not doing so takes away a lot of power and makes it impossible to share data, which is very often possible (why duplicate the data just to satisfy some standard?). At the end of the day, I don’t know glTF, but it’s probably more self-contained than this. So consider yourself lucky not to have to deal with HTML-like semantics on top of everything else, plus XML semantics, which are even crazier if implemented to the XML specification.

I see the point, but in my opinion this ignores the fact that most applications are not just scene viewers that never interact with the scene. Most applications also interact with it, and not only by flying through it or rotating it, and then you need to access the geometry in one way, not in many ways.

And my reduction proposal of the format wouldn’t even limit the final rendering possibilities, the rendering output would still be the same. It is just a basic structure limitation, so from a holistic point of view it would reduce the overall conversion overhead, if you limit the structure to the way people use most often (e.g. make surveys?).