Generic metadata extension

mhalle · January 9, 2020, 3:49pm

My application for glTF (digital anatomic atlases) requires binding application-specific metadata to glTF elements. I would like to do that in a robust way. As I’ve been thinking about the problem, it seems that a generic approach to handling metadata would benefit different kinds of applications.

In many cases, it could obviate the need for application-specific metadata extensions such as AGI_stk_metadata.

Goals

Top-level goals of such an extension include:

binding metadata to arbitrary glTF elements, the glTF file itself, and the metadata itself (for authoring and versioning purposes)
ability for metadata to survive round-trip editing by tools that follow glTF best practices
ability for multiple applications to store metadata without stomping on each other. For instance, I might have an anatomy model with all sorts of labels on nodes and scenes, but I might edit it in Blender, which might add rendering-specific metadata to the same elements.
ability to assign IDs to elements so they can be addressed externally.
ability to bind standard JSON types to glTF elements.
support for “ref” values, which are URLs and are treated as such.
an optional reference back to a schema that describes the metadata

Non-Goal

Out of scope is metadata below the glTF element level (e.g., no per vertex or per triangle metadata).

I’ve got a strawman draft design I can share that starts to address these goals, but I have a basic question or two first.

First, has any attempt been made to create such a general-purpose extension?

Binding questions

Then, what’s the best way to bind metadata to elements? It seems like there are three ways to do it.

First, containment: put the metadata in the element itself. This might make things hard for applications that have to deal with both rendering and metadata.

Second, you can bind by reference (metadata has a reference to the element). And there are two ways to do that: by index or using the “name” field.

The “name” field seems a bit underspecified in glTF; it is perhaps a human-readable name, or maybe an ID. AGI_stk_metadata treats it like an ID. Nothing else in glTF uses “name” as a way to reference nodes; doing so in a standard would require building a global index, and there could be name clashes.

By index could look something like this:

```
"EXT_metadata" : {
	...
	"meshes": {
		"0": { ... metadata for mesh 0 ... },
		...
	},
	"nodes": {
		"100": { ... metadata for node 100 ... }
	},
	... and so on ...
}
```

This scheme is unambiguous and can address any glTF top-level element. It is potentially brittle if elements move around during editing (although the scheme above would allow a naive application to renumber the metadata without knowing about its contents).
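To illustrate the renumbering point, here is a minimal sketch, assuming the index-keyed `EXT_metadata` layout above (a hypothetical extension, not a ratified one). The function name `renumber_after_delete` is likewise hypothetical. When an element is deleted from a top-level array, it drops that element's metadata and shifts higher keys down by one, without ever looking inside the metadata values.

```python
from typing import Any, Dict


def renumber_after_delete(ext_metadata: Dict[str, Any], collection: str,
                          deleted: int) -> Dict[str, Any]:
    """Return a copy of an index-keyed EXT_metadata object after the element
    at index `deleted` has been removed from the top-level array `collection`.

    Metadata for the deleted element is dropped and entries with higher
    indices shift down by one. The metadata values themselves are never
    inspected, so a naive tool can do this without understanding them.
    """
    result = dict(ext_metadata)  # shallow copy; other collections untouched
    if collection not in ext_metadata:
        return result
    remapped: Dict[str, Any] = {}
    for key, value in ext_metadata[collection].items():
        index = int(key)
        if index == deleted:
            continue  # the element is gone, so its metadata goes too
        new_index = index - 1 if index > deleted else index
        remapped[str(new_index)] = value
    result[collection] = remapped
    return result
```

An editor that deletes node 2 would call `renumber_after_delete(ext, "nodes", 2)` alongside its own fix-ups of node references elsewhere in the file.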

Once some of these core questions are answered, most of the rest is naming and engineering, I think.

Thanks for any insights the community may have!

ardenpm · January 11, 2020, 12:17am

Wouldn’t the extras property, which the specification already allows on every glTF object, let you do this without another extension?

Obviously, if you want to enforce a particular structure for the data and standardise it, then an extension would be appropriate. But if you just want to store application-specific information that only your tool uses, and that other tools preserve, I think extras would be suitable.

mhalle · January 11, 2020, 12:54am

Thanks for the idea. I’ve got a couple of possible concerns about extras.

If different applications process a glTF file, they might each have their own metadata to store on elements. See my example in the OP: Blender binds some render-specific metadata and my app adds some domain-specific metadata.

In extras, these different uses can stomp on each other, at least without some convention.

The other issue, and this is stylistic I guess, is that application-specific metadata gets spread around the whole file. I think that forces metadata readers to know a lot more about glTF structure. Separation of concerns, in a way.

The strength of “extras” is that it is close to the elements it is modifying, so no pointers are needed.

Have to think on it…

ardenpm · January 11, 2020, 4:34am

The stomping is a good point. It is unfortunate that the specification isn’t stricter in this respect: while it says best practice is to use an object, it doesn’t mandate one. If extras were always an object, applications that didn’t use conflicting keys could at least keep their data there. As it stands, though, you are right: multiple applications could definitely tread on each other’s data.
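A minimal sketch of the convention this implies, assuming each application picks its own distinct key inside extras (the keys and the `set_app_extras` helper are hypothetical). Because the spec only recommends that extras be an object, the helper refuses to touch a non-object value rather than overwriting another tool's data.

```python
from typing import Any, Dict


def set_app_extras(element: Dict[str, Any], app_key: str, data: Any) -> None:
    """Store `data` under element["extras"][app_key], creating extras if needed.

    Raises ValueError if extras exists but is not a JSON object, since glTF
    only recommends (does not require) that extras be an object.
    """
    extras = element.setdefault("extras", {})
    if not isinstance(extras, dict):
        raise ValueError("extras is not an object; refusing to overwrite it")
    extras[app_key] = data  # only this application's key is replaced
```

With this convention, an anatomy tool and a renderer can annotate the same node without clobbering each other, as long as their keys differ.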

javagl · January 11, 2020, 1:35pm

The goal of associating additional data with glTF elements has already come up in various forms. A while ago, a (rather sophisticated) approach for handling metadata in the extras attribute was discussed in Data Visualization Design Pattern Leveraging 'extras' Attribute. This mainly referred to metadata for visualization, but it is applicable more generically.

One thing that I talked about (in the thread, and via mail with the author) is whether the actual metadata should be stored in the asset. One has to keep in mind that glTF is supposed to be a transmission format. So one should be careful to not throw “too much, arbitrary” data into the glTF JSON. Roughly speaking: When there is an asset with 100 meshes, and each mesh has associated properties…

```
mesh: {
    extras: {
        color: ...,
        name: ...,
        source: ...,
        description: ...,
        + 100's more...
        ...
    }
}
```

then at some point, one should break this down into a

```
mesh: {
    extras: {
        id: 1234
    }
}
```

and store the actual metadata in a way that allows accessing it via the ID (which may even be a REST endpoint queried at runtime).
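The "break it down" step can be sketched as follows, assuming meshes are plain parsed-JSON dicts and that sequential integer IDs are acceptable (both the numbering scheme and the `externalize_mesh_metadata` name are illustrative assumptions). It moves each mesh's extras into a side table keyed by ID and leaves only the ID behind; that table could then be written to a separate file or served by an endpoint.

```python
from typing import Any, Dict, List


def externalize_mesh_metadata(meshes: List[Dict[str, Any]],
                              first_id: int = 1) -> Dict[str, Any]:
    """Replace each mesh's extras with {"id": n} and return a side table
    mapping str(n) to the original extras.

    Meshes without extras are left alone. IDs are assigned sequentially
    starting at `first_id`, which is an arbitrary choice for this sketch.
    """
    table: Dict[str, Any] = {}
    next_id = first_id
    for mesh in meshes:
        if "extras" not in mesh:
            continue
        table[str(next_id)] = mesh["extras"]  # full metadata moves out
        mesh["extras"] = {"id": next_id}      # only the ID stays in the asset
        next_id += 1
    return table
```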

I’ve seen that you also mentioned things like an ID, indices, or the name element, but I’d still have to read this proposal more thoroughly to see whether there’s an actual overlap. I just wanted to add this pointer here.

mhalle · January 11, 2020, 2:15pm

Thanks for the pointer. Good points.

If we could standardize on an “id” extra (or eventually a first-class field that is part of the standard), that would be great. The convention would be that IDs must be unique across all elements, must be preserved by clients, and must follow format restrictions so they can be used as URL anchors.
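Those three rules are easy to check mechanically. This is a sketch only: it assumes IDs live at `extras.id` and uses a hypothetical format rule (an ASCII letter followed by letters, digits, `_` or `-`), chosen so an ID can appear in a URL fragment without escaping. It scans all thirteen top-level glTF 2.0 arrays and reports malformed or duplicate IDs.

```python
import re
from typing import Any, Dict, List

# Hypothetical format rule: starts with an ASCII letter, then letters,
# digits, "_" or "-". Such strings are safe to use as URL anchors.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")

# Top-level glTF 2.0 arrays whose elements could carry an "id" extra.
COLLECTIONS = ["accessors", "animations", "buffers", "bufferViews", "cameras",
               "images", "materials", "meshes", "nodes", "samplers", "scenes",
               "skins", "textures"]


def check_ids(gltf: Dict[str, Any]) -> List[str]:
    """Return a list of problems with extras.id values (empty means OK)."""
    problems: List[str] = []
    seen: Dict[str, str] = {}  # id -> first location where it appeared
    for collection in COLLECTIONS:
        for index, element in enumerate(gltf.get(collection, [])):
            extras = element.get("extras")
            if not isinstance(extras, dict) or "id" not in extras:
                continue
            ident = extras["id"]
            where = f"{collection}[{index}]"
            if not isinstance(ident, str) or not ID_PATTERN.fullmatch(ident):
                problems.append(f"{where}: invalid id {ident!r}")
            elif ident in seen:
                problems.append(f"{where}: duplicate id {ident!r} (also {seen[ident]})")
            else:
                seen[ident] = where
    return problems
```

Uniqueness is checked across all element types at once, matching the "unique across all elements" rule rather than per-array uniqueness.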

The only funny thing about building this ID mechanism is that it’s parallel to the index-based mechanism of the rest of glTF. There would be two ways of addressing things. In an alternate reality, glTF elements would have been dictionaries with IDs for keys, rather than lists.

We could still separately standardize on a metadata format, but it could optionally live in application-specific sidecars referenced by a URL, just like other glTF assets.

javagl · January 11, 2020, 8:22pm

An aside/fun fact: This “alternate reality” did exist. It was called “glTF 1.0”

(Some of the rationale behind changing this in glTF 2.0 is explained in glTF 2.0 syntax changes and JSON encoding restrictions · Issue #831 · KhronosGroup/glTF · GitHub, just to have this pointer here. I agree that using indices makes some aspects (specifically, everything related to modifying an existing asset) much more complicated. But in the end, again: glTF is primarily intended to be consumed efficiently, and in this case that comes at the cost of making it a tad more difficult to create or modify assets.)

I wonder how much could actually be specified when the goal is to specify “arbitrary metadata”. This is highly application-specific, and I don’t see which restrictions could be added at all. But I could imagine a very simple concept, roughly for “defining additional resources”. This could possibly be boiled down to assigning IDs to the elements and describing a rule for looking up the metadata based on those IDs. Some vague thoughts:

Each element may have an ID.
The ID must consist of alphanumeric characters (ASCII, preferably)
There is information about where the metadata can be obtained:
- From the file itself: This would mean that there is some plain dictionary in the file, roughly like
```
 "metadata": {
   "id0": { ... },
   "id1": { ... }
 }
```
- From some REST endpoint/CDN: This would mean that there is some baseUrl to which the ID can be appended, giving a URL that returns JSON
- From some local resource: This could mean that there is some fileName referring to a JSON file that contains the metadata
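The three lookup options above can be sketched as one resolver. Everything here is an assumption for illustration: the source descriptor shape (`metadata`, `fileName`, `baseUrl` keys), the rule that the percent-encoded ID is simply appended to `baseUrl`, and the function names. The fetcher is injectable so the REST case can be exercised without a network.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any, Callable, Dict, Optional
from urllib.parse import quote


def fetch_json(url: str) -> Any:
    """Default fetcher: GET the URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def metadata_url(base_url: str, element_id: str) -> str:
    """Append the percent-encoded ID to the base URL (assumed convention)."""
    return base_url + quote(element_id, safe="")


def lookup_metadata(element_id: str, source: Dict[str, Any],
                    fetch: Callable[[str], Any] = fetch_json) -> Optional[Any]:
    """Look up metadata for an ID from one hypothetical source descriptor:
      {"metadata": {...}}  inline dictionary in the glTF file itself
      {"fileName": "..."}  local JSON file containing such a dictionary
      {"baseUrl": "..."}   REST endpoint/CDN returning JSON per ID
    Returns None when an inline or file dictionary has no entry for the ID.
    """
    if "metadata" in source:
        return source["metadata"].get(element_id)
    if "fileName" in source:
        table = json.loads(Path(source["fileName"]).read_text(encoding="utf-8"))
        return table.get(element_id)
    if "baseUrl" in source:
        return fetch(metadata_url(source["baseUrl"], element_id))
    raise ValueError("unknown metadata source")
```

For example, `lookup_metadata("id0", {"baseUrl": "https://example.com/meta/"})` would request `https://example.com/meta/id0` (a placeholder URL).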

One could even consider allowing metadata to point to one particular path within an external JSON file.
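One existing way to express "a path within an external JSON" is a JSON Pointer (RFC 6901); using it here is an assumption, not something the thread settled on. The sketch below resolves a pointer against an already-parsed document, including the RFC's `~1` and `~0` unescaping. It is deliberately lenient and does not enforce every RFC rule (for example, it accepts leading zeros in array indices).

```python
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve a JSON Pointer such as "/structures/1/name" against a parsed
    JSON document. Raises KeyError/IndexError if a step does not exist."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # RFC 6901: replace "~1" with "/" first, then "~0" with "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current
```

Combined with a file or URL, a metadata reference could then be a (location, pointer) pair.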

Going one step further, this concept could be described so generically that it might even cover “JSON refer other JSON?” (Issue #37 · KhronosGroup/glTF · GitHub) … (which is something that people have asked about repeatedly…)

But again: For more (substantial) on-topic comments, I have to re-read your proposal and compare it in more detail to the other proposal and the discussion that we had around this…

ardenpm · January 13, 2020, 11:15pm

Personally, I would rather see the metadata at the node level than gathered all together. However, if a metadata extension were just a generic holder that gives you somewhere to put data, you could then structure it the way you suggest underneath your own metadata. So something like this:

```
{
    "materials": [
        {
            "name": "wood_spruce",
            "pbrMetallicRoughness": {
                "baseColorFactor": [ 0.0, 0.0, 0.0, 1.0 ],
                "metallicFactor": 0.0
            },
            "extensions": {
                "EXT_metadata": {
                    "MIG_fancy_information": {
                        "display_name": "Wood (Spruce)"
                    }
                }
            }
        }
    ],
    "extensionsUsed": [
        "EXT_metadata"
    ]
}
```

So the extension would be a trivial one. It basically specifies an object where each property name is the metadata owner (in this case MIG_fancy_information), and that property can hold any value you want: object, array, number, boolean, string (well, also null I guess; it is just JSON).

The structure of that metadata would be up to the vendor. Any more generic property names (e.g., visualDatum) would then be used under that vendor-specific property.
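A reader for this owner-keyed layout stays very small. The sketch below assumes the `EXT_metadata` structure from the material example (the extension name is a proposal in this thread, and the helper names are hypothetical). It fetches one owner's value and lists all owners, so that a tool can leave data from owners it does not recognize untouched.

```python
from typing import Any, Dict, List, Optional


def vendor_metadata(element: Dict[str, Any], owner: str) -> Optional[Any]:
    """Return the value `owner` stored under extensions.EXT_metadata on one
    glTF element, or None if absent. Note that a stored JSON null also
    comes back as None in this sketch."""
    ext = element.get("extensions", {}).get("EXT_metadata", {})
    return ext.get(owner)


def metadata_owners(element: Dict[str, Any]) -> List[str]:
    """List every metadata owner present on the element, in document order."""
    return list(element.get("extensions", {}).get("EXT_metadata", {}))
```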

Shared or generic metadata could be agreed in the same way that extension names and structure is agreed now (essentially ad-hoc but by convention and with a registry).

If you then wanted to build a piece of metadata that holds metadata which references other nodes in the glTF somehow you could do it, though as you say, that’s likely to be pretty brittle.

Is there already a relevant GitHub issue discussing similar topics? There seem to be a couple of possibly related ones:

simple extension for geometry metadata (1478)
Using the “node.extras” property to serialize game entity component data (1387)

There is a little bit of talk in the glTF roadmap (1051) issue as well.

If only extras were mandated to be an object, rather than merely recommended, I think it would have been possible to handle this there without things clobbering each other.

system · July 14, 2020, 11:15pm

This topic was automatically closed 183 days after the last reply. New replies are no longer allowed.