What the spec says is:
The command
enum GetError( void );
is used to obtain error information. Each detectable error is assigned a numeric code. When an error is detected, a flag is set and the code is recorded. Further errors, if they occur, do not affect this recorded code. When GetError is called, the code is returned and the flag is cleared, so that a further error will again record its code. If a call to GetError returns NO_ERROR, then there has been no detectable error since the last call to GetError (or since the GL was initialized).
To allow for distributed implementations, there may be several flag-code pairs. In this case, after a call to GetError returns a value other than NO_ERROR, each subsequent call returns the non-zero code of a distinct flag-code pair (in unspecified order), until all non-NO_ERROR codes have been returned. When there are no more non-NO_ERROR error codes, all flags are reset. This scheme requires some positive number of pairs of a flag bit and an integer. The initial state of all flags is cleared and the initial value of all codes is NO_ERROR.
The wording is identical in 1.5 and 4.4, so I would assume that it’s the same in all versions in between.
Also, it doesn’t say that the implementation must actually be “distributed” to behave like this, or even what that term means.
Sort of, except that it’s potentially one error per “unit”, which isn’t defined, and there’s no glGet() query to determine the number of units. So in practical terms, the implementation will record the first error plus some (effectively random, possibly empty) collection of additional errors. When you call glGetError() in a loop until it returns GL_NO_ERROR (as in the sketch below), it yields the collected errors in no particular order; after that, the next error to occur (“next” in terms of the sequence in which the GL commands are issued, not necessarily executed) will be among those reported by the next “batch” of glGetError() calls.
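In C, that loop looks something like the following. This is just a minimal sketch: drain_gl_errors is a name I’ve made up, and it assumes you have a current context and have included the GL header (or whatever your loader provides):

    #include <GL/gl.h>   /* or your loader's header, e.g. glad/glew */
    #include <stdio.h>

    /* Drain every recorded error: there may be one per flag-code pair,
       and the spec says they come back in no particular order. */
    static void drain_gl_errors(const char *where)
    {
        GLenum err;
        while ((err = glGetError()) != GL_NO_ERROR) {
            fprintf(stderr, "GL error 0x%04X after %s\n", err, where);
        }
    }

You’d typically call it after a suspect command, e.g. drain_gl_errors("glDrawArrays"), and also at a known-good point beforehand so that stale errors from earlier commands don’t get attributed to the wrong call.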
I suppose that would depend upon your definition of “major”. I would assume that the language about distributed implementations was included for a reason. The most likely candidate would be the SGI systems where the geometry engine was a bank of i860s, but that’s not what most developers would consider a “major” platform nowadays.
As for typical PC hardware … I have no idea. I could conceive of implementations having separate CPU-side and GPU-side error flags. If it’s possible for errors to be flagged by both the CPU and GPU, then there’s no reason for the implementation to concern itself with which one happened first, given that the spec explicitly doesn’t require this.
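Just to illustrate the flag-code-pair wording, something like the model below would be conforming. To be clear, this is purely a sketch of the spec’s semantics, not any actual driver’s code; UNIT_CPU, UNIT_GPU, record_error and get_error are all names I’ve invented for the example:

    #include <stddef.h>

    /* Hypothetical implementation with two flag-code pairs: one for
       CPU-side command validation, one for a GPU-side back end. */
    enum { UNIT_CPU, UNIT_GPU, UNIT_COUNT };

    typedef struct {
        int      flag;   /* set once this unit has recorded an error */
        unsigned code;   /* first error seen by this unit, else 0 (NO_ERROR) */
    } ErrorPair;

    /* Initial state per the spec: all flags clear, all codes NO_ERROR. */
    static ErrorPair pairs[UNIT_COUNT];

    /* Each unit records only its *first* error; later ones are discarded. */
    static void record_error(int unit, unsigned code)
    {
        if (!pairs[unit].flag) {
            pairs[unit].flag = 1;
            pairs[unit].code = code;
        }
    }

    /* Returns the code of some set flag-code pair (in unspecified order)
       and clears it; returns 0 (NO_ERROR) once all pairs are clear. */
    static unsigned get_error(void)
    {
        for (size_t i = 0; i < UNIT_COUNT; ++i) {
            if (pairs[i].flag) {
                unsigned code = pairs[i].code;
                pairs[i].flag = 0;
                pairs[i].code = 0;
                return code;
            }
        }
        return 0; /* NO_ERROR */
    }

Note that nothing in this model tracks which unit flagged its error first, which is exactly the freedom the spec grants.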