The “best” way to do it would be not to do it at all.
However, if you absolutely must do this for some reason, you’ll have to profile both options: passing the positions as per-vertex outputs (every vertex in a triangle gets the same 3 positions, so you have to duplicate data) versus having the fragment shader read them from global memory. The former requires a geometry shader, which is not known for its performance. The latter requires accessing global memory, which is also not known for its performance. But at least in the latter case, each triangle’s data will likely stay cached after the first few fragments read it, since every fragment of that triangle fetches the same values.
And yes, gl_PrimitiveID remains the same for every triangle that clipping produces from a primitive, to the degree that geometry ever gets clipped at all.
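To make the two options concrete, here is a CPU-side C++ sketch of the data each one needs, assuming an indexed triangle list with no primitive restart. `expandTriangles` models what a geometry shader would have to emit (every vertex carrying all three corners), and `fetchTriangle` models a fragment shader using gl_PrimitiveID to index the mesh’s index and position buffers. All names are hypothetical, and the sketch assumes the primitive ID counts triangles from zero within the draw call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec3 = std::array<float, 3>;

// Option A (what a geometry shader would emit): each vertex carries all three
// of its triangle's corners, so the same 9 floats are duplicated 3 times.
struct ExpandedVertex {
    Vec3 position;            // this vertex's own position
    std::array<Vec3, 3> tri;  // all three corners of the triangle it belongs to
};

std::vector<ExpandedVertex> expandTriangles(const std::vector<Vec3>& positions,
                                            const std::vector<std::uint32_t>& indices) {
    assert(indices.size() % 3 == 0);  // plain triangle list assumed
    std::vector<ExpandedVertex> out;
    out.reserve(indices.size());
    for (std::size_t t = 0; t < indices.size(); t += 3) {
        const std::array<Vec3, 3> tri = {positions[indices[t]], positions[indices[t + 1]],
                                         positions[indices[t + 2]]};
        for (std::size_t k = 0; k < 3; ++k) {
            out.push_back({tri[k], tri});
        }
    }
    return out;
}

// Option B (global memory): the fragment looks the corners up itself, using
// its primitive ID to index the index buffer and then the position buffer.
std::array<Vec3, 3> fetchTriangle(const std::vector<Vec3>& positions,
                                  const std::vector<std::uint32_t>& indices,
                                  std::uint32_t primitiveId) {
    const std::size_t base = 3u * static_cast<std::size_t>(primitiveId);
    assert(base + 2 < indices.size());
    return {positions[indices[base]], positions[indices[base + 1]], positions[indices[base + 2]]};
}
```

Option A pushes 9 extra floats through every vertex; option B adds nothing to the vertex data but costs each fragment six dependent buffer reads (three indices, then three positions).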
However, as I keep pointing out, none of the things you’re trying to do requires what you’re asking for. Let’s look at them:
That is texture projection, which as previously stated does not require fragments to access their individual triangle’s vertices. Just because a texture represents light intensities instead of diffuse surface reflectance doesn’t stop it from being a texture.
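For reference, texture projection boils down to a few lines of math per fragment. This C++ sketch assumes a row-major 4x4 matrix, a hypothetical `projectorViewProj` that takes world space into the projector’s clip space, and OpenGL’s [-1,1] NDC range. In GLSL, the divide by w is what textureProj performs, and the 0.5 scale and bias is often folded into the matrix instead.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major: m[row][col]

Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[row][col] * v[col];
    return r;
}

struct ProjectedCoord {
    float u, v;    // texture coordinates, in [0,1] inside the projector's frustum
    bool inFront;  // false if the point is behind the projector (w <= 0)
};

// Transforms a world-space position into the projector's clip space, performs
// the perspective divide, then maps NDC [-1,1] to texture space [0,1].
ProjectedCoord projectToTexture(const Mat4& projectorViewProj, const Vec4& worldPos) {
    const Vec4 clip = mul(projectorViewProj, worldPos);
    if (clip[3] <= 0.0f) return {0.0f, 0.0f, false};
    const float ndcX = clip[0] / clip[3];
    const float ndcY = clip[1] / clip[3];
    return {ndcX * 0.5f + 0.5f, ndcY * 0.5f + 0.5f, true};
}
```

The only per-fragment input is the fragment’s own interpolated world position; the triangle’s vertices never enter into it.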
This is also texture projection, which as previously stated does not require fragments to access their individual triangle’s vertices. Again, just because a texture represents whether light is occluded at a particular fragment doesn’t stop it from being a texture.
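Shadow mapping adds a single comparison on top of that projection. This C++ sketch assumes the fragment’s light-space texture coordinates and depth (in [0,1]) have already been obtained by projecting the fragment into the light’s view, uses nearest-texel lookup, and applies a small constant bias against shadow acne; the `DepthMap` type and the bias value are illustrative assumptions. On the GPU, a sampler2DShadow with GL_TEXTURE_COMPARE_MODE set to GL_COMPARE_REF_TO_TEXTURE performs this comparison in hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A depth texture rendered from the light's point of view: each texel stores
// the depth of the closest occluder, in [0,1].
struct DepthMap {
    std::size_t width, height;
    std::vector<float> texels;  // row-major, width * height entries

    float sampleNearest(float u, float v) const {
        // Clamp-to-edge addressing, then pick the nearest texel.
        u = std::clamp(u, 0.0f, 1.0f);
        v = std::clamp(v, 0.0f, 1.0f);
        const std::size_t x =
            std::min(width - 1, static_cast<std::size_t>(u * static_cast<float>(width)));
        const std::size_t y =
            std::min(height - 1, static_cast<std::size_t>(v * static_cast<float>(height)));
        return texels[y * width + x];
    }
};

// The fragment is lit if its depth as seen from the light is not farther than
// the stored occluder depth (after subtracting a small bias).
bool isLit(const DepthMap& shadowMap, float u, float v, float fragDepth, float bias = 0.005f) {
    assert(shadowMap.texels.size() == shadowMap.width * shadowMap.height);
    return fragDepth - bias <= shadowMap.sampleNearest(u, v);
}
```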
Please stop thinking of textures as pictures. It’s 2015, not 1995. And even in 1995, they had textures that represented illumination (light maps).
Any “attributes” of interest can be passed as per-vertex parameters, interpolated across the surface (with the flat qualifier where appropriate), and provided to the fragment shader as per-fragment inputs. After all, an “orientation” value is probably something you want interpolated across a surface, not confined to each individual triangle. The exception is if you want a very discontinuous effect, and even that can be done purely with mesh data.
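As a rough model of what the rasterizer does with those per-vertex outputs, here is a C++ sketch of default (smooth) perspective-correct interpolation versus flat. It assumes the fragment’s screen-space barycentric coordinates are already known, and that the provoking vertex is the last one, which is OpenGL’s default (glProvokingVertex can change it). The struct and function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// One vertex-shader output as the rasterizer sees it: an attribute value plus
// the clip-space w needed for perspective-correct interpolation.
struct VertexOut {
    float attribute;
    float clipW;
};

// Default ("smooth") interpolation: perspective-correct, using the fragment's
// screen-space barycentric coordinates (which sum to 1).
float interpolateSmooth(const std::array<VertexOut, 3>& tri, const std::array<float, 3>& bary) {
    float num = 0.0f, den = 0.0f;
    for (int i = 0; i < 3; ++i) {
        assert(tri[i].clipW > 0.0f);
        num += bary[i] * tri[i].attribute / tri[i].clipW;
        den += bary[i] / tri[i].clipW;
    }
    return num / den;
}

// "flat": no interpolation; every fragment receives the provoking vertex's
// value. Index 2 (the last vertex) matches OpenGL's default convention.
float interpolateFlat(const std::array<VertexOut, 3>& tri, int provokingVertex = 2) {
    return tri[provokingVertex].attribute;
}
```

With flat, every fragment of a triangle sees the same value, which is how you get a per-triangle, discontinuous effect from mesh data alone.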
Well, now you’re talking about a completely different kind of rendering. The only way for a fragment shader to do any meaningful raytracing is for it to access the (scene) mesh itself as a whole. At that point, the primitive you render has no relation to the object being raytraced; it’s just something you draw to get the rasterizer to execute your fragment shader (a job that nowadays could mostly be handled by a compute shader, unless you need per-sample processing).
So there’s no correlation between the specific primitive you rendered and any particular location on the mesh you’re raytracing into. You’re rendering an imposter, and the FS doesn’t care what the imposter’s vertices are. It’s not like you’re raytracing the imposter object; you’re raytracing a scene.
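Here is a C++ sketch of what such a fragment shader would do for each fragment: intersect a ray against every triangle of the scene and keep the nearest hit. It uses the Möller-Trumbore ray/triangle test, which is one common choice rather than anything mandated here; the ray origin and direction are assumed to come from the camera and the fragment’s position, and a real implementation would use an acceleration structure instead of a linear loop. The imposter’s vertices appear nowhere.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

using Vec3 = std::array<float, 3>;

Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
float dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

struct Triangle { Vec3 v0, v1, v2; };

// Moller-Trumbore ray/triangle test: returns the ray parameter t of the hit,
// or +infinity on a miss.
float intersect(const Vec3& origin, const Vec3& dir, const Triangle& tri) {
    constexpr float kEps = 1e-7f;
    const float kMiss = std::numeric_limits<float>::infinity();
    const Vec3 e1 = sub(tri.v1, tri.v0);
    const Vec3 e2 = sub(tri.v2, tri.v0);
    const Vec3 p = cross(dir, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < kEps) return kMiss;  // ray parallel to the triangle's plane
    const float invDet = 1.0f / det;
    const Vec3 s = sub(origin, tri.v0);
    const float u = dot(s, p) * invDet;
    if (u < 0.0f || u > 1.0f) return kMiss;
    const Vec3 q = cross(s, e1);
    const float v = dot(dir, q) * invDet;
    if (v < 0.0f || u + v > 1.0f) return kMiss;
    const float t = dot(e2, q) * invDet;
    return t > kEps ? t : kMiss;  // ignore hits behind the ray origin
}

// Per-fragment work: test the ray against the whole scene, keep the nearest hit.
float traceScene(const Vec3& origin, const Vec3& dir, const std::vector<Triangle>& scene) {
    float nearest = std::numeric_limits<float>::infinity();
    for (const Triangle& tri : scene) nearest = std::fmin(nearest, intersect(origin, dir, tri));
    return nearest;
}
```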
I see nothing in that algorithm that requires fragments to have the vertices that generated them. The reason those guys render decals that way, in multiple draw calls, is that they don’t want to:
Have the shader used for a surface’s primary rendering also be responsible for rendering its decals, which your suggested algorithm would have to do.
Suffer the massive performance hit of having every fragment loop through a list of decals, even when no decal is anywhere near the object, just to project that point into each one and find out that none of them have an effect. Again, your suggested algorithm would have to do exactly that.
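To see the cost described in the second point, here is a C++ sketch of the per-fragment work in the single-pass approach. It assumes each decal is an oriented box (center, orthonormal axes, half-extents), a hypothetical representation chosen for illustration. The fragment projects its position into every decal, so the cost scales with the total number of decals even when none of them are nearby.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;

// Hypothetical decal: an oriented box with orthonormal axes and half-extents.
struct Decal {
    Vec3 center;
    std::array<Vec3, 3> axes;
    Vec3 halfExtents;
};

// Projects a surface point into the decal's box. If the point lies inside,
// writes the decal's [0,1] texture coordinates and returns true.
bool projectIntoDecal(const Decal& d, const Vec3& p, float& u, float& v) {
    const Vec3 rel = {p[0] - d.center[0], p[1] - d.center[1], p[2] - d.center[2]};
    Vec3 local{};
    for (int i = 0; i < 3; ++i) {
        const Vec3& a = d.axes[i];
        local[i] = (rel[0] * a[0] + rel[1] * a[1] + rel[2] * a[2]) / d.halfExtents[i];
        if (std::fabs(local[i]) > 1.0f) return false;  // outside the box on this axis
    }
    u = local[0] * 0.5f + 0.5f;
    v = local[1] * 0.5f + 0.5f;
    return true;
}

// Single-pass approach: every fragment runs one projection per decal, whether
// or not any decal is nearby. Returns how many decals actually apply.
std::size_t countDecalHits(const std::vector<Decal>& decals, const Vec3& p) {
    std::size_t hits = 0;
    for (const Decal& d : decals) {
        float u, v;
        if (projectIntoDecal(d, p, u, v)) ++hits;  // a real shader would sample the decal here
    }
    return hits;
}
```

In the multi-pass approach, by contrast, decal code typically runs only for the pixels that a decal’s own geometry covers.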
It really has nothing to do with the fragment shader not having access to the geometry.
If that’s true, then performance is very relevant to you. As such, rendering twice with two relatively cheap shaders will likely perform much better than rendering once while either accessing global memory or shoving nine 32-bit floats (three vec3 positions) at the FS.
The main issue you might have is with depth buffering.
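One common form of that issue, assuming the second pass redraws the same surface: its fragments arrive at exactly the depth the first pass wrote, so a GL_LESS depth test rejects them. The usual fixes are GL_LEQUAL (or GL_EQUAL) for the second pass, with `invariant gl_Position;` declared in both vertex shaders so the depths really do match, or glPolygonOffset when the decals are separate coplanar geometry. This C++ sketch only models the depth comparison for a single pixel; the names are hypothetical.

```cpp
#include <cassert>

// Mirrors two of glDepthFunc's comparison modes (GL_LESS and GL_LEQUAL).
enum class DepthFunc { Less, LessEqual };

bool depthTestPasses(DepthFunc func, float incoming, float stored) {
    switch (func) {
        case DepthFunc::Less: return incoming < stored;
        case DepthFunc::LessEqual: return incoming <= stored;
    }
    return false;
}

// One pixel drawn twice: pass 1 lays down the surface with GL_LESS, pass 2
// redraws the same surface at the same depth with the given function.
// Returns whether the second pass's fragment survives the depth test.
bool secondPassVisible(DepthFunc secondPassFunc, float surfaceDepth) {
    float depthBuffer = 1.0f;  // cleared to the far plane
    if (depthTestPasses(DepthFunc::Less, surfaceDepth, depthBuffer)) depthBuffer = surfaceDepth;
    return depthTestPasses(secondPassFunc, surfaceDepth, depthBuffer);
}
```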