Implicit early Z/Stencil and discard

It is often said that one should be wary of discarding in a fragment shader (not to mention modifying Z) if one wants to benefit from the hardware's early rejection mechanisms. The explanations as to why are good, but I am unsure how certain conditions affect them. Case in point: if we set up the depth and stencil buffers beforehand, can we then run a shader that uses read-only stencil/Z rejection and discard in tandem, to reduce the shader work as much as ideally possible? From what I have gathered around the internet, these things are plagued by driver and hardware idiosyncrasies, so ascertaining anything is nigh impossible.
This is with OpenGL 3 in mind, which lacks the layout(early_fragment_tests) qualifier (core in GL 4.2, via ARB_shader_image_load_store) needed for explicit control.
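For reference, the read-only setup I have in mind looks roughly like this in plain GL 3 (just a sketch of the state calls; the framebuffer plumbing and the earlier fill pass are omitted):

```c
/* Sketch: depth and stencil were laid down in an earlier pass;
   now draw the expensive pass with both buffers read-only. */
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LEQUAL);   /* reject fragments behind the pre-pass */
glDepthMask(GL_FALSE);    /* depth is read-only */

glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_EQUAL, 1, 0xFF);        /* only where stencil == 1 */
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);  /* never modify stencil */
glStencilMask(0x00);                     /* stencil writes masked off */

/* ...then issue the draw calls with the discarding shader bound. */
```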

The rule is very simple: the presence of the keyword discard. If it’s there, then on lots of hardware, the shader cannot perform early depth tests. Not unless the compiler can statically determine that the statement will never be executed (and even then, I wouldn’t count on it).
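To illustrate, even a shader where the discard is cheap and rarely taken can lose early depth testing on much hardware, simply because the keyword appears in the source (a GLSL 3.30 sketch; uTexture, uCutoff, and vUV are made-up names):

```glsl
#version 330 core
uniform sampler2D uTexture;  // hypothetical inputs
uniform float uCutoff;
in vec2 vUV;
out vec4 fragColor;

void main()
{
    vec4 texel = texture(uTexture, vUV);
    // The mere presence of this keyword can force the driver
    // to fall back to late depth/stencil testing.
    if (texel.a < uCutoff)
        discard;
    fragColor = texel;
}
```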

I wouldn’t count on that being a performance win, not unless you manually force early depth tests. And if you do manually force early depth tests, any discarded fragments will still be considered to have passed the depth test. Since you’re not doing depth writes, that’s fine, but at the same time, the occlusion query counter will still be updated.
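For comparison, the manual override looks like this in GLSL 4.20 and later (so not available in plain GL 3, as discussed below); with it, the tests run before the shader, and discarded fragments still count as having passed the depth test for occlusion queries:

```glsl
#version 420 core
layout(early_fragment_tests) in;  // force depth/stencil tests before shader execution

uniform sampler2D uTexture;  // hypothetical input
in vec2 vUV;
out vec4 fragColor;

void main()
{
    vec4 texel = texture(uTexture, vUV);
    if (texel.a < 0.5)
        discard;  // fragment is gone, but it already "passed" the depth test
    fragColor = texel;
}
```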

Um… why? Why are you limited to GL 3.x? Is anyone still supporting GL 3.x-class hardware with drivers? And if not, then the implementation probably won’t be very reliable.

It’s what this very laptop I am on can handle, and being slow, it needs optimizations more than anything.
Perhaps any form of branching isn't the best fit for such a system anyway, so discarding discard is probably not that big a deal in the end.