If multisample rasterization is properly enabled, stencil ops (test/write/etc.) occur per sample, not per fragment.
This means that the multisample render targets you are rendering to have stencil storage allocated per sample, not just per texel.
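To make the per-sample part concrete, here is a toy CPU model in C (not real GL code). It is only a sketch under assumptions I picked for illustration: 4x MSAA, a stencil func equivalent to GL_EQUAL with a full mask, a pass op equivalent to GL_INCR, and the depth test disabled. The names `MsaaStencilPixel` and `stencil_test_and_incr` are made up.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SAMPLES 4  /* assumption: 4x MSAA */

/* One pixel of a multisample stencil buffer: one 8-bit value per sample. */
typedef struct {
    uint8_t stencil[NUM_SAMPLES];
} MsaaStencilPixel;

/*
 * Toy model of one fragment hitting one pixel, with a stencil func like
 * GL_EQUAL (mask 0xFF) and a pass op like GL_INCR, depth test disabled.
 * coverage: bit i set means the fragment covers sample i.
 * Returns the mask of covered samples that passed the stencil test.
 */
static unsigned stencil_test_and_incr(MsaaStencilPixel *px,
                                      unsigned coverage, uint8_t ref)
{
    unsigned passed = 0;
    assert(coverage < (1u << NUM_SAMPLES));

    for (int i = 0; i < NUM_SAMPLES; ++i) {
        if (!(coverage & (1u << i)))
            continue;                     /* sample not covered: untouched */
        if (px->stencil[i] == ref) {      /* test is evaluated per sample */
            passed |= 1u << i;
            if (px->stencil[i] < 0xFF)
                px->stencil[i]++;         /* increment, clamped at 255 */
        }
    }
    return passed;
}
```

With stencil values {0, 0, 1, 0}, coverage 0x7 and ref 0, samples 0 and 1 pass and get incremented, sample 2 fails and keeps its value, and sample 3 is never touched because the fragment does not cover it. One fragment, four independent stencil outcomes.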
To your other question, the stencil test may happen either before or after the fragment shader. That is up to the driver, so long as the behavior exhibited corresponds to that defined in the OpenGL spec. In some cases, drivers may run the test before the shader to try to “kill off” whole groups of fragments for efficiency, so that there’s no need to dispatch fragment shaders for them at all.
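Here is a second toy CPU model in C showing why the driver is free to pick the order. It assumes a fragment shader with no side effects and no discard, a stencil func like GL_EQUAL, a stencil op of GL_KEEP, and single-sample pixels to keep it short. `ToyPixel`, `ToyStats`, `toy_shader` and `draw_fragments` are hypothetical names, and this is in no way how a real driver is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  stencil;  /* stencil value stored at this pixel */
    uint32_t color;    /* color currently stored at this pixel */
} ToyPixel;

typedef struct {
    size_t shader_runs;  /* how many times the "shader" was invoked */
} ToyStats;

/* Stand-in for a fragment shader with no side effects and no discard. */
static uint32_t toy_shader(ToyStats *stats)
{
    stats->shader_runs++;
    return 0xFF0000FFu;
}

/* One fragment per pixel; stencil func like GL_EQUAL, stencil op GL_KEEP. */
static void draw_fragments(ToyPixel *pixels, size_t count, uint8_t ref,
                           int early_test, ToyStats *stats)
{
    assert(stats != NULL);
    for (size_t i = 0; i < count; ++i) {
        int pass = (pixels[i].stencil == ref);
        if (early_test) {
            if (!pass)
                continue;                        /* killed before shading */
            pixels[i].color = toy_shader(stats);
        } else {
            uint32_t c = toy_shader(stats);      /* shade first */
            if (pass)
                pixels[i].color = c;             /* then test and write */
        }
    }
}
```

Run both modes over the same pixels and the final colors come out identical; the only difference is that the early path invokes the shader fewer times. That equivalence is what lets a driver reorder the test. Once the shader has visible side effects or discards, the two orders can diverge, so the model above no longer applies as written.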
You can read more about all of this on the OpenGL wiki: