Efficient multi point light shadow volume rendering

I have implemented a shadow volume technique with deferred rendering, and the results are quite pleasing, but I have a slight feeling that what I'm doing is not exactly, well, let's just say the most elegant solution.
I have 3 render passes. The first is a standard geometry pass, which has its own FBO and renders into the normal, albedo, and depth textures.
In the second pass I render the shadows for each point light source that intersects the occluder objects (1 draw call per light). The third is the light pass, which renders into the "default" FBO.

For the lighting pass I use an instanced bounding sphere, so I only issue 1 draw call. That means I can't use the stencil buffer to cull fragments in shadow, so instead I use a big stencil shadow texture (in this example 3840x2160, twice the screen resolution in each dimension, so every light has its own shadows at an offset into this big 8-bit stencil texture). No problem there. However, the shadow pass needs the depth information from the geometry buffer so that the stencil ops can determine where the shadow volumes intersect the geometry, which is why that pass has no fragment shader; everything is done in the raster stage.

And now, finally, the problem: I need to do a glCopyImageSubData operation after every shadow draw call, because I couldn't attach the big texture as the stencil buffer together with the depth buffer in the same FBO and render directly into the big texture. In fact, I have to use the packed depth24stencil8 format, even though I don't need an 8-bit stencil; I could very much live with a 4- or 6-bit buffer. It seems rather inefficient to do a copy at the full 1920x1080 for every light. I also can't figure out how to render the shadows at a lower resolution.
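To make the offset scheme and the copy cost concrete, here is a minimal sketch of the arithmetic only (no OpenGL calls). It assumes each light gets one full 1920x1080 tile placed row-major in the 3840x2160 texture; `StencilAtlas`, `TileOffset`, `offsetFor`, and `texelsCopiedPerFrame` are hypothetical names for illustration, not taken from my actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types for illustration; the row-major layout is an assumption.
struct TileOffset { int x; int y; };

struct StencilAtlas {
    int atlasWidth;   // e.g. 3840
    int atlasHeight;  // e.g. 2160
    int tileWidth;    // e.g. 1920 (one full-resolution tile per light)
    int tileHeight;   // e.g. 1080

    int tilesPerRow() const { return atlasWidth / tileWidth; }
    int capacity() const { return tilesPerRow() * (atlasHeight / tileHeight); }

    // Row-major placement: light 0 at (0, 0), light 1 to its right, and so on.
    // The result would be the destination x/y passed to the per-light copy.
    TileOffset offsetFor(int lightIndex) const {
        assert(lightIndex >= 0 && lightIndex < capacity());
        return { (lightIndex % tilesPerRow()) * tileWidth,
                 (lightIndex / tilesPerRow()) * tileHeight };
    }
};

// Texels moved per frame if every light triggers one full-tile copy.
std::int64_t texelsCopiedPerFrame(const StencilAtlas& atlas, int lightCount) {
    return static_cast<std::int64_t>(atlas.tileWidth) * atlas.tileHeight * lightCount;
}
```

Under these assumptions, `StencilAtlas{3840, 2160, 1920, 1080}` holds only 4 full-resolution tiles, and copying one full tile per light for 50 lights would move 103,680,000 texels per frame, which is the data movement I'm worried about.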

To recap, here is the render loop:

while(true) {

     //no fbo bind here
     foreach(light i : visiblyLightsOnScreen) {
          glCopyImageSubData(from GbufferDepth24Stencil8 to big stencil8 tex ...offsetx,offsety...); //fine for 3 lights but not for 50; seems like unnecessary data movement
     }

     renderLights(); //sampling from big stencil8