Performance: one channel per sampler vs 4 channels at once

I need to pass four channels to a model shader: metallic, roughness, emission, and a game-specific mask.
Almost all models have one or two of these channels completely black. Some other models need (for example) a hi-res metallic and roughness, and low-res emission.
The scene may contain 300-600 different non-instanced models.

Storing the maps as four separate texture objects (bound to four samplers) would save GPU memory and cache space. If we neglect the binding overhead, can separately stored textures perform better than a single four-channel texture?

Does sampling from several samplers lead to more cache misses than sampling one texture with unused channels?

I’m targeting desktops with OpenGL >= 3.0.

In cases where you only need 1 or 2 channels (of equal resolution), you already have a viable solution. Make a 1 or 2 channel texture and use texture swizzling to apply the red/green channel to the appropriate output component. If 0 is a valid default value, then have the other channels swizzle to zero (GL_ZERO).
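To make the swizzle idea concrete, here is a minimal sketch in C of how such a mask could be built. The helper `build_swizzle_mask` and the `SWZ_*` names are hypothetical; the numeric values are the ones the OpenGL headers use for GL_ZERO, GL_RED and GL_GREEN, copied here only so the snippet compiles without a GL header. It assumes the stored channels are packed in order into the red and then the green component of the texture.

```c
#include <stddef.h>

/* Assumption: values mirror GL_ZERO, GL_RED and GL_GREEN from the OpenGL
   headers; they are redefined only to keep the sketch standalone. */
enum {
    SWZ_ZERO  = 0,      /* GL_ZERO  */
    SWZ_RED   = 0x1903, /* GL_RED   */
    SWZ_GREEN = 0x1904  /* GL_GREEN */
};

/* Hypothetical helper. present[i] is nonzero when logical channel i
   (0 = metallic, 1 = roughness, 2 = emission, 3 = mask) is stored in the
   texture. Stored channels are read from red, then green; missing
   channels read as zero. Returns the number of stored channels, or -1
   (leaving mask untouched) if more than two are requested. */
int build_swizzle_mask(const int present[4], int mask[4])
{
    const int source[2] = { SWZ_RED, SWZ_GREEN };
    int stored = 0;
    size_t i;

    for (i = 0; i < 4; ++i)
        if (present[i])
            ++stored;
    if (stored > 2)
        return -1;

    stored = 0;
    for (i = 0; i < 4; ++i)
        mask[i] = present[i] ? source[stored++] : SWZ_ZERO;
    return stored;
}
```

With the texture bound, the resulting array would then be passed as `glTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_RGBA, mask)`. Texture swizzling is core in OpenGL 3.3; on a 3.0 context it needs the ARB_texture_swizzle extension, so check for that before relying on it.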

Note that this won’t help for 3 channels. Yes, you can create 3 channel textures, but implementations will typically add a 4th channel transparently to make the alignment work, so you end up paying for four channels anyway.

As for the rest (wanting different resolutions), this is going to have to be a memory vs. performance tradeoff. You can save memory at the cost of more memory fetches per shader invocation. And while caching will definitely help, a single fetch is likely to be less costly than multiple fetches.
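To put rough numbers on the memory side of that tradeoff, here is a small sketch in C. The function `texture_bytes` is a hypothetical helper, and it assumes uncompressed textures with 8 bits per channel and no driver padding; real footprints will differ with compressed formats and hardware alignment.

```c
#include <stddef.h>

/* Hypothetical helper: bytes used by an uncompressed texture with
   8 bits per channel. Assumes width and height are at least 1 and that
   the driver adds no padding. When with_mips is nonzero, the full mip
   chain down to 1x1 is included. */
size_t texture_bytes(size_t width, size_t height, size_t channels,
                     int with_mips)
{
    size_t total = 0;

    for (;;) {
        total += width * height * channels;
        if (!with_mips || (width == 1 && height == 1))
            break;
        /* Each mip level halves both dimensions, clamped at 1. */
        width  = width  > 1 ? width  / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```

For the example from the question, ignoring mipmaps: one packed four-channel texture at 2048x2048 is `texture_bytes(2048, 2048, 4, 0)`, or 16 MiB. A two-channel 2048x2048 texture for metallic and roughness plus a one-channel 512x512 emission texture is `texture_bytes(2048, 2048, 2, 0) + texture_bytes(512, 512, 1, 0)`, or about 8.25 MiB, at the price of two fetches instead of one.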

Though as always, profiling should be the final determiner.

Personally, I would say to try the above tactic for 1/2 channel cases and just live with the “wasted” memory in other cases as a first pass implementation. If you later find that you absolutely cannot live with using more memory, then you will need to explore alternatives.