Many thanks, Dark Photon, for the clarification!
What you say makes total sense. I think the distinction between the incoming fragment and the final storage was the missing piece in my understanding.
So it makes sense that a fragment has “SAMPLES_ARB depth values” but only one color value: it needs a separate depth value for each covered subsample when writing into the final buffer, but the same color value can be written to all of them.
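
To check my understanding, here is a rough C sketch of how I picture it. This is only a mental model, not how any driver or GPU actually implements it. The names `Fragment`, `PixelStorage`, `write_fragment` and `NUM_SAMPLES` are made up for illustration (`NUM_SAMPLES` stands in for SAMPLES_ARB, and the value 4 is an assumption), and I am assuming a "less than" depth test.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SAMPLES 4  /* stand-in for SAMPLES_ARB; 4 is an assumed value */

/* Incoming fragment: ONE color, but one depth value per sample. */
typedef struct {
    uint32_t color;               /* single color shared by all covered samples */
    float    depth[NUM_SAMPLES];  /* per-sample depth values */
    uint32_t coverage;            /* bit i set => sample i is covered */
} Fragment;

/* Final storage for one pixel: color AND depth stored per sample. */
typedef struct {
    uint32_t color[NUM_SAMPLES];
    float    depth[NUM_SAMPLES];
} PixelStorage;

/* Write a fragment into the per-sample storage of one pixel.
 * Returns the number of samples that were actually updated. */
static int write_fragment(PixelStorage *px, const Fragment *frag)
{
    assert(px != 0 && frag != 0);

    int written = 0;
    for (int i = 0; i < NUM_SAMPLES; ++i) {
        /* Skip samples the fragment does not cover. */
        if (!(frag->coverage & (1u << i)))
            continue;

        /* Per-sample depth test (assumed to behave like GL_LESS). */
        if (frag->depth[i] < px->depth[i]) {
            px->color[i] = frag->color;     /* same color for every sample */
            px->depth[i] = frag->depth[i];  /* depth differs per sample */
            ++written;
        }
    }
    return written;
}
```

So the single color gets replicated into every covered subsample that passes, while each subsample is tested against and updated with its own depth value.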