So how does the perspective projection matrix work now in GL4?

I don’t know what actual hardware implementations use, but the “naive” approach is:

For each clip plane, calculate the signed distance of the vertex from the plane, measured towards the inside. E.g. for the x≤w half-space, the plane is w-x=0 and the distance of a vertex (x,y,z,w) to the plane is w-x (positive values are inside the view volume, and we don’t care about the scale factor, only relative values).
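As a rough sketch of that step (not what any particular driver does), the distances for all six planes of the default clip volume -w ≤ x, y, z ≤ w can be written down directly. The function names `clip_distances` and `is_inside` are made up for illustration:

```python
def clip_distances(v):
    """Signed distances of a clip-space vertex (x, y, z, w) to the six
    planes of the default clip volume -w <= x, y, z <= w.
    Positive means inside; the values are not normalised."""
    x, y, z, w = v
    return {
        "left":   w + x,  # x >= -w
        "right":  w - x,  # x <=  w
        "bottom": w + y,  # y >= -w
        "top":    w - y,  # y <=  w
        "near":   w + z,  # z >= -w
        "far":    w - z,  # z <=  w
    }


def is_inside(v):
    """True if the vertex lies inside (or on the boundary of) the clip volume."""
    return all(d >= 0 for d in clip_distances(v).values())
```

For instance, the vertex (2, 0, 0, 1) has a "right" distance of -1, so it lies outside the x≤w half-space.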

If two adjacent vertices are on opposite sides of the plane, the plane will intersect the edge between them in the ratio d1/(d1-d2) to -d2/(d1-d2), where d1 is the distance of the interior vertex and d2 is the distance of the exterior vertex (which will be negative). So you would generate the new vertex by interpolating v1*(1-t)+v2*t where t=d1/(d1-d2) and 1-t=-d2/(d1-d2). All of the attributes would be interpolated in this manner.
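A minimal sketch of that interpolation, assuming each vertex is just a tuple of numbers (position components followed by any other attributes). `intersect_edge` is a hypothetical helper name, not part of any API:

```python
def intersect_edge(v1, d1, v2, d2):
    """Return the point where the edge v1 -> v2 crosses the clip plane.

    v1, v2: tuples of components (position and any other attributes).
    d1, d2: signed plane distances of v1 and v2; they must have opposite signs.
    """
    t = d1 / (d1 - d2)  # fraction of the way from v1 to v2
    # Every component is interpolated with the same weights.
    return tuple(a * (1 - t) + b * t for a, b in zip(v1, v2))


# Edge from (0,0,0,1) to (2,0,0,1) against the x<=w plane (distance = w - x).
v1, v2 = (0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0)
d1, d2 = v1[3] - v1[0], v2[3] - v2[0]   # 1.0 and -1.0
new_vertex = intersect_edge(v1, d1, v2, d2)
```

Here t comes out as 0.5 and the new vertex is (1, 0, 0, 1), which sits exactly on the plane (w-x=0).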

Clipping may introduce more vertices than it removes (e.g. if a single polygon vertex is outside, clipping will remove that vertex but add a new one on each of the two adjacent edges, resulting in a net gain of one vertex), so a clipped triangle won’t necessarily be a triangle. However, the result will be planar and all attribute mappings will remain unchanged, so it may not be necessary to tessellate the clipped primitive into triangles.
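To make the vertex count concrete, here is a sketch of clipping a polygon against a single plane in the Sutherland-Hodgman style. That is one textbook way of doing it, and I'm not claiming hardware works this way. `clip_polygon` and the `distance` callback are names invented for this example:

```python
def clip_polygon(vertices, distance):
    """Clip a convex polygon against one plane.

    vertices: list of tuples, in order around the polygon.
    distance: function returning the signed plane distance of a vertex
              (positive = inside).
    Returns the list of vertices of the clipped polygon.
    """
    out = []
    n = len(vertices)
    for i in range(n):
        cur, nxt = vertices[i], vertices[(i + 1) % n]
        d_cur, d_nxt = distance(cur), distance(nxt)
        if d_cur >= 0:
            out.append(cur)              # keep vertices that are inside
        if d_cur * d_nxt < 0:            # edge strictly crosses the plane
            t = d_cur / (d_cur - d_nxt)
            out.append(tuple(a * (1 - t) + b * t for a, b in zip(cur, nxt)))
    return out


# Triangle with one vertex outside the x<=w plane (distance = w - x).
triangle = [(0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0)]
clipped = clip_polygon(triangle, lambda v: v[3] - v[0])
```

With one vertex outside, the triangle comes back as a quad: the outside vertex is dropped and one new vertex appears on each of its two edges.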

Output of what? The specification contains sufficient detail to determine what should actually end up in the framebuffer (and anything else which is visible to the client in some way).

Clipping is only “visible” insofar as it determines the set of fragments which are generated. Any vertices generated by clipping (if the implementation actually generates them) aren’t visible; clipping occurs after the vertex, tessellation and geometry shaders have completed, so it doesn’t affect their operation. The fragment shader only sees data for individual fragments. OpenGL only specifies which fragments should be generated, not how.

I don’t know.