I recall reading that using vertex buffers incurs an overhead that prevents them from being useful with simple geometry and individual small objects. If this is the case, is it better to transform many small objects in software and batch them in one call, or is there a different approach that effectively utilizes hardware?
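Keep all the objects in one vertex buffer and add an integer vertex attribute that identifies the object each vertex belongs to. In the vertex shader, use that attribute to index into a uniform array of transformations, so the whole batch is transformed on the GPU and drawn in a single call. If the array is too large for a default-block uniform variable, store it in a uniform buffer object (UBO), a shader storage buffer object (SSBO), or a texture instead.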
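As a rough sketch of that idea (not a complete renderer), the C++ below packs several small objects into one vertex array and tags every vertex with its object index, alongside a matching GLSL 3.30 vertex shader held as a string. The names `Vertex`, `Batch`, `appendObject`, `kMaxObjects`, `uModel`, `uViewProj`, `aPosition` and `aObjectId` are all made up for illustration, and the array size of 32 is an assumed limit chosen to fit comfortably in default-block uniform storage.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed (hypothetical) cap on objects per batch. It must match the array
// size in the shader and fit within GL_MAX_VERTEX_UNIFORM_COMPONENTS.
constexpr std::uint32_t kMaxObjects = 32;

// Vertex shader: the per-vertex object index selects that object's matrix.
constexpr const char* kVertexShader = R"glsl(
#version 330 core
layout(location = 0) in vec3 aPosition;
layout(location = 1) in int  aObjectId;  // integer attribute, not normalized
uniform mat4 uViewProj;
uniform mat4 uModel[32];                 // one transform per object
void main() {
    gl_Position = uViewProj * uModel[aObjectId] * vec4(aPosition, 1.0);
}
)glsl";

// Interleaved vertex layout: position plus the index of the owning object.
struct Vertex {
    float position[3];
    std::int32_t objectId;
};

// CPU-side staging for one draw call's worth of geometry.
struct Batch {
    std::vector<Vertex> vertices;
    std::uint32_t objectCount = 0;
};

// Appends one object's untransformed vertices to the batch, tags each with
// the object's index, and returns that index (the slot to use in uModel).
std::uint32_t appendObject(Batch& batch,
                           const std::vector<std::array<float, 3>>& positions) {
    assert(batch.objectCount < kMaxObjects);
    const std::uint32_t id = batch.objectCount++;
    for (const auto& p : positions) {
        batch.vertices.push_back({{p[0], p[1], p[2]},
                                  static_cast<std::int32_t>(id)});
    }
    return id;
}
```

With a GL context available, `batch.vertices` would be uploaded once with `glBufferData`, the index attribute would be described with `glVertexAttribIPointer` (the integer variant, so the value is not converted to float), the matrices would be set with a single `glUniformMatrix4fv` call, and everything would be drawn with one `glDrawArrays`. Only the small matrix array changes per frame; the vertex data stays untransformed in the buffer.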