How to improve the performance of a blend-based mask

Hello everyone, I want to implement a mask container, meaning that the children inside the mask container are masked by the container.
The following site:
shows what I want to do. I followed the steps shown on that site, and it does work.
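The site's exact steps are not preserved in this post, so the following is only an assumption about the usual form of a blend-based mask. First clear destination alpha to 0. Then draw the mask so that only its alpha reaches the framebuffer (the blend state below is a guess at how the separate blend function mentioned later in the post would be used). Finally, draw the children with glBlendFunc(GL_DST_ALPHA, GL_ONE_MINUS_DST_ALPHA). The C sketch models only the per-pixel arithmetic of the two blend passes on the CPU. `Pixel`, `blend_mask_pass` and `blend_child_pass` are hypothetical names, not OpenGL API.

```c
/* Hypothetical CPU model of the blend stage, not OpenGL code.
 * One RGBA pixel with components in [0, 1]. */
typedef struct { float r, g, b, a; } Pixel;

/* Pass 1, assumed state:
 *   glBlendFuncSeparateEXT(GL_ZERO, GL_ONE, GL_ONE, GL_ZERO);
 * RGB: src * 0 + dst * 1    -> the colour already on screen is kept
 * A:   srcA * 1 + dstA * 0  -> destination alpha now stores the mask */
Pixel blend_mask_pass(Pixel mask, Pixel dst)
{
    Pixel out = dst;
    out.a = mask.a;
    return out;
}

/* Pass 2, assumed state:
 *   glBlendFunc(GL_DST_ALPHA, GL_ONE_MINUS_DST_ALPHA);
 * Every channel: src * dstA + dst * (1 - dstA), so a child shows up
 * only where pass 1 left a non-zero destination alpha. */
Pixel blend_child_pass(Pixel child, Pixel dst)
{
    float f = dst.a;
    Pixel out;
    out.r = child.r * f + dst.r * (1.0f - f);
    out.g = child.g * f + dst.g * (1.0f - f);
    out.b = child.b * f + dst.b * (1.0f - f);
    out.a = child.a * f + dst.a * (1.0f - f);
    return out;
}
```

For example, with a blue background whose alpha was cleared to 0 and a red child, the result is fully red where the mask alpha is 1, stays blue where the mask was never drawn, and is an even red/blue mix where the mask alpha is 0.5. Because pass 2 reads destination alpha, this approach only works if the framebuffer actually has alpha bits.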
But I encountered a problem:
My OpenGL environment is VxWorks OS with the ALT OpenGL driver (OpenGL version 1.2.1).
glBlendFuncSeparate() is not available until OpenGL 1.4, so I replaced it with glBlendFuncSeparateEXT(), which the ALT OpenGL driver provides. That also works, but it is far too slow: the animation stutters instead of playing smoothly.
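With a driver that reports OpenGL 1.2.1, glBlendFuncSeparateEXT() belongs to the GL_EXT_blend_func_separate extension, so one thing worth confirming is that the driver actually lists that name in the string returned by glGetString(GL_EXTENSIONS). Whether a missing entry would explain the slowdown on this particular driver is not known. The helper below is a hypothetical sketch (`has_gl_extension` is not an OpenGL function) that matches a whole token in that space-separated string, because a plain strstr() would also accept a longer extension name that merely contains the one being searched for.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a complete,
 * space-separated token in `ext_list`, 0 otherwise. */
int has_gl_extension(const char *ext_list, const char *name)
{
    const char *p;
    size_t len;

    if (ext_list == NULL || name == NULL)
        return 0;
    len = strlen(name);
    if (len == 0)
        return 0;

    for (p = ext_list; (p = strstr(p, name)) != NULL; p++) {
        /* The match must start the string or follow a space ... */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ... and be followed by a space or the end of the string. */
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
    }
    return 0;
}
```

In a real program the first argument would be `(const char *)glGetString(GL_EXTENSIONS)`, called while the OpenGL context is current.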
How can I improve its performance? Or is there a better way to implement the mask?
I’m sorry for my bad English; I’m Chinese.