I am working on my first OpenGL implementation, and image processing is pretty new to me.
I would like to take a photo of some text and make the text easier to read. The tricky part is that the photo may have dark regions as well as light regions, and I want the OpenGL code to enhance the text in all of these regions.
Here is an example. At the top is the original image; at the bottom is the processed image.
At the moment, the processed image only picks up some of the text, not all of it. The original algorithm I used was pretty simple:
- sample 8 pixels around the current pixel (samples about 4 to 5 pixels away seem to work best)
- find the lightest and darkest pixels in this sample
- if the current pixel is closer to the darkest sample, make it black; otherwise make it white
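A minimal C sketch of the three steps above, for a single pixel, might look like the following. It assumes a row-major, single-channel greyscale image with values in [0, 1], clamp-to-edge sampling, and neighbours taken at distance r in the 8 compass directions; the names `fetch_clamped` and `minmax_binarize` are hypothetical. "Closer to the darkest sample" is written as "below the midpoint of darkest and lightest".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read pixel (x, y) from a row-major greyscale image,
   clamping coordinates to the border (similar to GL_CLAMP_TO_EDGE). */
static float fetch_clamped(const float *img, int w, int h, int x, int y) {
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return img[(size_t)y * (size_t)w + (size_t)x];
}

/* Binarize pixel (x, y): sample the 8 neighbours at distance r, find the
   darkest (lo) and lightest (hi), and output 0 (black) if the pixel is
   closer to lo than to hi, i.e. below their midpoint; otherwise 1 (white). */
float minmax_binarize(const float *img, int w, int h, int x, int y, int r) {
    static const int dx[8] = {-1, 0, 1, -1, 1, -1, 0, 1};
    static const int dy[8] = {-1, -1, -1, 0, 0, 1, 1, 1};
    assert(img != NULL && w > 0 && h > 0 && r >= 1);

    float lo = fetch_clamped(img, w, h, x + dx[0] * r, y + dy[0] * r);
    float hi = lo;
    for (int i = 1; i < 8; ++i) {
        float v = fetch_clamped(img, w, h, x + dx[i] * r, y + dy[i] * r);
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    float c = fetch_clamped(img, w, h, x, y);
    return (c < 0.5f * (lo + hi)) ? 0.0f : 1.0f;
}
```

In a flat, slightly noisy region lo, hi and the centre pixel are all nearly equal, so tiny noise decides between black and white; that is one plausible source of the speckle described next.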
This seemed to work very well around text, but in regions without text it produced a very noisy image (even when I added an initial rejection threshold).
I modified this algorithm to assume that text is always close to black. This produced the bottom image above, but again I am not able to pull out all the text features I want.
I am sure this problem has been solved many times before. Any suggestions?
"I am sure this problem has been solved many times before."
I wouldn’t be so sure.
First, OpenGL is not really the go-to API for doing serious image processing. The kind of image processing you’re talking about is typically done on the CPU. If performance is paramount, and the algorithm works reasonably well on the GPU, then it can later be ported to the GPU. But even then, it may well be ported to OpenCL or CUDA, rather than an OpenGL-based algorithm.
Second, the kind of process you want to do, enhancing the text in an image, is not trivial or simple. Your current algorithm doesn’t even enhance text; it enhances the dark places, on the assumption that dark = text. When that assumption fails, your algorithm doesn’t work.
To do text enhancement, you really first need text recognition. And that is well outside the bounds of normal image processing. You have to be able to distinguish silhouettes, figure out which silhouettes represent glyphs or parts of glyphs, and then figure out where the rest of the glyph is and enhance it. That’s hard.
I would suggest asking experts in the field of image processing. While OpenGL programmers and image processing experts do have some overlap, it’s probably not a huge one. Look for where image processing experts hang out and ask there; you’ll be more likely to get a good answer.
Try using OpenCV.
Thresholding can be implemented in a shader, but, as the previous poster said, more complicated processing is better left to CUDA or OpenCL.
Hi, thanks for your responses. Perhaps identifying specific text at this stage is several steps too far from what I want to do.
At this stage I am happy for it to be purely a “smart thresholding” which enhances contrasting details (such as text) based on a localised sample.
I am implementing this on an iPhone using OpenGL ES, and I want this to work in real time, hence was experimenting with this framework on the GPU.
I have added a better example picture of what is happening. I am able to enhance the text, but in areas without text this simple thresholding creates speckled noise (bottom-left image).
If I wind back the threshold, then I lose the text in the darker region (bottom right).
I would suggest adaptive thresholding, but again this is not specific to OpenGL.
That is, use a different threshold value in the shadow region. Your photo is greyscale rather than binary, so you will have to experiment with the values.
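One common form of adaptive thresholding (the "mean minus C" variant, which OpenCV exposes as ADAPTIVE_THRESH_MEAN_C in cv::adaptiveThreshold) compares each pixel with the mean of its neighbourhood instead of with a fixed value, so the threshold follows the lighting. Below is a minimal C sketch for one pixel, assuming a row-major greyscale image in [0, 1]; the function names and the window radius r and offset c_offset are hypothetical parameters to tune.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: clamp-to-edge read from a row-major greyscale image. */
static float fetch_clamped(const float *img, int w, int h, int x, int y) {
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return img[(size_t)y * (size_t)w + (size_t)x];
}

/* Mean-C adaptive threshold for pixel (x, y): compute the mean of its
   (2r+1) x (2r+1) neighbourhood and output 0 (black) only if the pixel is
   darker than that mean by more than c_offset; otherwise 1 (white). */
float adaptive_mean_threshold(const float *img, int w, int h,
                              int x, int y, int r, float c_offset) {
    assert(img != NULL && w > 0 && h > 0 && r >= 1);
    float sum = 0.0f;
    int n = 0;
    for (int j = -r; j <= r; ++j) {
        for (int i = -r; i <= r; ++i) {
            sum += fetch_clamped(img, w, h, x + i, y + j);
            ++n;
        }
    }
    float mean = sum / (float)n;
    float v = fetch_clamped(img, w, h, x, y);
    return (v < mean - c_offset) ? 0.0f : 1.0f;
}
```

The c_offset term acts as the rejection threshold: flat background whose variation stays below it is left white. For example, with c_offset = 0.05, shadowed background at 0.2 stays white while a text pixel at 0.6 on a 0.9 background turns black, which no single global threshold can achieve. For large r, a summed-area table avoids re-summing the window for every pixel.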
To get rid of speckles, you can smooth out regions where text is not present.
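One simple way to remove isolated speckles after thresholding (a swapped-in technique, not necessarily what the poster had in mind) is a 3x3 median filter: on a black/white image it acts as a majority vote, so single stray pixels disappear while solid strokes survive. A minimal C sketch for one pixel, with hypothetical names and the same greyscale, clamp-to-edge assumptions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: clamp-to-edge read from a row-major greyscale image. */
static float fetch_clamped(const float *img, int w, int h, int x, int y) {
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return img[(size_t)y * (size_t)w + (size_t)x];
}

/* 3x3 median at pixel (x, y). On a 0/1 image this is a majority vote over
   the 9 pixels, so isolated black or white specks are removed. */
float median3x3(const float *img, int w, int h, int x, int y) {
    assert(img != NULL && w > 0 && h > 0);
    float v[9];
    int n = 0;
    for (int j = -1; j <= 1; ++j)
        for (int i = -1; i <= 1; ++i)
            v[n++] = fetch_clamped(img, w, h, x + i, y + j);

    /* Insertion sort of the 9 samples; the median is the middle element. */
    for (int i = 1; i < 9; ++i) {
        float key = v[i];
        int k = i - 1;
        while (k >= 0 && v[k] > key) {
            v[k + 1] = v[k];
            --k;
        }
        v[k + 1] = key;
    }
    return v[4];
}
```

The trade-off is that strokes only one pixel wide are also erased, so this works best when text strokes are at least two pixels thick at the processing resolution.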
Thanks for your help.
In the end I went for quite a basic approach.
I take a sample of 8 nearby pixels, determine their max and min, and compute the local threshold (max - min). Then:
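// p11: current pixel; currentMin/currentMax: darkest/lightest of the 8 samples
// stretch the pixel to the local [min, max] range, then average RGB to grey
float lum = dot(vec3(1.0/3.0), smoothstep(currentMin, currentMax, p11).rgb);
// low local contrast (max - min below threshold): treat as background, output white
lum = (localthreshold < threshold) ? 1.0 : lum;
return vec4(lum, lum, lum, 1.0);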
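To check the shader's output on the CPU (for example, while tuning threshold), a minimal C reference of the approach above could look like this. It assumes a single greyscale channel in [0, 1], 8 samples at distance r in the compass directions that exclude the centre pixel, and localthreshold = max - min; all names are hypothetical. It gates before calling smoothstep because GLSL leaves smoothstep undefined when edge0 >= edge1, and the C version would otherwise divide by zero in flat regions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: clamp-to-edge read from a row-major greyscale image. */
static float fetch_clamped(const float *img, int w, int h, int x, int y) {
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return img[(size_t)y * (size_t)w + (size_t)x];
}

/* Same formula as GLSL smoothstep(e0, e1, x); caller must ensure e0 < e1. */
static float smoothstep_f(float e0, float e1, float x) {
    float t = (x - e0) / (e1 - e0);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return t * t * (3.0f - 2.0f * t);
}

/* Min/max of the 8 neighbours at distance r; white if the local contrast
   is below threshold, otherwise the pixel stretched to [min, max]. */
float local_contrast_enhance(const float *img, int w, int h,
                             int x, int y, int r, float threshold) {
    static const int dx[8] = {-1, 0, 1, -1, 1, -1, 0, 1};
    static const int dy[8] = {-1, -1, -1, 0, 0, 1, 1, 1};
    assert(img != NULL && w > 0 && h > 0 && r >= 1);

    float lo = fetch_clamped(img, w, h, x + dx[0] * r, y + dy[0] * r);
    float hi = lo;
    for (int i = 1; i < 8; ++i) {
        float v = fetch_clamped(img, w, h, x + dx[i] * r, y + dy[i] * r);
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    /* Gate first: low-contrast (or perfectly flat) regions become white. */
    if (hi - lo < threshold || hi <= lo) return 1.0f;
    return smoothstep_f(lo, hi, fetch_clamped(img, w, h, x, y));
}
```

Because the centre pixel is not part of the min/max sample under these assumptions, a lone dark pixel surrounded by uniform background counts as low contrast and is whitened, which is consistent with this approach cleaning up speckle.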
This does not show the text cleanly in both the dark and the light regions, which would be ideal, but it does clean up the text in the lighter region nicely.