YUV to RGB color scaling design advice

I think I’ve built a design flaw into my video processing pipeline, and it’s not a good time to correct it properly, but I wonder if there’s a nice workaround?

I’m passing in video frames as PBOs, using their native format, colorspace and color sub-sampling. So, for instance, I pass a 1280 x 720 YUV 4:2:0 frame in as a 1280 x 720 8-bit luminance plane and two 640 x 360 chroma planes (with rows padded to 32-byte multiples).

Then I use a shader to convert into a linear RGB half-float FBO, with a mild dither. Later I do decent scaling on the linear pixels, then convert back to whatever format I need to output.
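
For context, the conversion pass is roughly along these lines (a simplified sketch rather than my exact shader; it assumes video-range BT.709 input, uses a plain power-curve linearization, leaves out the dither, and the texture names are just placeholders):

```glsl
// Sketch of the YUV -> linear RGB pass (simplified, names are placeholders).
uniform sampler2D yTex, uTex, vTex; // the three planes from the PBOs
varying vec2 texCoord;

void main()
{
    float y = texture2D(yTex, texCoord).r;
    float u = texture2D(uTex, texCoord).r;
    float v = texture2D(vTex, texCoord).r;

    // Expand video range (16-235 luma, 16-240 chroma) and center the chroma.
    y = (y - 16.0 / 255.0) * (255.0 / 219.0);
    u = (u - 128.0 / 255.0) * (255.0 / 224.0);
    v = (v - 128.0 / 255.0) * (255.0 / 224.0);

    // BT.709 YUV -> gamma-encoded R'G'B'.
    vec3 rgb = vec3(y + 1.5748 * v,
                    y - 0.1873 * u - 0.4681 * v,
                    y + 1.8556 * u);

    // Rough linearization; the real transfer function has a linear toe.
    gl_FragColor = vec4(pow(max(rgb, 0.0), vec3(2.2)), 1.0);
}
```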

The problem is that the color gets scaled poorly, since I can’t scale the U and V planes directly from their stored (quarter-size) pixels to the destination size. So I think I’m seeing artifacts - blockiness in richly colored areas.

The question is: what scaling can I apply at the YUV → RGB stage of the pipeline without softening things up? In my head, I can use the ‘cardinal sine’ or ‘sinc’ filter to upscale, this being a simple case, and I presume that since I’m doing a fixed-ratio upscale I could use precomputed constants. But I’m not sure whether the samples being in a video gamma space means I have to vary the weights?
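
(By ‘sinc’ I really mean a windowed version such as Lanczos, since the raw kernel never dies out:

$$
\operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x},
\qquad
L_a(x) =
\begin{cases}
\operatorname{sinc}(x)\,\operatorname{sinc}(x/a) & |x| < a,\\
0 & \text{otherwise},
\end{cases}
$$

and since a fixed 2:1 upscale only ever lands on two phases, the weights collapse to two small sets of constants.)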

Anyway, as an example, if I have output pixels A-E, the contributing pixels would be:

A y0 u0 v0
B y1 u0 v0
C y2 u1 v1
D y3 u1 v1
E y4 u2 v2

So, taking pixel C: y2 is centered right on C, but u1 and v1 are centered between C and D, so I need to do a weighted sum of u0, u1 and maybe even u2? (Say (0.25 * u0) + (0.75 * u1) in the linear case, since u1 is the closer sample, with the weights instead taken from the sinc curve at each sample’s distance from C?)
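
To make that concrete, here are the constants I get if I precompute Lanczos-2 (windowed sinc, a = 2) weights for the two phases of the fixed 2:1 upscale, assuming the mid-sited layout above. This is my own arithmetic, so please check it; plain linear interpolation would just be 0.25/0.75:

```glsl
// Horizontal chroma upsample, fixed 2:1 ratio, mid-sited chroma.
// Pixel C sits 0.25 chroma-samples from u1 and 0.75 from u0, so only
// two sets of Lanczos-2 weights are ever needed. Normalized to sum to 1.
const vec4 wEven = vec4(-0.0177, 0.2330, 0.8686, -0.0839); // pixel C: u-1, u0, u1, u2
const vec4 wOdd  = vec4(-0.0839, 0.8686, 0.2330, -0.0177); // pixel D: u0, u1, u2, u3

float upsampleChroma(vec4 taps, bool oddPhase)
{
    return dot(taps, oddPhase ? wOdd : wEven);
}
```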

Does it even matter that when the original 4:4:4 to 4:2:0 conversion was made, they presumably only used the C and D pixels to generate the u1 value?

Hope that’s clear enough that someone can help me see the wood for the trees.

Bruce

Interesting, I’m about to do something similar.

What do you mean by ‘can’t scale’? If you’re getting blockiness, it sounds like you’re doing nearest-neighbour interpolation.

The colour information will of course be soft compared to the luma no matter what you do, since it carries much less information. Exactly how you scale image data is usually a matter of taste, unlike audio, where the quality of the result is easier to quantify. Bicubic interpolation often gives better results than windowed sinc, which tends to introduce ringing artifacts, but it depends on your source material.
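
For what it’s worth, the Catmull-Rom flavour of bicubic is only a few lines in a shader. This is just the textbook kernel (a = -0.5), nothing specific to your pipeline:

```glsl
// Catmull-Rom cubic weight for a tap at distance x from the output
// sample; the classic a = -0.5 bicubic kernel, zero beyond 2 samples.
float catmullRom(float x)
{
    x = abs(x);
    if (x < 1.0)
        return 1.5 * x * x * x - 2.5 * x * x + 1.0;
    else if (x < 2.0)
        return -0.5 * x * x * x + 2.5 * x * x - 4.0 * x + 2.0;
    return 0.0;
}
```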

As for gamma, I’m not quite sure what the correct way is. The chroma signals aren’t properly gamma corrected, but they aren’t linear either. Probably they are just treated as linear during both downscale and upscale. If someone knows more about this, please speak up. :slight_smile:

Downscaling properly involves a more advanced filter, but sometimes the cheap route you suggest is chosen for performance reasons. If a poor filter was used, the damage is already done, so the best thing you can do is avoid making it worse by using poor upscaling.

In the end though, I’m not sure how much the chroma upsampling affects the result. Chroma signals are subsampled because our eyes aren’t as sensitive to colour as they are to luminance, after all. VirtualDub’s DV decoder upsamples the chroma signals by treating them as linear and doing linear interpolation. Panasonic’s DV decoder actually only uses nearest neighbour interpolation, but I’d advise against that, since the blockiness is quite visible. Not sure what professional gear or software does.

Hope this helps!

Yes, currently I do nearest neighbour when going from 4:2:0/4:2:2 to 4:4:4, then bicubic or better for my final scaling. It seems that by then I already have the blockiness baked in, which is obvious in hindsight.

So you’re saying that a decent downscaling ‘should’ have happened anyway? That’s good to know; as much as I’d thought about it, I’d assumed they did a trivial average of the two (or four) pixels.

I suppose I could go bicubic, but surely all the samples ‘on the other side’ of the pixel I’m recreating should barely count, being 2 pixels away?

Interesting about VirtualDub and the linear interpolation. That’s a simple solution, certainly. If it’s weighted by the distance from the new pixel center, it’s still an easy calculation. I think I’ll try that to start.
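
i.e. something like this, if I’ve understood the geometry right (placeholder names, same mid-sited assumption as my earlier example):

```glsl
// Linear chroma upsample for the fixed 2:1 case (mid-sited chroma).
// Each output pixel falls either 0.25 or 0.75 of the way between two
// chroma samples, so the lerp factor is a per-phase constant.
float lerpChroma(float uLeft, float uRight, bool oddPhase)
{
    // mix(x, y, a) = (1.0 - a) * x + a * y
    return mix(uLeft, uRight, oddPhase ? 0.25 : 0.75);
}
// Pixel C: lerpChroma(u0, u1, false) = 0.25*u0 + 0.75*u1
// Pixel D: lerpChroma(u1, u2, true)  = 0.75*u1 + 0.25*u2
```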

Bruce

Ideally yes, but I don’t know what common practice is. Either way there’s nothing you can do about it, so you may as well pretend it’s decent. :wink:

It’s those extra pixels that make bicubic look better than bilinear, but since these are chroma signals, the difference may be too slight to notice in the final image. I haven’t tried it myself yet.

If you end up happy with linear interpolation, you could just stick the chroma signals in a texture and let the graphics card do the interpolation for you instead of coding it. If you try bicubic chroma interpolation as well, please let me know how it turns out. :slight_smile:
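
Something along these lines, I imagine: set GL_LINEAR on the chroma textures with glTexParameteri and sample them with the same normalised coordinates as the luma plane. A sketch only; note that it assumes the mid-sited layout from your example, which happens to line up automatically when both planes share texture coordinates:

```glsl
// With GL_LINEAR filtering on the half-resolution chroma textures,
// the texture unit does the bilinear lerp for free. Mid-sited chroma
// lines up automatically when luma and chroma share normalised coords.
uniform sampler2D uTex, vTex;
uniform vec2 chromaTexSize; // e.g. vec2(640.0, 360.0)

vec2 sampleChroma(vec2 coord)
{
    // For co-sited (MPEG-2 style) chroma you would shift instead:
    // coord += vec2(0.25 / chromaTexSize.x, 0.0);
    return vec2(texture2D(uTex, coord).r,
                texture2D(vTex, coord).r);
}
```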