12/27/2014
12-27-14 - Lagrange Rate Control Part 4
01-12-10 - Lagrange Rate Control Part 1
01-12-10 - Lagrange Rate Control Part 2
01-13-10 - Lagrange Rate Control Part 3
I was thinking about it recently, and I wanted to do some vague rambling about the overall issue.
First of all, the Lagrange method in general for image & video coding is just totally wrong. The main problem is that it assumes every coding decision is independent: that distortions are isolated and additive, which they aren't.
The core of the Lagrange method is that you set a "bit usefulness" value (lambda) and then you make each independent coding decision based on whether spending more bits improves D by at least lambda per bit.
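A minimal sketch of that per-decision rule, assuming each coding option for a block is summarized by a (rate in bits, distortion) pair. Minimizing J = D + λ·R is the usual way to write it: a costlier option wins only if it lowers D by more than λ per extra bit. `Option`, `choose_option` and the numbers are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    rate: float        # bits needed to code the block this way
    distortion: float  # local D, e.g. SSD against the source block


def choose_option(options, lam):
    """Return the option minimizing J = D + lam * R.

    A costlier option beats a cheaper one only if it lowers D by more
    than lam per extra bit, so lam acts as the "bit usefulness" threshold.
    Every block is decided on its own, with no view of its neighbors.
    """
    return min(options, key=lambda o: o.distortion + lam * o.rate)


# Made-up options for one block.
options = [
    Option("skip", 2, 400.0),
    Option("coarse", 30, 120.0),
    Option("fine", 90, 40.0),
]

for lam in (1.0, 2.0, 20.0):
    print(lam, choose_option(options, lam).name)
```

With these made-up numbers, λ = 1 picks "fine", λ = 2 picks "coarse" and λ = 20 picks "skip": raising λ makes bits less worth spending.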
But that's just wrong, because distortions are *not* localized and independent. I've mentioned a few times recently the issue of quality variation; if you make one block in an image blurry, and leave others at high detail, that looks far worse than the localized D value tells you, because it's different and stands out. If you have a big patch of similar blocks, then making them different in any way is very noticeable and ugly. There are simple non-local effects: if the current block is part of a smooth/gradient area, then blending smoothly with neighbors in the output is crucial, and the localized D won't tell you that. There are difficult non-local effects: if the exact same kind of texture occurs in multiple parts of the image, then coding them differently makes the viewer go "WTF"; it's a quality penalty worse than the local D would tell you.
In video, the non-local D effects are even more extreme due to temporal coherence. Any change of quality over time that's due to the coder (and not due to motion) is very ugly (like I frames coming in with too few bits and then being corrected over time, or even worse the horrible MPEG pop if the I-frame doesn't match the cut). Flickering of blocks if they change coding quality over time is horrific. etc. etc. None of this is measurable in a localized lagrangian decision.
(I'm even ignoring for the moment the fact that the encoding itself is non-local; eg. coding of the current block affects the coding of future blocks, either due to context modeling or value prediction or whatever; I'm just talking about the fact that D is highly non-local).
The correct thing to do is to have a total-image (or total-video) perceptual quality metric, and make each coding decision based on how it affects the total quality. But this is impossible.
Okay.
So the funny thing is that the Lagrange method actually gets you some global perceptual quality by accident.
Assume we are using quite a simple local D metric like SSD or SAD, possibly with SATD or something. Just in images, perceptually what you want is for smooth blocks to be preserved quite well, and very noisy/random blocks to have more error. Constant quantizer doesn't do that, but constant lambda does! Because the random-ish blocks are much harder to code, they cost more bits per unit of quality, so they will be coded at lower quality.
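A toy illustration of constant quantizer vs. constant lambda, assuming the high-rate approximation D ≈ step²/12 for a uniform quantizer (so a constant quantizer gives roughly the same D on every block) and completely made-up bit costs for a "smooth" and a "noisy" block. None of these numbers come from a real coder; `pick_step` and friends are hypothetical names.

```python
# Hypothetical quantizer step sizes. At high rate a uniform quantizer gives
# about D = step^2 / 12 per coefficient regardless of block content, so a
# constant quantizer means roughly constant D on every block.
STEPS = [4, 8, 16, 32]
DIST = [q * q / 12.0 for q in STEPS]

# Made-up bit costs at each step: the noisy block needs far more bits.
RATES = {
    "smooth": [40, 24, 12, 4],
    "noisy": [200, 150, 100, 50],
}


def pick_step(rates, lam):
    """Step size minimizing J = D + lam * R for one block."""
    i = min(range(len(STEPS)), key=lambda k: DIST[k] + lam * rates[k])
    return STEPS[i]


def constant_lambda(lam):
    return {name: pick_step(rates, lam) for name, rates in RATES.items()}


def constant_quantizer(step):
    return {name: step for name in RATES}


print("constant quantizer:", constant_quantizer(8))  # same step, same D
print("constant lambda:   ", constant_lambda(1.0))
```

At λ = 1 the smooth block lands on step 8 (D ≈ 5.3) and the noisy block on step 16 (D ≈ 21.3), because each extra bit buys much less distortion reduction on the noisy block.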
In video, it's even more extreme and kind of magical. Blocks with a lot of temporal change are not as important visually - it's okay to have high error where there's major motion, and they are harder to code so they get worse quality. Blocks that stay still are important to have high quality, but they are also easier to code so that happens automatically.
That's just within a frame, but frame-to-frame (which is what I was talking about as "Lagrange rate control"), the same magic sort of comes out. Frames with lots of detail and motion are harder to code, so they get lower quality. Chunks of the video that are still are easier to code, so they get higher quality. The high-motion frames will still get more bits than the low-motion frames, just not as many more bits as they would at constant quality.
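A minimal sketch of rate control with a single λ shared across frames, assuming it means searching for the λ whose total rate fits a bit budget while each frame still picks its own step by minimizing D + λR. The step sizes, the D ≈ step²/12 model and the per-frame bit costs are all made up; `allocate` and `lagrange_rate_control` are hypothetical names.

```python
STEPS = [4, 8, 16, 32]
DIST = [q * q / 12.0 for q in STEPS]  # toy high-rate model, same for every frame

# Made-up bits per frame at each step.
FRAMES = {
    "still": [50, 36, 20, 10],
    "motion": [300, 200, 120, 60],
}


def allocate(lam):
    """Each frame independently picks the step minimizing D + lam * R."""
    out = {}
    for name, rates in FRAMES.items():
        i = min(range(len(STEPS)), key=lambda k: DIST[k] + lam * rates[k])
        out[name] = (STEPS[i], rates[i])
    return out


def total_bits(alloc):
    return sum(bits for _, bits in alloc.values())


def lagrange_rate_control(budget, lo=0.0, hi=1024.0, iters=40):
    """Bisect for roughly the smallest lam whose total rate fits the budget.

    Total rate never increases as lam grows, so bisection works.
    Assumes allocate(hi) already fits the budget.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_bits(allocate(mid)) <= budget:
            hi = mid
        else:
            lo = mid
    return hi, allocate(hi)


lam, alloc = lagrange_rate_control(200)
print(round(lam, 4), alloc)
```

With a 200-bit budget this settles near λ = 0.2, giving the still frame step 4 (50 bits) and the motion frame step 16 (120 bits). The motion frame still gets more bits, but far fewer than the 300 it would need at the still frame's step.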
It can sort of all seem well justified.
But it's not. The funny thing is that we're optimizing a non-perceptual local D. This D doesn't take into account things like the fact that errors in high-motion blocks are less noticeable. It's just a lucky hack that optimizing for a non-perceptual D winds up giving a pretty good perceptual optimization.
Lagrange rate control is sort of neat because it gets you started with pretty good bit allocation without any obvious heuristic tweakage. But that advantage goes away pretty fast. You find that using the L1 vs. L2 norm for D makes a big difference in perceptual quality; maybe L1 squared? Other powers of D change bit allocation a lot too. And then you want to do something like MB-tree to push bits backward; for example, the I-frame at a cut should get a bigger chunk of bits so that quality pops in rather than trickles in, etc.
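A tiny illustration of why the power on D matters, using made-up error vectors: the L1 norm (SAD) and the squared L2 norm (SSD, which ranks the same as L2) can order the same two candidates oppositely, so at equal rate they make opposite decisions. `dist` is a hypothetical helper.

```python
def dist(err, p):
    """Sum of |e|^p over a block's pixel errors (p=1: SAD, p=2: SSD)."""
    return sum(abs(e) ** p for e in err)


# Two hypothetical reconstructions of the same 4-pixel block at equal rate:
# A puts all the error in one pixel, B spreads a little error everywhere.
err_a = [3, 0, 0, 0]
err_b = [1, 1, 1, 1]

for p in (1, 2):
    da, db = dist(err_a, p), dist(err_b, p)
    print(f"p={p}: A={da} B={db} -> prefers {'A' if da < db else 'B'}")
```

L1 prefers A (3 vs 4) while SSD prefers B (9 vs 4): higher powers punish concentrated errors harder, which shifts where the bits go.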
I was thinking of this because I mentioned to ryg the other day that I never got B frames working well in my video coder. They worked, and they helped in terms of naive distortion measures, but they created an ugly perceptual quality problem - they had a slightly different look and quality than the P frames, so in a PBPBP sequence you would see a pulsing of quality that was really awful.
The problem is they didn't have uniform perceptual quality. There were a few nasty issues.
One is that at low bit rates, the "B skip" block becomes very desirable in B frames. (For me "B skip" = send no movec or residual; use the predicted movecs to the future and past frames to make an interpolated output block.) The "B skip" is very cheap to send, and has pretty decent quality. As you lower the bit rate, suddenly the B frames start picking "B skip" all over, and they actually have lower quality than the P frames. This is an example of a problem I mentioned in the PVQ posts: if you don't have a very smooth continuum of R/D choices, then an RD-optimizing coder will get stuck in some holes and there will be sudden pops of quality that are very ugly.
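A minimal sketch of the "B skip" reconstruction described above, assuming integer-pel motion vectors that have already been predicted, no edge clamping, and a plain rounded average of the two references. Sub-pel interpolation and weighted prediction are left out, and `fetch_block` / `b_skip_block` are hypothetical names, not the author's code.

```python
def fetch_block(frame, x, y, mv, size):
    """Copy a size x size block at (x, y) displaced by integer motion vector mv."""
    dx, dy = mv
    return [[frame[y + dy + r][x + dx + c] for c in range(size)]
            for r in range(size)]


def b_skip_block(past, future, x, y, mv_past, mv_future, size):
    """Interpolated "B skip" output: rounded average of the two predictions.

    No motion vector or residual is sent; mv_past and mv_future are assumed
    to be the decoder-side predicted vectors. No edge clamping is done, so
    the displaced blocks must lie inside both frames.
    """
    p = fetch_block(past, x, y, mv_past, size)
    f = fetch_block(future, x, y, mv_future, size)
    return [[(p[r][c] + f[r][c] + 1) >> 1 for c in range(size)]
            for r in range(size)]


# Tiny usage example with flat 8x8 frames.
past = [[10] * 8 for _ in range(8)]
future = [[20] * 8 for _ in range(8)]
print(b_skip_block(past, future, 2, 2, (1, -1), (-1, 1), 4))
```

Since the whole block costs only a mode flag, its rate is near zero. That makes it a very attractive RD choice at low bit rates, even when the averaged prediction is mediocre.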
At higher bit rates, the B frames are easier to code to high quality (among other things, the P frames are using mocomp from further in the past), so the pulsing is high-quality B's alternating with lower-quality P's.
It's just an issue that Lagrange rate control can't handle. You either need a very good real perceptual quality metric to do B-P rate control, or you just need well-tweaked heuristics, which seems to be what most people do.
12/17/2014
12-17-14 - PVQ Vector Distribution Note
You have a bunch of values, you send the length separately so you're left with a unit vector. Independent values are not equally distributed on the unit sphere, so you don't want a normal unit vector quantizer, you want this one that is scaled squares.
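The Pyramid VQ codebook is the set of integer vectors whose absolute values sum to K ("pulses"), normalized to unit length for reconstruction, with the gain sent separately. A minimal sketch of a quantizer search, assuming one common strategy: project onto the pyramid rounding down, then place the leftover pulses greedily to maximize normalized correlation with the input. `pvq_quantize` is a hypothetical name; this is neither the author's code nor Daala's.

```python
import math


def pvq_quantize(x, k):
    """Quantize the direction of x to an integer y with sum(|y_i|) == k.

    The reconstructed shape is y / ||y||_2; the gain is coded separately.
    """
    n = len(x)
    ax = [abs(v) for v in x]
    l1 = sum(ax)
    if l1 == 0:
        y = [0] * n
        y[0] = k  # arbitrary codeword for an all-zero input
        return y
    # Project onto the pyramid, rounding down so we never exceed k pulses.
    y = [math.floor(a * k / l1) for a in ax]
    corr = sum(a * p for a, p in zip(ax, y))  # <|x|, y>
    energy = sum(p * p for p in y)             # ||y||^2
    # Add leftover pulses one at a time where they most increase
    # the normalized correlation <|x|, y>^2 / ||y||^2.
    for _ in range(k - sum(y)):
        best_i, best_score = 0, -1.0
        for i in range(n):
            c = corr + ax[i]
            e = energy + 2 * y[i] + 1
            score = c * c / e
            if score > best_score:
                best_i, best_score = i, score
        corr += ax[best_i]
        energy += 2 * y[best_i] + 1
        y[best_i] += 1
    # Restore the input's signs.
    return [p if v >= 0 else -p for p, v in zip(y, x)]


print(pvq_quantize([3.0, -1.0, 0.0, 0.0], 4))  # [3, -1, 0, 0]
```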
Okay, that's all fine, but the application to images is problematic.
In images, we have these various AC residuals. Let's assume 8x8 blocks for now for clarity. In each block, you have ACij (ij in 0-7 and AC00 excluded). The ij are frequency coordinates for each coefficient. You also have spatial neighbors in the adjacent blocks.
The problem is that the AC's are not independent - they're not independent in either frequency or spatial coordinates. AC42 is strongly correlated to AC21 and also to AC42 in the neighboring blocks. They also don't have the same distribution; lower frequency-index AC's have much higher mean magnitudes.
In order to use Pyramid VQ, we need to find a grouping of AC's into a vector, such that the values we are putting in that vector are as uncorrelated and equally distributed as possible.
One paper I found that I linked in the previous email forms a vector by taking all the coefficients in the same frequency slot in a spatial region (for each ij, take all ACij in the 4x4 neighborhood of blocks). This is appealing in the sense that it gathers AC's of the same frequency subband, so they have roughly the same distribution. The problem is that there are strong spatial correlations.
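A minimal sketch of that same-frequency grouping, assuming a grid `blocks` of 8x8 coefficient arrays and a neighborhood anchored at block (bx, by). The function name and data layout are hypothetical, not taken from the paper.

```python
def same_frequency_vector(blocks, bx, by, i, j, n=4):
    """Gather coefficient (i, j) from every block in an n x n block neighborhood.

    blocks[row][col] is an 8x8 nested list of DCT coefficients; (bx, by) is the
    top-left block of the neighborhood. All returned values share the same
    frequency slot, so they have roughly the same distribution, but adjacent
    entries come from spatially adjacent blocks and remain correlated.
    """
    if (i, j) == (0, 0):
        raise ValueError("(0, 0) is the DC term, which is not part of the AC vector")
    return [blocks[by + dy][bx + dx][i][j]
            for dy in range(n) for dx in range(n)]


# Usage: a 4x4 grid of blocks where each coefficient encodes its own position.
blocks = [[[[row * 1000 + col * 100 + u * 10 + v for v in range(8)]
            for u in range(8)]
           for col in range(4)]
          for row in range(4)]
vec = same_frequency_vector(blocks, 0, 0, 4, 2)
print(len(vec), vec[:3])  # 16 [42, 142, 242]
```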
In Daala they form "subbands" of the coefficients by grouping together chunks of ACs with similar frequencies.
The reason why correlation is bad is that it makes the PVQ codebook not optimal. For correlated values you should have more codebook vectors with similar neighboring values, eg. more entries around {0, 2, 2, 0} and fewer around {2, 0, 0, 2}. The PVQ codebook assumes those are equally likely.
You can, however, make up for this a bit with the way you encode the codebook index. It doesn't fix the quantizer, but it does extract the correlation.
In classical PVQ (P = Pyramid) you would simply form an index to the vector and send it with an equiprobable code. But in practice you might do it with binary subdivision or incremental enumeration schemes, and then you need not make all codebook vectors equiprobable.
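For the classical equiprobable index, the cost is log2 of the codebook size, which follows from the standard counting recursion for pyramid codebooks. A minimal sketch with hypothetical function names:

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def pvq_codebook_size(n, k):
    """Count integer vectors of length n whose absolute values sum to k.

    V(n, k) = V(n-1, k) + V(n, k-1) + V(n-1, k-1),
    with V(n, 0) = 1 and V(0, k) = 0 for k > 0.
    """
    if k == 0:
        return 1
    if n == 0:
        return 0
    return (pvq_codebook_size(n - 1, k)
            + pvq_codebook_size(n, k - 1)
            + pvq_codebook_size(n - 1, k - 1))


def flat_index_bits(n, k):
    """Bits per vector if the codebook index is sent with an equiprobable code."""
    return math.log2(pvq_codebook_size(n, k))


print(pvq_codebook_size(4, 1), flat_index_bits(4, 1))  # 8 3.0
```

With a single pulse in a 4-coefficient vector there are 8 codewords, so a flat code spends 3 bits whether the pulse lands on the first or the last coefficient. A binary-subdivision or incremental enumeration coder with adaptive probabilities is free to spend fewer bits on the likely ones.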
For example, in Daala one of the issues for the lower subbands is that vectors with signal in the low AC's are more probable than those with signal in the high AC's. eg. for a subband that spans AC10 - AC40, {1,0,0,0} is much more likely than {0,0,0,1}.
Of course this becomes a big mess when you consider Predictive VQ, because the Householder transform scrambles everything up in a way that makes it hard to model these built-in skews. On the other hand, if the Predictive VQ removes enough of the correlation with neighbors and subband, then you are left with a roughly evenly distributed vector again, which is what you want.
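A minimal sketch of the Householder reflection step, assuming the usual construction that maps the prediction vector r onto the axis where |r| is largest. `householder_reflect` is a hypothetical helper; Daala's actual fixed-point code differs.

```python
import math


def householder_reflect(x, r):
    """Reflect x with the Householder matrix that maps prediction r onto an axis.

    m is the axis where |r| is largest (for numerical stability); the
    reflection sends r to -sign(r_m) * ||r|| * e_m. Applying the same
    reflection to the input x moves the part of x aligned with the
    prediction onto axis m; the other coordinates get mixed together.
    """
    m = max(range(len(r)), key=lambda i: abs(r[i]))
    s = 1.0 if r[m] >= 0 else -1.0
    norm_r = math.sqrt(sum(v * v for v in r))
    v = list(r)
    v[m] += s * norm_r
    vv = sum(a * a for a in v)
    if vv == 0:
        return list(x)  # zero prediction: nothing to align
    scale = 2.0 * sum(a * b for a, b in zip(v, x)) / vv
    return [xi - scale * vi for xi, vi in zip(x, v)]


print(householder_reflect([3.0, 4.0], [3.0, 4.0]))  # [0.0, -5.0]
```

The reflection preserves length, but every coordinate of the result mixes all of the original coefficients. That is why skews tied to specific AC positions are hard to model after it.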
12/16/2014
12-16-14 - Daala PVQ Emails
Also, the main Daala page has more links, including the "Intro to Video" series, which is rather more than an intro and is a good read. It's a broad survey of modern video coding.
Now, a big raw dump of emails between me, ryg, and JM Valin. I'm gonna try to color them to make it a bit easier to follow. Thusly :
And this all starts with me being not very clear on PVQ, so the beginning is a little fuzzy.
I will be following this up with a "summary of PVQ as I now understand it" which is probably more useful for most people. So, read that, not this.
(also jebus the internet is ridiculous. Can I have BBS's back? Like, literally 1200 baud text is better than the fucking nightmare that the internet has become. And I wouldn't mind playing a little Trade Wars once a day...)
12/08/2014
12-08-14 - BPG
It's pretty dang slow. Really slow. It's all covered by many patents. So I'm not sure how realistic it is as a usable format. Nonetheless, it's very useful as something to compare against.
I ran with default options (YCbCr in 420). I compared against JPEG_pdec, which, as I previously noted, is very comparable to DLI. (JPEG_pdec = JPEG + packjpg + my jpegdec (deblocker, etc.))
Conclusion :
BPG is really good. The best I've seen. It kills JPEG-pdec in RMSE, in fact I think it's the best RMSE performance I've seen despite being at a disadvantage (YCbCr and 420). Under the perceptual metrics (MS-SSIM-Y and "Combo") it doesn't win so strongly. That tells me there is probably room for better perceptual tuning of bit allocation and quantizers. But it's definitely strong.
Quick visual evaluation by me :
Porsche640 : BPG wins pretty hard here. Perhaps the most noticeable thing is much better detail preservation in the texture regions (the gravel and bushes). It also does a better job on the edges of the car, it doesn't smear them into nasty DCT block artifacts. You may download Porsche640 comparison images here (1 MB, RAR)
Moses : actually not a very big win here. It does much better at preserving the smooth gradient background (my current JPEGdec doesn't have any special modes for big smooth areas). Visually the main thing you'll notice is that the smooth gradients are nasty chunky steps with JPEG and are nice and smooth with BPG. Other than that, I actually think JPEG is better on Moses himself. Both make a big perceptual rate allocation mistake and put too many bits on the jacket texture and not enough on the human skin texture. But JPEG preserves more of the face texture; when you A/B compare it's clear that BPG is way over-smoothing his face. Particularly on the forehead and the neck fat. But all over really. Both BPG and JPEG make a classic mistake on Moses : they kill too much of the red and blue detail in the tie because it's in chroma.
The raw reports :
porsche640 :
imdiff RMSE_RGB
Built Aug 30 2014 11:30:27
r:\porsche640.bmp
r:\porsche640.bmp_bpg
r:\porsche640.bmp_jpg_pdec
raw imdiff data : -2.31,-1.29,-0.42,-0.13,0.02,0.31,0.69,0.92,1.56,1.93|18.55,13.46,9.59,8.46,7.87,6.79,5.45,4.72,3.14,2.56|-2.26,-1.96,-1.44,-1.13,-0.89,-0.71,-0.44,-0.24,-0.04,0.08,0.21,0.35,0.54,0.76,1.06,1.55,2.54|22.22,19.98,16.74,14.86,13.60,12.73,11.49,10.62,9.82,9.35,8.81,8.30,7.63,6.87,5.91,4.57,3.07| fit imdiff data : -2.31,-1.29,-0.42,-0.13,0.02,0.31,0.69,0.92,1.56,1.93|4.64,5.17,5.69,5.87,5.96,6.16,6.43,6.60,7.02,7.20|-2.26,-1.96,-1.44,-1.13,-0.89,-0.71,-0.44,-0.24,-0.04,0.08,0.21,0.35,0.54,0.76,1.06,1.55,2.54|4.33,4.52,4.82,5.01,5.16,5.26,5.42,5.54,5.65,5.72,5.81,5.89,6.01,6.14,6.33,6.63,7.04|
r:\porsche640.bmp
r:\porsche640.bmp_bpg
r:\porsche640.bmp_jpg_pdec
raw imdiff data : -2.31,-1.29,-0.42,-0.13,0.02,0.31,0.69,0.92,1.56,1.93|83.03,89.78,93.93,95.07,95.57,96.43,97.39,97.88,98.87,99.25|-2.26,-1.96,-1.44,-1.13,-0.89,-0.71,-0.44,-0.24,-0.04,0.08,0.21,0.35,0.54,0.76,1.06,1.55,2.54|79.88,82.47,86.76,89.12,90.67,91.72,93.14,94.07,94.84,95.30,95.74,96.18,96.72,97.27,97.95,98.70,99.40| fit imdiff data : -2.31,-1.29,-0.42,-0.13,0.02,0.31,0.69,0.92,1.56,1.93|4.23,5.06,5.73,5.96,6.07,6.29,6.57,6.73,7.18,7.41|-2.26,-1.96,-1.44,-1.13,-0.89,-0.71,-0.44,-0.24,-0.04,0.08,0.21,0.35,0.54,0.76,1.06,1.55,2.54|3.90,4.17,4.66,4.97,5.19,5.35,5.59,5.76,5.91,6.01,6.11,6.22,6.36,6.53,6.76,7.08,7.53|
r:\porsche640.bmp
r:\porsche640.bmp_bpg
r:\porsche640.bmp_jpg_pdec
raw imdiff data : -2.31,-1.29,-0.42,-0.13,0.02,0.31,0.69,0.92,1.56,1.93|4.83,4.05,3.41,3.21,3.10,2.90,2.63,2.46,2.03,1.80|-2.26,-1.96,-1.44,-1.13,-0.89,-0.71,-0.44,-0.24,-0.04,0.08,0.21,0.35,0.54,0.76,1.06,1.55,2.54|5.30,4.97,4.44,4.11,3.88,3.72,3.48,3.31,3.16,3.06,2.96,2.86,2.72,2.56,2.35,2.05,1.74| fit imdiff data : -2.31,-1.29,-0.42,-0.13,0.02,0.31,0.69,0.92,1.56,1.93|4.19,4.99,5.65,5.86,5.98,6.18,6.45,6.62,7.05,7.28|-2.26,-1.96,-1.44,-1.13,-0.89,-0.71,-0.44,-0.24,-0.04,0.08,0.21,0.35,0.54,0.76,1.06,1.55,2.54|3.69,4.04,4.59,4.93,5.17,5.34,5.58,5.75,5.91,6.01,6.11,6.22,6.36,6.52,6.74,7.03,7.34|
PDI_1200 :
imdiff RMSE_RGB
r:\PDI_1200.bmp
r:\PDI_1200.bmp_bpg
r:\PDI_1200.bmp_jpg_pdec
raw imdiff data : -2.50,-1.64,-0.85,-0.58,-0.43,-0.16,0.22,0.46,1.11,1.51|16.50,12.13,8.70,7.79,7.27,6.38,5.31,4.79,3.76,3.43|-1.77,-1.47,-1.24,-1.06,-0.79,-0.58,-0.38,-0.26,-0.12,0.02,0.21,0.78|17.05,15.36,14.18,13.28,12.07,11.20,10.38,9.91,9.36,8.80,8.09,6.38| fit imdiff data : -2.50,-1.64,-0.85,-0.58,-0.43,-0.16,0.22,0.46,1.11,1.51|4.84,5.34,5.83,5.98,6.07,6.24,6.46,6.58,6.84,6.93|-1.77,-1.47,-1.24,-1.06,-0.79,-0.58,-0.38,-0.26,-0.12,0.02,0.21,0.78|4.79,4.96,5.09,5.19,5.34,5.46,5.57,5.64,5.72,5.81,5.93,6.24|
r:\PDI_1200.bmp
r:\PDI_1200.bmp_bpg
r:\PDI_1200.bmp_jpg_pdec
raw imdiff data : -2.50,-1.64,-0.85,-0.58,-0.43,-0.16,0.22,0.46,1.11,1.51|87.81,92.21,95.12,95.98,96.34,96.99,97.77,98.16,98.98,99.30|-1.77,-1.47,-1.24,-1.06,-0.79,-0.58,-0.38,-0.26,-0.12,0.02,0.21,0.78|89.84,91.44,92.46,93.22,94.23,94.94,95.53,95.87,96.23,96.60,97.04,98.08| fit imdiff data : -2.50,-1.64,-0.85,-0.58,-0.43,-0.16,0.22,0.46,1.11,1.51|4.80,5.43,5.97,6.17,6.26,6.44,6.69,6.84,7.24,7.45|-1.77,-1.47,-1.24,-1.06,-0.79,-0.58,-0.38,-0.26,-0.12,0.02,0.21,0.78|5.07,5.31,5.47,5.60,5.79,5.93,6.06,6.14,6.23,6.33,6.45,6.81|
r:\PDI_1200.bmp
r:\PDI_1200.bmp_bpg
r:\PDI_1200.bmp_jpg_pdec
raw imdiff data : -2.50,-1.64,-0.85,-0.58,-0.43,-0.16,0.22,0.46,1.11,1.51|4.51,3.84,3.27,3.08,2.98,2.80,2.55,2.40,2.02,1.81|-1.77,-1.47,-1.24,-1.06,-0.79,-0.58,-0.38,-0.26,-0.12,0.02,0.21,0.78|4.25,3.96,3.76,3.60,3.39,3.23,3.09,3.00,2.91,2.80,2.67,2.32| fit imdiff data : -2.50,-1.64,-0.85,-0.58,-0.43,-0.16,0.22,0.46,1.11,1.51|4.52,5.21,5.80,5.99,6.09,6.28,6.53,6.68,7.07,7.27|-1.77,-1.47,-1.24,-1.06,-0.79,-0.58,-0.38,-0.26,-0.12,0.02,0.21,0.78|4.79,5.08,5.29,5.46,5.67,5.84,5.98,6.07,6.17,6.28,6.41,6.76|
moses :
imdiff RMSE_RGB
r:\moses.bmp
r:\moses.bmp_bpg
r:\moses.bmp_jpg_pdec
raw imdiff data : -2.89,-1.97,-1.18,-0.90,-0.76,-0.48,-0.08,0.17,0.92,1.36|13.96,9.96,7.15,6.37,5.99,5.30,4.38,3.86,2.65,2.18|-2.10,-1.78,-1.56,-1.38,-1.12,-0.91,-0.72,-0.60,-0.46,-0.32,-0.11,0.49|13.20,11.59,10.38,9.65,8.60,7.86,7.18,6.83,6.44,6.06,5.58,4.40| fit imdiff data : -2.89,-1.97,-1.18,-0.90,-0.76,-0.48,-0.08,0.17,0.92,1.36|5.11,5.63,6.09,6.24,6.32,6.46,6.68,6.81,7.18,7.34|-2.10,-1.78,-1.56,-1.38,-1.12,-0.91,-0.72,-0.60,-0.46,-0.32,-0.11,0.49|5.20,5.41,5.57,5.68,5.84,5.97,6.09,6.15,6.23,6.30,6.40,6.68|
r:\moses.bmp
r:\moses.bmp_bpg
r:\moses.bmp_jpg_pdec
raw imdiff data : -2.89,-1.97,-1.18,-0.90,-0.76,-0.48,-0.08,0.17,0.92,1.36|84.91,90.55,94.24,95.26,95.72,96.50,97.41,97.88,98.88,99.25|-2.10,-1.78,-1.56,-1.38,-1.12,-0.91,-0.72,-0.60,-0.46,-0.32,-0.11,0.49|87.54,89.76,91.23,92.22,93.54,94.44,95.20,95.62,96.02,96.43,96.93,98.06| fit imdiff data : -2.89,-1.97,-1.18,-0.90,-0.76,-0.48,-0.08,0.17,0.92,1.36|4.45,5.17,5.79,6.00,6.11,6.30,6.57,6.73,7.18,7.42|-2.10,-1.78,-1.56,-1.38,-1.12,-0.91,-0.72,-0.60,-0.46,-0.32,-0.11,0.49|4.76,5.06,5.27,5.43,5.66,5.83,5.99,6.08,6.18,6.29,6.42,6.80|
r:\moses.bmp
r:\moses.bmp_bpg
r:\moses.bmp_jpg_pdec
raw imdiff data : -2.89,-1.97,-1.18,-0.90,-0.76,-0.48,-0.08,0.17,0.92,1.36|4.43,3.78,3.22,3.04,2.95,2.79,2.56,2.44,2.03,1.81|-2.10,-1.78,-1.56,-1.38,-1.12,-0.91,-0.72,-0.60,-0.46,-0.32,-0.11,0.49|4.19,3.89,3.65,3.51,3.29,3.13,2.98,2.90,2.81,2.72,2.59,2.26| fit imdiff data : -2.89,-1.97,-1.18,-0.90,-0.76,-0.48,-0.08,0.17,0.92,1.36|4.60,5.28,5.84,6.03,6.12,6.29,6.52,6.65,7.05,7.28|-2.10,-1.78,-1.56,-1.38,-1.12,-0.91,-0.72,-0.60,-0.46,-0.32,-0.11,0.49|4.85,5.16,5.41,5.56,5.78,5.94,6.09,6.18,6.27,6.36,6.49,6.83|