Now I get it : this would avoid the problem of high-order models that have not collected any statistics yet, and are initialized with a probability of 1/2. Thanks for the tip! Sorry if I seem a bit slow off the mark; I get a bit confused with the terminology sometimes.

What I propose (or better: what I have seen in LPAQ IIRC) is to have 5 mixers. Use the first one if in step 1 you collected only 1 prediction, the second one if you collected 2 predictions, and so on; finally, use the fifth one if you collected 5 predictions (i.e. all models had available predictions).
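The five-mixer idea could be sketched like this (a minimal C illustration, not taken from lpaq; all names are made up):

```c
#define NMODELS 5

/* One mixer (weight vector) per possible count of available
   predictions; pick the mixer whose index equals how many models
   produced a prediction this step. */
typedef struct {
    double w[NMODELS];          /* one weight per model input          */
} Mixer;

static Mixer mixers[NMODELS];   /* mixers[k-1] used when k inputs seen */

/* Mix k stretched predictions with the mixer dedicated to that count. */
double mix(const double *st_pred, int k)
{
    Mixer *m = &mixers[k - 1];
    double dot = 0.0;
    for (int i = 0; i < k; i++)
        dot += m->w[i] * st_pred[i];
    return dot;         /* still in the logistic (stretched) domain */
}
```

This way a weight vector trained on "all 5 models present" is never polluted by steps where high-order models had nothing to say.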
A match model could be a good solution to discard long matches. I have to study this.
Last edited by Jean-Marie Barone; 23rd May 2013 at 11:39. Reason: typo
OK, probably I mixed something up.

Both of those techniques are used in various paq8 versions (but I think not in lpaq, or possibly in zpaq).
Really? I got the impression that you don't update the contexts nor rebuild them, so in that case that could be true. But for simple order-n models it should be sufficient to track the last max(all model orders) bytes and then rebuild the contexts for all models on a mismatch.

One problem with disabling models during long matches is that those models will not see most of the match and will then predict the next byte badly, because they don't see the immediate context.
You mean the highest order index or the long match length? I think the highest order index (if the limit is relatively small, like a single digit) should be the best single context for selecting the mixer.

As for using the context match length as mixer context, it does help, but I'm not sure if it works any better than just using an order 0 or 1 context to select the mixer weight vector, as is normally done in zpaq. It might.
Yep, mainly. Additionally, it could also have other implications - data compression is so unpredictable that you can't reliably predict when something will help or damage compression :]

> Now I get it : this would avoid the problem of high-order models that have not collected any statistics yet, and are initialized with a probability of 1/2.
More precisely - it would be good for avoiding many redundant (well, not 100% redundant, but practically useless) computations.

> A match model could be a good solution to discard long matches.
In general, removing repetitions doesn't change the load on the hash table, as repetitions do not add any new entries - the entries are already there. On the other hand, removing repetitions could help avoid problems like overfitting, I think.
> You mean the highest order index or long match length? I think highest order index (if the limit is relatively small, like single digit) should be the best single context for selecting mixer.
Probably so. I think that's why an ICM-ISSE chain works better than mixing different order ICMs. It effectively does the same thing as using the context order as a mixing context because at each step in the chain, an empty bit history selects a pair of weights to modify the prediction as p := w1*p + w2 in the logistic domain. Initially, w1 = 1, w2 = 0. If the proper action is to emphasize the lower order prediction, then the ICM will learn this by increasing w1.
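A minimal sketch of that ISSE step (hypothetical names; a real ICM-ISSE is table-driven with quantized weights per bit-history state, this just shows the p := w1*p + w2 arithmetic and its gradient update):

```c
/* Each bit-history state selects a weight pair (w1, w2); the refined
   stretched prediction is w1*p + w2, and after the bit is known both
   weights move along the error gradient. */
typedef struct { double w1, w2; } WeightPair;

double isse_predict(const WeightPair *wp, double st_p)
{
    return wp->w1 * st_p + wp->w2;   /* logistic-domain combination */
}

void isse_update(WeightPair *wp, double st_p, double err, double rate)
{
    wp->w1 += rate * err * st_p;     /* err = bit - squash(prediction) */
    wp->w2 += rate * err;
}
```

With the initial (1, 0) pair the chain passes the lower-order prediction through unchanged, which is exactly the "empty bit history" behavior described above.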
> But for simple order-n models it should be sufficient to track the last max(all models orders) bytes and then rebuild the contexts for all models on mismatch.
That would probably work, but PAQ doesn't do that. There would be a speed penalty, of course.
● A context seen for the first time points to a Bit history value of 0. With version 1.16, the pair of Frequencies linked to this Bit history entry will not be updated, so its probability will remain p(0)=1/2. Therefore, because Stretch(p(0))=log2(0.5/0.5)=0, contexts seen for the first time will have no impact on the global prediction.
● The Match model is triggered on a byte boundary. It searches, on encoding, the Source buffer history (data read so far) to find the same pattern as the current Context (8 bytes). On decoding, it will search the Destination buffer (data written so far).
The bits following that pattern will be considered as next prediction bits until a mismatch occurs, with probabilities of p(0)=65535/65536 if bit 0 is predicted, and p(0)=1/65536 if bit 1 is predicted.
Eventually, that prediction will be included in the Logistic mixing process, along with Orders-6 to 1 predictions.
The Match model uses a hash-table of 512 Kilobytes (65536 entries of 8 bytes) to find in the buffer a pattern matching the current Context.
Each entry of the hash-table is designed like this :
The 64 bits of the Context are hashed to a 16-bit value. Because collisions are very likely, the low half of the current Context will be compared with the 2nd double-word of the Match hash-table entry, which contains the low half of the last Context having produced the same hash.
If both values differ, a collision is detected and the Match model will terminate with a Match Prediction value of 2, meaning that no prediction could be made, and a Match Offset value of 0, meaning the model is turned "off".
If both values are equal, Match Offset value will be filled with the first double-word of the Match hash-table entry, which contains the offset in the buffer of the next byte that follows the same context. Match Prediction will be filled with the most significant bit of that byte.
In all cases (whether a prediction could be carried out or not), the entry of the Match hash-table will be updated with, respectively, the offset of the byte following the current Context and the Context's low double-word value.
During the next bit encoding/decoding process, the procedure will know that a Match model is "on" by testing the Match Offset variable which will be different from 0, and will fetch the next bit of Match Prediction directly from that address. If that prediction turns out to be wrong, Match Offset will be set to 0, and a new prediction will be calculated on the next byte boundary.
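The lookup-and-update logic described above could be sketched like this (illustrative C, not SMAC's actual assembly; field and function names are invented):

```c
#include <stdint.h>

/* Each entry holds the buffer offset of the byte that followed a
   context, plus the low half of that context for collision detection. */
typedef struct {
    uint32_t offset;    /* position of byte following the context   */
    uint32_t ctx_low;   /* low 32 bits of the 64-bit context        */
} MatchEntry;

#define TABLE_SIZE 65536
static MatchEntry match_table[TABLE_SIZE];

static uint16_t hash16(uint64_t ctx)   /* any 64->16 bit mix works  */
{
    ctx ^= ctx >> 32;
    ctx ^= ctx >> 16;
    return (uint16_t)ctx;
}

/* Returns the stored offset on a verified match, 0 on a collision
   (model "off"); then refreshes the slot with the current context. */
uint32_t match_lookup_update(uint64_t ctx, uint32_t cur_offset)
{
    MatchEntry *e = &match_table[hash16(ctx)];
    uint32_t found = (e->ctx_low == (uint32_t)ctx) ? e->offset : 0;
    e->offset  = cur_offset;       /* always update, match or not   */
    e->ctx_low = (uint32_t)ctx;
    return found;
}
```

A non-zero return plays the role of the Match Offset variable: the predictor then streams bits from that buffer position until a mismatch.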
Results on enwik8 & enwik9 are 21831822 & 183551384.
Last edited by Jean-Marie Barone; 31st July 2013 at 04:20. Reason: typo
Updated LTCB and Silesia.
http://mattmahoney.net/dc/text.html#1835
http://mattmahoney.net/dc/silesia.html
Jean-Marie Barone (1st August 2013)
After looking over Intel & Agner Fog’s documentation regarding Core Duo processors & above, I endeavoured to modify the code in order to give a boost to compression & decompression times.
I have discarded many instructions decoding to more than 1 μop or with high latencies, reorganized them to favor a 4-1-1-1 decoding scheme and, more generally, tried to take advantage of the latest Intel processors' features like store-forwarding, the loopback buffer, zero-latency movs, macro-fusion, etc...
The FNV-1 hash for Contexts has also been overhauled, so that the Order-4 hash uses Order-3 hash as a partial result, and likewise for Order-6 hash.
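The incremental hashing trick can be sketched as follows (a plain 32-bit FNV-1 illustration, not SMAC's exact hash): extending the order-3 hash by one byte yields the order-4 hash, so the shared prefix is never rehashed.

```c
#include <stdint.h>

#define FNV_PRIME 16777619u
#define FNV_BASIS 2166136261u

/* One FNV-1 step: fold one more byte into a partial hash. */
uint32_t fnv1_extend(uint32_t h, uint8_t b)
{
    return (h * FNV_PRIME) ^ b;
}

/* Full FNV-1 over n bytes, built from the same step function. */
uint32_t fnv1(const uint8_t *p, int n)
{
    uint32_t h = FNV_BASIS;
    for (int i = 0; i < n; i++)
        h = fnv1_extend(h, p[i]);
    return h;
}
```

Because FNV-1 consumes bytes strictly left to right, hash(order-4) = fnv1_extend(hash(order-3), next byte), and likewise up the chain.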
There is no improvement in the compression engine itself, yet the results are a tad better due to a change in the rounding method of the Bit History counters. Also, the lookup table for 2^32/(n0+n1) now uses floating-point values instead of the integers used previously.
Results on enwik8 & enwik9 are 21816272 & 183459153.
Nice improvements. http://mattmahoney.net/dc/text.html#1834
Edit: also, some improvement in the Silesia benchmark: http://mattmahoney.net/dc/silesia.html
Last edited by Matt Mahoney; 4th November 2013 at 18:22.
Jean-Marie Barone (5th November 2013)
I managed to gain 20 more seconds on enwik8 with my Core 2 Duo Merom, essentially by getting rid of the slow FSCALE instruction in the Squash function, while calculating 2^-p.
I also replaced some FPU instructions with faster ones, plus a few more optimization tricks.
Compression results remain identical to the previous version.
Nice speed improvement. http://mattmahoney.net/dc/text.html#1834
Jean-Marie Barone (20th November 2013)
* The Squash function has been entirely rebuilt using a chain of polynomial approximations (3rd degree, 32 intervals, step=1).
I've traded a bit of compression for the benefit of speed : -40 seconds on enwik8 on my Core 2 Duo.
* Until recently, I was not aware that XMM registers were considered as volatile when calling an API.
On Windows 64-bit, it was possible (but not confirmed) that calls to ReadFile & WriteFile destroyed xmm0 & xmm2, which held the Context and a mask for a hashtable.
Ergo, I have replaced xmm0 & xmm2 with xmm6 & xmm7, which are preserved across calls on 64-bit Windows.
The Squash function now uses an SSE3 instruction (fisttp). Since SSE3 was introduced in 2004 (Intel) & 2005 (AMD), I've assumed that nowadays it is available on every PC (in other words, I've been too lazy to check for SSE3 compatibility).
The (minimax) polynomial tables for the Squash function are the work & idea of qWord : thanks to him, and all MASM32 forum members for their support!
Results on enwik8 & enwik9 are 21816285 & 183459860.
Last edited by Jean-Marie Barone; 9th December 2013 at 06:04. Reason: some bugs with font
Nice speed improvement. http://mattmahoney.net/dc/text.html#1834
Edit: updated http://mattmahoney.net/dc/silesia.html
Last edited by Matt Mahoney; 10th December 2013 at 16:40.
Jean-Marie Barone (10th December 2013)
- I have figured out that my Stretch function could be radically improved.
Indeed, if we substitute p0=(n0/(n0+n1))*2^32 in Stretch(p0)=log2(p0/(2^32-p0)), we end up with :
Stretch(p0)=log2(n0/n1)=log2 n0 – log2 n1
with n0=bit 0 count, n1=bit 1 count, p0=prediction that next bit will be 0, and ni ranges between [1;65535]
Thus, there was no need to calculate p0 & p1 at all. Compression of enwik8 is now less than 5 min. under my Core 2 Duo.
- The Context hashes are now calculated earlier in order to increase the Prefetch Scheduling Distance : prefetch instructions benefit from this additional leeway to complete successfully.
Results on enwik8 & enwik9 are 21816323 & 183459942.
Nania Francesco (19th December 2013)
Updated LTCB and Silesia.
http://mattmahoney.net/dc/text.html#1834
http://mattmahoney.net/dc/silesia.html
Jean-Marie Barone (20th December 2013)
I have replaced my prediction formula p0=n0/(n0+n1) with Laplace's rule of succession, which gives better results in estimating the probability of the next symbol.
From now on, p0=(n0+1)/(n0+n1+2), and Stretch(p0)=log2(n0+1)-log2(n1+1).
I made some experiments with the Krichevsky-Trofimov estimator, where p0=(n0+0.5)/(n0+n1+1), and Stretch(p0)=log2(2n0+1)-log2(2n1+1), but results were not as good.
One positive side-effect of the Laplace estimator is that it solves the zero-frequency problem : with the previous formula, I had to initialize the n0 and n1 counts to one to avoid a division by zero. Now, the frequencies can range over [0;65535].
The Match model has been improved. The bit following a matching pattern of 8 bytes is now predicted with a probability of p0=1-(1/x) if bit 0 is predicted, and p0=1/x if bit 1 is predicted, with x=64 and x=x+1 after each correct prediction.
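The growing match confidence might look like this in C (a sketch; the reset-on-miss behavior is my assumption, the post only specifies the start value and the increment):

```c
/* The predicted bit gets probability 1 - 1/x, starting at x = 64 and
   incrementing after each correct prediction. */
typedef struct { unsigned x; } MatchConf;

/* Returns p(0), the probability that the next bit is 0. */
double match_p(const MatchConf *m, int predicted_bit_is_0)
{
    double miss = 1.0 / m->x;
    return predicted_bit_is_0 ? 1.0 - miss : miss;
}

void match_feedback(MatchConf *m, int correct)
{
    if (correct) m->x++;   /* longer match -> higher confidence      */
    else         m->x = 64; /* assumed reset; the post only says x++ */
}
```

So a fresh match starts at 63/64 confidence instead of the fixed 65535/65536 of the earlier design, and earns trust as it keeps being right.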
Results on enwik8 & enwik9 are 21781544 & 183190888.
Results for Silesia corpus.
Edit: LTCB results. http://mattmahoney.net/dc/text.html#1831

Code:
Silesia   dicke mozil mr   nci  ooff osdb reym samba sao  webst x-ray xml Compressor -options
--------- ----- ----- ---- ---- ---- ---- ---- ----- ---- ----- ----- --- -------------------
48093438  2322  16009 2446 1787 2270 2725 1104 3596  4892 6458  4078  400 smac 1.20
48253662  2331  16034 2453 1819 2271 2733 1111 3608  4902 6489  4093  403 smac 1.17
48254125  2331  16035 2453 1819 2271 2733 1111 3608  4902 6489  4093  404 smac 1.19
48254226  2331  16035 2453 1819 2271 2733 1111 3608  4902 6489  4093  404 smac 1.18
48272013  2333  16035 2454 1823 2271 2733 1112 3609  4903 6495  4093  404 smac 1.16
Last edited by Matt Mahoney; 17th January 2014 at 16:41.
Jean-Marie Barone (16th January 2014)
● SSE improves the prediction by collecting further statistics in a 2D table, featuring an Order-2 context and a stretched prediction ranging from -16 to +16 (33 boundaries).
My method roughly follows the one described by Matt in the latest PAQ archivers.
Each entry of the table stores a pair of 16 bit counters reflecting the observed bits for a given context & prediction.
From these counters, a new stretched prediction will be calculated.
As in PAQ, both counters will be halved when one of them saturates, and the initial counters are set so that each prediction maps to itself.
The SSE prediction is calculated as a linear interpolation between above & below boundaries, and the final prediction is equal to 3/4 SSE prediction+1/4 Initial prediction.
The closest entry to the linear interpolation will be updated when the result is known.
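A sketch of just the interpolation-and-blend arithmetic in C (hypothetical; the real table stores counter pairs per Order-2 context, abstracted here into a single row t[] of stretched predictions):

```c
#define NB 33   /* boundaries at stretched values -16..+16, step 1 */

/* Interpolate between the two nearest boundaries, then blend
   3/4 SSE prediction + 1/4 initial prediction. */
double sse_interpolate(const double t[NB], double st)
{
    if (st < -16.0) st = -16.0;      /* clamp to table range        */
    if (st >  16.0) st =  16.0;
    double pos = st + 16.0;          /* map [-16,16] -> [0,32]      */
    int lo = (int)pos;
    if (lo >= NB - 1) lo = NB - 2;   /* keep lo+1 inside the table  */
    double frac = pos - lo;
    double sse = t[lo] * (1.0 - frac) + t[lo + 1] * frac;
    return 0.75 * sse + 0.25 * st;   /* blend with initial prediction */
}
```

With the identity initialization mentioned above (t[i] = i - 16), the function returns its input unchanged, so SSE only bends the prediction once its counters have seen data.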
● With hindsight, it appeared to me that my Match model was overly complicated.
Henceforth, each entry of the Match table is a simple doubleword containing the offset of a matching pattern, given current context (8 bytes).
Both contexts (current context & Matching pattern) are still compared to detect hash collisions.
Also, I have increased the memory allocated for the table.
● FPU instructions have been replaced by SIMD ones (SSE & SSE2), except in the creation of the Logarithms lookup table.
● Some optimizations in the code.
● Source code has been ported to UASM.
Results on enwik8 & enwik9 are 21047188 & 176607820.
Last edited by Jean-Marie Barone; 3rd June 2018 at 03:06. Reason: forgot results
Mike (3rd June 2018)
This version uses a 3D-table SSE which inputs:
- a stretched prediction from -16 to +16
- an Order-2 context
- the current bit rank (8 segments from 0 to 7)
Some good results can also be achieved using only 2 segments (high & low nibble), or 4 segments.
Bit rank is a useful piece of information; including it in the Modeling improves overall prediction.
Results on enwik8 & enwik9 are 20930754 & 175759034.
Results on Silesia Corpus :
Code:
          Total    dickens mozilla mr   nci  ooff osdb reym samba sao  webst x-ray xml
SMAC 1.21 47822648 2319    15829   2445 1787 2261 2725 1102 3543  4892 6436  4078  399
SMAC 1.22 46544819 2271    15421   2401 1593 2233 2608 1060 3479  4849 6242  4000  382
SMAC 1.23 45592332 2269    15002   2285 1589 2147 2467 1057 3485  4789 6234  3883  379
Thanks for sharing this, Jean-Marie! I see it fits the use case of text-like data well. It even outperforms ppmd on my test set (although at the cost of speed).
Now, could you elaborate a little on memory requirements? I aim to try it on a few big files but I don't want to risk halting the system. Thanks in advance!
It uses a lot of memory, but is still below its 2 GB limit of address space. Still, I don't recommend using it with files greater than 1 GB, as it would be too slow & impractical. You have better alternatives if you wish to perform serious backups.
Edit : Also, I forgot to mention SMAC was not designed to compress/decompress files longer than 4 GB, as it uses a 32 bit variable to store the length of the file.
Code:
LOCAL FileLength:DWORD

call GetFileSize,@@SourceHandle,0
mov FileLength,eax          ;store original size
Last edited by Jean-Marie Barone; 13th July 2018 at 22:53.
Thanks for the info. I was just giving it a try to know its limits, no serious backup involved. It seems the method is symmetric, so it has no practical value to me as a day-to-day program, but it's good to see new ideas put to work!