Late night (or already morning) here, and I guess it's time to release something. Well, working on PPM is a timeless process. Since the last release, PPMX was rewritten from scratch many times; I tested many ideas and applied many improvements and optimizations. I found that PPMX 0.05 is totally wrong, and that I must write a properly done PPM-based compressor, no matter whether it turns out to be the greatest or not. So, here is PPMX 0.06. Quite a baseline order-4 PPM: no SEE, no special tricks or gimmicks. Anyway, with this PPM engine I started something essentially new. At least, I did something new in the PPM area - I just can't accept that PPMd is the one and only PPM around. Compared directly to PPMd (-o4), PPMX has really different properties:
- It uses hash tables, so there is no need to rebuild/flush a tree (see the sketch below)
- It is much more stable on files that are "bad" for PPMd - audio, already-compressed and some binary files
- PPMX has no SEE
- It has a more aggressive model update (a more non-stationary model)
- The model set is order-4-2-1-0-(-1), i.e. PPMX has no order-3 model
- Memory usage is fixed at about 70 MB
Having said that, PPMX is a much more complex project than, say, BCM or even PIMPLE. Although PPMX.CPP is as small as 8 KB, it is very tricky to remove all redundant computations and make it fast. I spent far more time on PPMX than on the aforementioned BCM - it's more interesting, more complex, and a really non-trivial task. So, anyway, enjoy the new release!
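As an aside, here is a minimal sketch of the hash-table point above (slot layout, hash and sizes are my own illustrative assumptions, not PPMX's actual code): each context hashes to a fixed slot, and a collision simply resets that slot, so memory stays fixed and nothing is ever rebuilt or flushed.
Code:
#include <cstdint>

struct Slot {
    uint32_t check;     // tag to detect hash collisions
    uint8_t  sym[4];    // a few ranked symbols
    uint8_t  cnt[4];    // their counts
};

static Slot table[1 << 22]; // fixed-size model memory, never grows

Slot &lookup_order4(uint32_t ctx) {
    uint32_t h = ctx * 2654435761u; // Knuth multiplicative hash
    Slot &s = table[h >> 10];       // top 22 bits index the table
    if (s.check != h) {             // collision or fresh slot:
        s = Slot{};                 // just reset it - no tree rebuild/flush
        s.check = h;
    }
    return s;
}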
EDIT: Uploaded PPMX 0.06P4 - a highly optimized build that utilizes SSE2 vectorization. Note that it requires a Pentium 4 or newer processor.
Thanks - now to test it!
Thanks Ilia!
Mirror: http://lovepimple.110mb.com/
Small tests:
# / original size / PPMX 0.06 (via FreeArc) / 7z PPMd (mem=70m, o=32)
1 / 815438 / 620934 / 600147
2 / 45596394 / 20618832 / 18380193
3 / 53134726 / 13077216 / 12071403
4 / 51853567 / 24721968 / 23749705
5 / 10379612 / 4583081 / 4471278
6 / 590413759 / 259526096 / 241291465
If you compare an order-4 PPM to an order-32 PPMII, don't forget to consider the compression time! Otherwise you may as well add PAQ8 results... ;)
Thanks a lot!
D:\renkoudmp1306>time 0<cr
Current time: 16:04:17.57
Enter the new time:
D:\renkoudmp1306>ppmx c hu1306.dmp hu1306.dmp.px
PPMX 0.06P4 Fast PPM compressor
Copyright (C) 2010 Ilia Muraviev
Compressing...
100%
393103360 -> 26968237 in 23.8 sec
D:\renkoudmp1306>time 0<cr
Current time: 16:04:41.64
while
D:\renkoudmp1306>time 0<cr
Current time: 15:51:27.97
Enter the new time:
D:\renkoudmp1306>pigz -6 hu1306.dmp
D:\renkoudmp1306>time 0<cr
Current time: 15:51:30.00
Enter the new time:
Did a complete rewrite of PPMX again. The new PPMX is currently fully compatible with PPMX 0.06 (same compression), but performance has become notably better:
Core2Quad, ENWIK8:
PPMX 0.06 (IA32) -> 21 sec
PPMX 0.06P4 (SSE2) -> 18 sec
PPMX 0.07 (IA32) -> 16 sec
However, the current PPMX 0.07 (IA32) can still be slightly slower on binaries than the vectorized 0.06P4 (SSE2). Continuing to work on it...
Recency scaling... A nice idea that actually works, even on text files.
During encoding, we temporarily increase the count of the most recent char (the one last seen in this context) by some amount or factor, say
count[r0]+=count[r0]>>2;
and after encoding, we restore the original count:
count[r0]=original_count;
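Put together, the per-symbol sequence might look like this (a sketch only; count[], total, r0 and encodeSymbol() are placeholder names, not PPMX's actual code):
Code:
unsigned original_count = count[r0];
unsigned boost = count[r0] >> 2;  // +25% recency boost
count[r0] += boost;
total     += boost;               // keep the running total consistent
encodeSymbol(c, count, total);    // code with the boosted statistics
total     -= boost;
count[r0]  = original_count;      // restore, as if nothing happened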
Just collecting and testing some ideas for PPMd-like counters...
Interesting idea. Bit histories in paq and zpaq (ICM and ISSE) effectively do this by making the last bit part of the state in addition to the two bit counts.
Unfortunately, it looks like we have far fewer options with byte counts. At the same time, PPM improvements do not end with SEE - it's good to use SSE as well. For example, comparing a simple SSE on the last symbol seen (rank 0) against SEE, the simple SSE can give a higher compression gain with the same complexity and speed impact. Probably, in the past, too many authors were fixated on PPM's SEE, or didn't know much about SSE. Anyway, with PPMX I'm currently building a very good foundation: any tricks can be added in the future, but for now I want to keep it fast.
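For illustration, a minimal sketch of what SSE on the rank-0 symbol could look like (the table sizes, probability format and update shift are my assumptions, not PPMX's code):
Code:
#include <cstdint>

static uint16_t sse[64][32]; // [quantized primary p][small context]

void sse_init() {
    for (int i = 0; i < 64; i++)
        for (int j = 0; j < 32; j++)
            sse[i][j] = uint16_t((i * 65535) / 63); // start near identity mapping
}

// p = primary (counter-based) probability of the rank-0 symbol, 0..65535
uint16_t sse_p(uint16_t p, unsigned ctx) {
    return sse[p >> 10][ctx & 31]; // refined estimate replaces the primary one
}

void sse_update(uint16_t p, unsigned ctx, bool rank0_was_coded) {
    uint16_t &cell = sse[p >> 10][ctx & 31];
    if (rank0_was_coded) cell += (65535 - cell) >> 5; // move toward 1
    else                 cell -= cell >> 5;           // move toward 0
}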
Still searching for ideas on how to make a simple count-based counter more advanced:
count[c] = table[count[c]][tot]; // something like a state machine, probably
For now, the only idea that really works is that recency/deterministic scaling.
Also, Dmitry Shkarin makes use of a trick in the SEE of contexts with unmasked symbols - an ad-hoc (or not) and tricky escape count update:
// Tabulated escapes for exponential symbol distribution
static const BYTE ExpEscape[16]={51,43,18,12,11,9,8,7,6,5,4,3,3,2,2,2};
// ...
pc->SummFreq=p->Freq+(ns > 1)+ExpEscape[QTable[BSumm >> 8]];
> It's good to use SSE as well.
> For example, comparing a simple SSE on the last symbol seen (rank 0) against SEE,
rank0 is not necessarily the last symbol... e.g. it can also be a symbol with
the highest freq.
> the simple SSE can give a higher compression gain with the same
> complexity and speed impact.
But escape modelling in PPM is still important.
I mean, I don't think that with plain escape modelling but with
SSE added for the rank0 symbol, your results would be better than
with SEE.
> Probably, in the past, too many authors were fixated on PPM's SEE,
> or didn't know much about SSE.
I think only a few people ever tried to implement a PPM at all.
And as for SSE, there are too many open questions about it, PPM or not.
> Still searching for ideas on how to make a simple count-based
> counter more advanced.
It's the same thing as with SSE, actually - basically you can't.
(It's not completely impossible, it just requires too many multiplications
and divisions, plus too much memory for precise counter values
(like floats), and it would still be easy to lose precision somewhere.)
Well, plain counts make it easy for SSE - they can be adaptively reordered and masked
(at the cost of a division per counter to compute the probability, if there's SSE),
but independent counters are better, and there are too many troubles with both
speed and precision if you try to reorder them.
So I think there's no choice - we just have to use a fixed decomposition
(ideally something like a Huffman tree built from order-1 stats), as sketched below.
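For illustration, a minimal sketch of a fixed bitwise decomposition - here the trivial balanced 8-level tree; a Huffman tree built from order-1 stats would only reshape the paths. The probability format and update shift are my assumptions:
Code:
#include <cstdint>

static uint16_t p1[256]; // one adaptive bit model per internal tree node

void init() { for (int i = 0; i < 256; i++) p1[i] = 2048; } // p(1)=0.5 in 1/4096 units

void model_byte(int byte /*, arithmetic coder omitted */) {
    int node = 1;                  // root of the fixed tree
    for (int i = 7; i >= 0; i--) {
        int bit = (byte >> i) & 1;
        // encodeBit(bit, p1[node]) would go here
        if (bit) p1[node] += (4096 - p1[node]) >> 5;
        else     p1[node] -=  p1[node] >> 5;
        node = node * 2 + bit;     // descend to the chosen child
    }
}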
At least, recently I found a way to use it with a tree instead of hashtables
(I mean http://encode.su/threads/1173-New-da...for-bitwise-CM).
Also, it seems to solve the masking problem too, as escapes can be
coded at the leaves (not really escapes, but selection of unused branches),
and we can continue with that subtree at the lower order, so no masking
would be necessary.
> count[c] = table[count[c]][tot]; // something like a state machine, probably
Sure, with that you can get better results than with increments, but
1. Even a 16-32k table can be surprisingly slow, as the L1 data cache
size is still only 32k on Core2; i7 moved the previous L2 to L3 and
added a 256k L2, but that's still not much.
2. I'm not aware of any theory which could help you to compute such a table.
I guess it's possible to do what I did with that fsm counter here - initialize
the table with a simple increment and run an optimizer on it (see the sketch
after this list). But it would be slow with such a large table, and overfitting is very likely...
3. Even if you managed to find a way to update a single freq like that,
it would leave all the other freqs unchanged (i.e. the probabilities of all
other symbols would change uniformly).
I mean, ideally you could also modify the other counts, but that only increases
the trouble.
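As an illustration of point 2's "initialize the table with a simple increment", reduced to 1-D next-state tables per event (the 2-D table[count][tot] version would just add a total-count axis; the sizes and the halve-on-miss rule are my assumptions):
Code:
#include <cstdint>

const int STATES = 256;        // state = the 8-bit count value itself
uint8_t next_on_hit[STATES];   // the counter's own symbol was coded
uint8_t next_on_miss[STATES];  // some other symbol was coded

void init_tables() {
    for (int s = 0; s < STATES; s++) {
        next_on_hit[s]  = uint8_t(s < STATES - 1 ? s + 1 : s); // saturating +1
        next_on_miss[s] = uint8_t(s >> 1);                     // halve on a miss
    }
}

// The per-symbol update becomes a pure lookup, and an offline optimizer
// can then perturb the table entries to try to beat the plain increment:
//   count[c] = hit ? next_on_hit[count[c]] : next_on_miss[count[c]];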
But as to PPM contexts and state machines, I think it could be interesting
to merge all the counts into one state.
I mean, the context state (symbol freqs in ppmd, or a single value) is
a lossy representation of the context history.
So it should be possible to explicitly build a lossy compression scheme
for the rank history (keeping symbol codes aside for the tree), fit the
history code into e.g. 24 bits or less, and look up the next state as usual.
It should be very compact and still possibly better than freqtables
(as it won't discard the order of symbols in the history).
> For now, the only idea that really works is that recency/deterministic scaling.
Actually it's a very simple case of delayed update.
To improve it, you can store one (you already have it with the mtf ranking) or more
recent symbols in contexts, and do a context-aware probability estimation and update
with these.
As a side effect, it might also work as a speed optimization, because you won't
have to update new contexts.
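A minimal sketch of that delayed update, under my own assumptions about the layout (flat counts, one pending symbol per context):
Code:
struct Context {
    unsigned short count[256]; // illustrative flat symbol counts
    short pending;             // symbol whose update was deferred, or -1
};

void model_symbol(Context &ctx, int c) {
    // Estimation happens first and can treat ctx.pending as an extra
    // "recent symbol" hint (the actual coding is omitted in this sketch).
    if (ctx.pending >= 0)
        ctx.count[ctx.pending]++; // apply the update deferred from the last visit
    ctx.pending = short(c);       // defer this symbol's update until next time
}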
> static const BYTE ExpEscape[16]={51,43,18,12,11,9,8,7,6,5,4,3,3,2,2,2};
> pc->SummFreq=p->Freq+(ns > 1)+ExpEscape[QTable[BSumm >> 8]];
BSumm is the SEE probability of no-escape (i.e. of symbol coding in a binary context)
from the just-processed binary context (it's stored in processBinSymbol()).
So: a lower escape freq for a lower SEE escape probability.
The tricky part is that, if you look carefully at that temporary count increment, it's the same as a permanent decrement of the last count. In other words, recency scaling is a kind of continuous rescaling - and here we rescale or decrement not all symbols in a context, but just the last seen one.
It's close to PAQ1-like counters, where we decrease the opposite bit's count. Here we're just decrementing the previous char seen in the context! Nice!
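One way to make the "continuous rescaling" reading concrete: with the +25% boost from the earlier example, let c be the rank-0 count and T the context total. Then
(c + c/4) / (T + c/4) = c / (c + (4/5)(T - c))
i.e. coding with the temporarily boosted rank-0 count gives exactly the same probability as leaving it untouched and scaling every other count down by 4/5.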
Some testing results for RKUC v1.04 show what PPM with sparse ("binary" in Malcolm Taylor's terms) contexts can do.
It's a big question how to add sparse contexts to PPM. Unlike CM, with PPM we code a symbol using just one context at a time. According to the results, most likely RKUC has no SSE, which could potentially make use of sparse contexts. Also, we can see something is going on with order-1 and especially order-0 - extra contexts are used there - so with the "-b" parameter, order-0 is not a real order-0. Probably Malcolm Taylor added small sparse contexts or replaced order-1 with them... Or even mixed sparse contexts with order-0/1... Will test some assumptions with PPMX...
Afaik RKive, PPMZ2 and Co. use special Local Order Estimation (LOE) techniques.
Based on some cost function or heuristic, you skip some contexts instead of escaping from them.
The most obvious way is to sort contexts by the most probable symbol.
As you can see, LOE is disabled by default.
Code:
Standalone RKIVE v2.00 universal compressor (v1.04).
Copyright (c) 1999 Malcolm Taylor.

RKUC c|e [args] source [destination]

Args:
  -mXXX  Model size in megabytes.
  -oXX   Model order (0-16).
  -x     Use LOE.
  -b     Use binary contexts.
Some testing results with PPMX.
ppmx006
ppmx006s1 - with sparse context (x.x)
ppmx006s2 - with sparse context (x..x)
ppmx006s3 - with sparse context (xx..)
All versions are exactly the same, except that the order-1 model is replaced with various sparse models.
test.exe
ppmx006 -> 894828
ppmx006s1 -> 884834
ppmx006s2 -> 891880
ppmx006s3 -> 884347 [!]
calgary.tar
ppmx006 -> 816877
ppmx006s1 -> 815323
ppmx006s2 -> 813619
ppmx006s3 -> 815907
reaktor.exe
ppmx006 -> 2474332
ppmx006s1 -> 2451736
ppmx006s2 -> 2470299
ppmx006s3 -> 2434944
photoshop.exe
ppmx006 -> 7496977
ppmx006s1 -> 7479555
ppmx006s2 -> 7439805
ppmx006s3 -> 7374416 [!]
acrord32.exe
ppmx006 -> 1620570
ppmx006s1 -> 1588094
ppmx006s2 -> 1588853
ppmx006s3 -> 1573459 [!]
mso97.dll
ppmx006 -> 1990065
ppmx006s1 -> 1950584
ppmx006s2 -> 1962239
ppmx006s3 -> 1963458
pak0.pak
ppmx006 -> 94131081
ppmx006s1 -> 92351099
ppmx006s2 -> 92821704
ppmx006s3 -> 92533142
TraktorDJStudio3.exe and sparse contexts:
Did some experiments with an order-5 model; I really dislike the speed impact:
ENWIK8:
order-4 -> 25,970,xxx in 16.9 sec
order-5 -> 24,173,xxx in 24.3 sec
ppmx006 -> 26,131,726 in 20.9 sec
ppmx005 -> 22,905,422 in 66.4 sec [!]
Huge speed impact. The results for calgary.tar are:
order-4 -> 816,xxx
order-5 -> 794,xxx
world95.txt:
order-4 -> 618,xxx
order-5 -> 548,xxx
3200.txt:
order-4 -> 4,179,xxx
order-5 -> 3,949,xxx
fp.log:
order-4 -> 765,xxx
order-5 -> 695,xxx
i.e. it's a new compression level...
Continuing to experiment with model orders and data structures.
world95.txt and different model sets:
ppmx v0.05 -> 478,220 bytes (68 sec and 22,905,422 bytes on ENWIK8)
ppmx v0.06 -> 618,903 bytes (21 sec and 26,131,726 bytes on ENWIK8)
order-4-2-1-0 -> 618,866 bytes (17 sec and 25,980,665 bytes on ENWIK8)
order-5-3-2-1-0 -> 547,381 bytes (24 sec and 23,980,471 bytes on ENWIK8)
order-6-4-2-1-0 -> 515,661 bytes (32 sec and 23,186,823 bytes on ENWIK8)
order-7-4-2-1-0 -> 500,102 bytes (37 sec and 22,995,582 bytes on ENWIK8)
order-8-4-2-1-0 -> 494,055 bytes (42 sec and 23,121,832 bytes on ENWIK8)
order-7-5-3-2-1-0 -> 500,368 bytes (41 sec and 22,616,507 bytes on ENWIK8)
order-8-5-3-2-1-0 -> 491,018 bytes (46 sec and 22,432,311 bytes on ENWIK8)
order-9-5-3-2-1-0 -> 486,643 bytes (50 sec and 22,436,610 bytes on ENWIK8)
Which version would you prefer?
The lzma result for ENWIK8 is 24,557,177 - for a PPM it doesn't make sense to do worse.
Stuck in the middle with the new PPMX. Did an extreme number of experiments, mainly on byte-oriented counter improvements. Can't find anything really simple that works. Still choosing between order-5 and order-6, and thinking about a fast and simple (or not) SEE...
> Can't find anything really simple that works
That's a really faulty assumption - there's no simple function
to magically turn an average codec into a good one.
All known good codecs are fairly complicated, with hardcoded
handlers for many special cases etc. In fact, most of them
have multiple layers of compression (ppmd = symbol ranking + CM,
lzma = LZ77 + CM, bsc = BWT/ST + qlfc + CM, etc). Only ccm is basically
single-stage (there are filters, though), so it's possible to
improve its speed by adding huffman decomposition etc - but
it still has special tweaks for a few types of data in the main loop.
And then, we actually don't need such a simple function either, even
if it existed - because it would be easy to reverse-engineer and
understand.
> And then, we actually don't need such a simple function either, even
> if it existed - because it would be easy to reverse-engineer and
> understand.
Maybe that's why we need it: because it's good for education. Please note that encode released some code for free, and even in cases when he didn't, he shared quite a few pieces of knowledge. Not everybody fears reversing.
> Maybe that's why we need it: because it's good for education.
I meant that there's no benefit for the original author, not even fame.
Example: do you know who invented the hashtable?
http://en.wikipedia.org/wiki/Hash_table#History
Also, that complexity is what makes data compression so interesting -
it's possible to incrementally add parts to a codec and precisely measure
how it affects the results. Somewhat like a game.
Now it's normal for a complex solution to be better than a simple one
(eg. ppmd vs ppmx), but the appearance of such a "magic function" would
mean "game over".
> Please note that encode released some code for free, and even in cases
> when he didn't, he shared quite a few pieces of knowledge.
Still, it's a fact that he tries to make some products to sell,
and I see nothing wrong with that.
> Not everybody fears reversing.
That only applies when we know that we didn't make anything exceptional.
And if you accidentally did, you should always be aware of cases like this:
http://groups.google.com/group/comp....ffc48cc17cfb0f
So having a complex app which is not based on a single idea would make your
life much easier.
> I meant that there's no benefit for the original author, not even fame.
> Example: do you know who invented the hashtable?
> http://en.wikipedia.org/wiki/Hash_table#History
Fame and money are just two of many things that keep people going. Both are important, but not the only ones. And sharing knowledge does not go against either of them.
> Also, that complexity is what makes data compression so interesting -
> it's possible to incrementally add parts to a codec and precisely measure
> how it affects the results. Somewhat like a game.
> Now it's normal for a complex solution to be better than a simple one
> (eg. ppmd vs ppmx), but the appearance of such a "magic function" would
> mean "game over".
Mhm. You see it as engineering as opposed to a spark of genius. I concur.
> Still, it's a fact that he tries to make some products to sell,
> and I see nothing wrong with that.
Me neither.
> > Not everybody fears reversing.
> That only applies when we know that we didn't make anything exceptional.
> And if you accidentally did, you should always be aware of cases like this:
> http://groups.google.com/group/comp....ffc48cc17cfb0f
> So having a complex app which is not based on a single idea would make your
> life much easier.
Actually, the argument goes both ways. If you state something publicly, you can defend your authorship more easily, especially if it catches on.