Okay, it's here! It's a byte-aligned LZSS with a 16KB window, optimized either for speed (the cf command) or for maximum compression with Optimal Parsing a la LZMA/CABARC (the cx command). And this is heavy for such a midget! :) Later I will describe the compression format and other algorithm details. And for now:
LTCB, real world performance on Samsung 840 Pro SSD (512 GB)
Code:
C:\lzss\Release>lzss cf enwik9 enwik9.lzss
Compressing enwik9: 1000000000->448712956 in 6.019s

C:\lzss\Release>lzss c enwik9 enwik9.lzss
Compressing enwik9: 1000000000->399850630 in 12.533s

C:\lzss\Release>lzss cx enwik9 enwik9.lzss
Compressing enwik9: 1000000000->380192378 in 107.047s

C:\lzss\Release>lzss d enwik9.lzss e9
Decompressing enwik9.lzss: 380192378->1000000000 in 2.293s

C:\lzss\Release>lzss cx enwik8 enwik8.lzss
Compressing enwik8: 100000000->42874387 in 9.313s
967,021,395 bytes, html text (compression time - decompression time):
246,945,757 bytes, 3.959 sec. - 1.679 sec., lzss cf
222,299,342 bytes, 7.243 sec. - 1.583 sec., lzss c
213,732,668 bytes, 511.993 sec. - 1.555 sec., lzss cx
1,000,000,000 bytes, enwik9:
448,712,956 bytes, 6.161 sec. - 2.537 sec., lzss cf
399,850,630 bytes, 12.978 sec. - 2.446 sec., lzss c
380,192,378 bytes, 111.910 sec. - 2.325 sec., lzss cx
All RAM-disk
encode (8th February 2014)
...slower than my SSD results
On my machine the RAM-disk results are slightly faster than SSD, so I decided to post the slower "real world" results - it's more interesting :) It's cool that fast SSDs are nearly as fast as a RAM-disk these days! I must test BBB in slow mode on my machine...
LTCB results. http://mattmahoney.net/dc/text.html#3375
encode (10th February 2014)
Thanks a lot! Please note, the c mode must use the same amount of memory as cf. And please add my timings to LZSS' table as well! ;)
What test hardware did you use?
I used timer32 to measure memory usage (working set). Maybe I should use commit.
Code:
C:\res>timer32 lzss cf enwik8 x
Compressing enwik8: 100000000->50110565 in 6.124s
Commit = 150172 KB = 147 MB
Work Set = 18596 KB = 19 MB
Kernel Time = 0.436 = 6%
User Time = 2.262 = 35%
Process Time = 2.698 = 42%
Global Time = 6.298 = 100%

C:\res>timer32 lzss c enwik8 x
Compressing enwik8: 100000000->45093733 in 4.646s
Commit = 150168 KB = 147 MB
Work Set = 95156 KB = 93 MB
Kernel Time = 0.452 = 9%
User Time = 4.134 = 88%
Process Time = 4.586 = 98%
Global Time = 4.674 = 100%

C:\res>timer32 lzss cx enwik8 x
Compressing enwik8: 100000000->42874387 in 23.911s
Commit = 150172 KB = 147 MB
Work Set = 149952 KB = 147 MB
Kernel Time = 0.405 = 1%
User Time = 23.337 = 97%
Process Time = 23.743 = 99%
Global Time = 23.954 = 100%
It's Intel i7-3770K @ 4.7GHz, Corsair Vengeance LP 1600MHz CL9 16GB RAM, Samsung 840 Pro 512 GB SSD, Microsoft Windows 7 SP1
Faster memory makes no real difference on this machine. 2133MHz and up does make a tiny difference, though, at the cost of notably higher CPU temps...
As to LZSS memory usage - please check the Windows Task Manager. I'm not sure where your timer gets these numbers. The executable image may reserve such an amount but never actually use it. "Work Set" looks random to me...
And the actual algorithm memory use:
cf, c - 1+16MB
cx - 1+16+(16*8=128MB)
d - 16MB
I noted that LZSS 0.01 (which compressed smaller) was withdrawn and I removed it from the main table, but I figured you wouldn't mind since it gives you a spot on the Pareto frontier for decompression speed. :D http://mattmahoney.net/dc/text.html#3802
Thank you Matt!
And I think it's time to describe LZSS' machinery.
LZSS compresses the input in 16MB blocks. The compressed stream consists of a series of (see the sketch below the list):
- Uncompressed block size (4 bytes / signed 32-bit integer, little-endian) - <=16MB
- Compressed data
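As a rough illustration of this framing - a minimal sketch based only on the description above, not the actual LZSS source; write_block and its parameters are my own names:
Code:
/* Minimal sketch of the block framing described above - not the real LZSS code.
   Each block starts with the uncompressed size as a signed 32-bit little-endian
   integer (<=16MB), followed by the compressed payload of that block. */
#include <stdio.h>
#include <stdint.h>

static void write_block(FILE *out, const uint8_t *comp, size_t comp_len,
                        int32_t raw_len)   /* raw_len <= 16MB */
{
    uint8_t hdr[4];
    hdr[0] = (uint8_t)(raw_len);           /* little-endian: low byte first */
    hdr[1] = (uint8_t)(raw_len >> 8);
    hdr[2] = (uint8_t)(raw_len >> 16);
    hdr[3] = (uint8_t)(raw_len >> 24);
    fwrite(hdr, 1, 4, out);
    fwrite(comp, 1, comp_len, out);        /* compressed data follows the size */
}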
You may concatenate compressed files - e.g.:
Code:
lzss c book1 book1.lzss
lzss c book2 book2.lzss
copy /b book1.lzss+book2.lzss books.lzss

The compressed stream is pretty much similar to a classic LZSS:
Eight Literal / Match flags are packed into one byte.
1 = Match, 0 = Literal
The bit order differs from classic LZSS: here I go from the highest bit to the lowest. In this case we can avoid shift operations:
Literals are stored as-is.
Code:
if (t&128)
  // Match
else
  // Literal
t+=t; // t<<=1;
Matches are coded as:
Code:
OOOOOOLL OOOOOOOO          - for Match Lengths of 3..5 - i.e. 0..2+MIN_MATCH
OOOOOO11 OOOOOOOO LLLLLLLL - for Match Lengths of 6 and up to MAX_MATCH=3+255+MIN_MATCH

i.e. offsets are stored as 14-bit integers (16KB window).
Matches longer than 5 bytes require an extra byte.
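Putting the flag bytes and the match layout together, a decoder for this format could look roughly like the sketch below. This is only my reconstruction from the description, not encode's code; in particular, the split of the 14 offset bits across the two bytes and the offset bias (whether 0 means a distance of 1) are guesses.
Code:
#include <stdint.h>

#define MIN_MATCH 3   /* assumed: the 2-bit length field 0..2 maps to lengths 3..5 */

/* Decode one block: 'in' points at the compressed data, 'out' receives exactly
   'out_end - out' bytes. Returns the input position after the block. */
static const uint8_t *decode_block(const uint8_t *in, uint8_t *out, uint8_t *out_end)
{
    unsigned t = 0;                    /* current flag byte, high bit tested first */
    int flags_left = 0;
    while (out < out_end) {
        if (flags_left == 0) { t = *in++; flags_left = 8; }
        if (t & 128) {                 /* 1 = Match */
            unsigned b0 = *in++, b1 = *in++;
            unsigned offset = ((b0 >> 2) << 8) | b1;   /* OOOOOOLL OOOOOOOO */
            unsigned len = b0 & 3;
            if (len == 3) len += *in++;                /* LL=11: extra length byte */
            len += MIN_MATCH;
            const uint8_t *src = out - offset;         /* offset bias is a guess */
            while (len--) *out++ = *src++;             /* byte-wise copy handles overlaps */
        } else {                       /* 0 = Literal, stored as-is */
            *out++ = *in++;
        }
        t += t;                        /* move the next flag into bit 7, as in the snippet above */
        flags_left--;
    }
    return in;
}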
The decompressor is the same for all compression modes.
Compression modes are:
- cf - Fast mode - Fast string searching, Greedy parsing
- c - Normal mode - Balanced string searching, Greedy parsing
- cx - Maximum (Extreme) mode - Exhaustive string searching, Optimal parsing (improved Storer & Szymanski parsing, a la CABARC/LZMA-Ultra, but on minimalistic LZSS output) - see the sketch below
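To make "Optimal parsing" a bit more concrete: since every offset costs the same here (a fixed 14 bits), the parser mainly has to trade the number of tokens against their lengths, which a simple backward cost pass can do. The sketch below only illustrates that idea under assumed token prices (9 bits per literal, 17 or 25 bits per match); best_len[], cost[] and step[] are placeholder names, and encode's actual parser is certainly more refined.
Code:
#include <stdint.h>

#define MIN_MATCH 3

/* Bit price of one match token in this format: flag + 2 bytes,
   plus an extra length byte for lengths above 5. */
static uint32_t match_price(uint32_t len) {
    return (len <= 5) ? 1 + 16 : 1 + 24;
}

/* best_len[i] = longest match available at position i (0 if none).
   cost[i] ends up as the cheapest encoding of input[i..n), step[i] as the
   token to emit there (1 = literal, otherwise the match length).
   cost[] and step[] must hold n+1 entries. */
static void optimal_parse(const uint32_t *best_len, uint32_t n,
                          uint32_t *cost, uint32_t *step)
{
    cost[n] = 0;
    for (int64_t i = (int64_t)n - 1; i >= 0; i--) {
        cost[i] = 9 + cost[i + 1];     /* literal: 1 flag bit + 8 bits */
        step[i] = 1;
        for (uint32_t len = MIN_MATCH; len <= best_len[i] && i + len <= n; len++) {
            uint32_t c = match_price(len) + cost[i + len];
            if (c < cost[i]) { cost[i] = c; step[i] = len; }
        }
    }
    /* Walking step[] forward from position 0 then yields the cheapest token sequence. */
}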
Hope this helps!
Out of curiosity, how well would compression ratio compare to using order-1 ROLZ to increase the window size and reduce the number of bits for match position? Even a 12-bit order-1 ROLZ table (4k entries) would mean not needing another byte until matches are at 8 + MATCH_MIN.
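For context, the order-1 ROLZ table nauful mentions could look roughly like this - a toy sketch of the idea only; nothing of the sort exists in LZSS, and all names here are made up:
Code:
#include <stdint.h>

#define ROLZ_SLOTS 4096                       /* 12-bit slot index per context */

static uint32_t rolz_pos[256][ROLZ_SLOTS];    /* recent match positions per order-1 context */
static uint16_t rolz_head[256];               /* circular insertion point per context */

/* Register position 'pos' of 'buf' under its order-1 context (the previous byte).
   A match would then be coded as (12-bit slot, length) instead of a 14-bit offset. */
static void rolz_update(const uint8_t *buf, uint32_t pos)
{
    uint8_t ctx = pos ? buf[pos - 1] : 0;
    rolz_pos[ctx][rolz_head[ctx]] = pos;
    rolz_head[ctx] = (uint16_t)((rolz_head[ctx] + 1) & (ROLZ_SLOTS - 1));
}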
http://encode.su/threads/550-Ultra-f...ll=1#post32574
Maybe it's not the best example to show right now... Anyway, I've compared LZ77 to various ROLZ variants many times, though I'm not sure I've posted any reports on ROLZ. However, ROLZ is not that good when we keep literals unencoded. ROLZ, as well as LZP, needs some literal coder - since we simply lose some short matches that LZ77 can compress. And that's it! :)
Well... I have no estimates to compare; it might be interesting to pack things into symbols and use Huffman or some other fast entropy coder. Although, since it's byte-aligned, I guess that isn't your goal right now :)