Okay, it's here! It's a byte-aligned LZSS with a 16KB window, optimized either for speed (the cf command) or for maximum compression with Optimal Parsing a la LZMA/CABARC (the cx command). And this is heavy for such a midget! :) Later I will describe the compression format and other algorithm details. And for now:
LTCB, real world performance on Samsung 840 Pro SSD (512 GB)
Code:
C:\lzss\Release>lzss cf enwik9 enwik9.lzss
Compressing enwik9: 1000000000->448712956 in 6.019s

C:\lzss\Release>lzss c enwik9 enwik9.lzss
Compressing enwik9: 1000000000->399850630 in 12.533s

C:\lzss\Release>lzss cx enwik9 enwik9.lzss
Compressing enwik9: 1000000000->380192378 in 107.047s

C:\lzss\Release>lzss d enwik9.lzss e9
Decompressing enwik9.lzss: 380192378->1000000000 in 2.293s

C:\lzss\Release>lzss cx enwik8 enwik8.lzss
Compressing enwik8: 100000000->42874387 in 9.313s
967,021,395 bytes, html text (compression time - decompression time):
246,945,757 bytes, 3.959 sec. - 1.679 sec., lzss cf
222,299,342 bytes, 7.243 sec. - 1.583 sec., lzss c
213,732,668 bytes, 511.993 sec. - 1.555 sec., lzss cx
1,000,000,000 bytes, enwik9:
448,712,956 bytes, 6.161 sec. - 2.537 sec., lzss cf
399,850,630 bytes, 12.978 sec. - 2.446 sec., lzss c
380,192,378 bytes, 111.910 sec. - 2.325 sec., lzss cx
All RAM-disk
encode (8th February 2014)
...slower than my SSD results
On my machine the RAM-disk results are slightly faster than SSD, so I decided to post the slower "real world" results - it's more interesting :) It's cool that fast SSDs are nearly as fast as a RAM-disk these days! I must test BBB in slow mode on my machine...
LTCB results. http://mattmahoney.net/dc/text.html#3375
encode (10th February 2014)
Thanks a lot! Please note, the c mode must use the same amount of memory as cf. And please add my timings to LZSS' table as well! ;)
What test hardware did you use?
I used timer32 to measure memory usage (working set). Maybe I should use commit.
Code:
C:\res>timer32 lzss cf enwik8 x
Compressing enwik8: 100000000->50110565 in 6.124s
Commit = 150172 KB = 147 MB
Work Set = 18596 KB = 19 MB
Kernel Time = 0.436 = 6%
User Time = 2.262 = 35%
Process Time = 2.698 = 42%
Global Time = 6.298 = 100%

C:\res>timer32 lzss c enwik8 x
Compressing enwik8: 100000000->45093733 in 4.646s
Commit = 150168 KB = 147 MB
Work Set = 95156 KB = 93 MB
Kernel Time = 0.452 = 9%
User Time = 4.134 = 88%
Process Time = 4.586 = 98%
Global Time = 4.674 = 100%

C:\res>timer32 lzss cx enwik8 x
Compressing enwik8: 100000000->42874387 in 23.911s
Commit = 150172 KB = 147 MB
Work Set = 149952 KB = 147 MB
Kernel Time = 0.405 = 1%
User Time = 23.337 = 97%
Process Time = 23.743 = 99%
Global Time = 23.954 = 100%
It's Intel i7-3770K @ 4.7GHz, Corsair Vengeance LP 1600MHz CL9 16GB RAM, Samsung 840 Pro 512 GB SSD, Microsoft Windows 7 SP1
Faster memory makes no real difference on this machine. 2133MHz and up does make a tiny difference, though, at the cost of notably higher CPU temps...
As to LZSS memory usage - please check the Windows Task Manager. I'm not sure where your timer gets these numbers. The executable image may reserve such an amount but never actually use it. "Work Set" looks random to me...
And the actual algorithm memory use:
cf, c - 1+16MB
cx - 1+16+(16*8=128MB)
d - 16MB
I noted that LZSS 0.01 (which compressed smaller) was withdrawn and I removed it from the main table, but I figured you wouldn't mind since it gives you a spot on the Pareto frontier for decompression speed. :D http://mattmahoney.net/dc/text.html#3802
Thank you Matt!
And I think it's time to describe LZSS' machinery.
LZSS compresses the input in 16MB blocks. The compressed stream consists of a series of (see the sketch below the list):
- Uncompressed block size (4 bytes / signed 32-bit integer, little-endian) - <=16MB
- Compressed data
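As a rough illustration of this framing - a minimal sketch based only on the description above, not the actual LZSS source; write_block and its parameters are my own names:
Code:
/* Minimal sketch of the block framing described above - not the real LZSS code.
   Each block starts with the uncompressed size as a signed 32-bit little-endian
   integer (<=16MB), followed by the compressed payload of that block. */
#include <stdio.h>
#include <stdint.h>

static void write_block(FILE *out, const uint8_t *comp, size_t comp_len,
                        int32_t raw_len)   /* raw_len <= 16MB */
{
    uint8_t hdr[4];
    hdr[0] = (uint8_t)(raw_len);           /* little-endian: low byte first */
    hdr[1] = (uint8_t)(raw_len >> 8);
    hdr[2] = (uint8_t)(raw_len >> 16);
    hdr[3] = (uint8_t)(raw_len >> 24);
    fwrite(hdr, 1, 4, out);
    fwrite(comp, 1, comp_len, out);        /* compressed data follows the size */
}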
You may concatenate compressed files - e.g.:
Code:
lzss c book1 book1.lzss
lzss c book2 book2.lzss
copy /b book1.lzss+book2.lzss books.lzss

The compressed stream is pretty much similar to a classic LZSS:
Eight Literal / Match flags are packed into one byte.
1 = Match, 0 = Literal
The bit order differs from classic LZSS: here I go from the highest bit to the lowest. In this case we can avoid shift operations:
Literals are stored as-is.
Code:
if (t&128)
  // Match
else
  // Literal
t+=t; // t<<=1;
Matches are coded as:
Code:
OOOOOOLL OOOOOOOO          - for Match Lengths of 3..5 - i.e. 0..2+MIN_MATCH
OOOOOO11 OOOOOOOO LLLLLLLL - for Match Lengths of 6 and up to MAX_MATCH=3+255+MIN_MATCH

i.e. offsets are stored as 14-bit integers (16KB window).
Matches longer than 5 bytes require an extra byte.
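Putting the flag bytes and the match layout together, a decoder for this format could look roughly like the sketch below. This is only my reconstruction from the description, not encode's code; in particular, the split of the 14 offset bits across the two bytes and the offset bias (whether 0 means a distance of 1) are guesses.
Code:
#include <stdint.h>

#define MIN_MATCH 3   /* assumed: the 2-bit length field 0..2 maps to lengths 3..5 */

/* Decode one block: 'in' points at the compressed data, 'out' receives exactly
   'out_end - out' bytes. Returns the input position after the block. */
static const uint8_t *decode_block(const uint8_t *in, uint8_t *out, uint8_t *out_end)
{
    unsigned t = 0;                    /* current flag byte, high bit tested first */
    int flags_left = 0;
    while (out < out_end) {
        if (flags_left == 0) { t = *in++; flags_left = 8; }
        if (t & 128) {                 /* 1 = Match */
            unsigned b0 = *in++, b1 = *in++;
            unsigned offset = ((b0 >> 2) << 8) | b1;   /* OOOOOOLL OOOOOOOO */
            unsigned len = b0 & 3;
            if (len == 3) len += *in++;                /* LL=11: extra length byte */
            len += MIN_MATCH;
            const uint8_t *src = out - offset;         /* offset bias is a guess */
            while (len--) *out++ = *src++;             /* byte-wise copy handles overlaps */
        } else {                       /* 0 = Literal, stored as-is */
            *out++ = *in++;
        }
        t += t;                        /* move the next flag into bit 7, as in the snippet above */
        flags_left--;
    }
    return in;
}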
The decompressor is the same for all compression modes.
Compression modes are:
- cf - Fast mode - Fast string searching, Greedy parsing
- c - Normal mode - Balanced string searching, Greedy parsing
- cx - Maximum (Extreme) mode - Exhaustive string searching, Optimal parsing (improved Storer & Szymanski parsing, a la CABARC/LZMA-Ultra, but on minimalistic LZSS output) - see the sketch below
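To make "Optimal parsing" a bit more concrete: since every offset costs the same here (a fixed 14 bits), the parser mainly has to trade the number of tokens against their lengths, which a simple backward cost pass can do. The sketch below only illustrates that idea under assumed token prices (9 bits per literal, 17 or 25 bits per match); best_len[], cost[] and step[] are placeholder names, and encode's actual parser is certainly more refined.
Code:
#include <stdint.h>

#define MIN_MATCH 3

/* Bit price of one match token in this format: flag + 2 bytes,
   plus an extra length byte for lengths above 5. */
static uint32_t match_price(uint32_t len) {
    return (len <= 5) ? 1 + 16 : 1 + 24;
}

/* best_len[i] = longest match available at position i (0 if none).
   cost[i] ends up as the cheapest encoding of input[i..n), step[i] as the
   token to emit there (1 = literal, otherwise the match length).
   cost[] and step[] must hold n+1 entries. */
static void optimal_parse(const uint32_t *best_len, uint32_t n,
                          uint32_t *cost, uint32_t *step)
{
    cost[n] = 0;
    for (int64_t i = (int64_t)n - 1; i >= 0; i--) {
        cost[i] = 9 + cost[i + 1];     /* literal: 1 flag bit + 8 bits */
        step[i] = 1;
        for (uint32_t len = MIN_MATCH; len <= best_len[i] && i + len <= n; len++) {
            uint32_t c = match_price(len) + cost[i + len];
            if (c < cost[i]) { cost[i] = c; step[i] = len; }
        }
    }
    /* Walking step[] forward from position 0 then yields the cheapest token sequence. */
}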
Hope this helps!
Out of curiosity, how well would compression ratio compare to using order-1 ROLZ to increase the window size and reduce the number of bits for match position? Even a 12-bit order-1 ROLZ table (4k entries) would mean not needing another byte until matches are at 8 + MATCH_MIN.
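For context, the order-1 ROLZ table nauful mentions could look roughly like this - a toy sketch of the idea only; nothing of the sort exists in LZSS, and all names here are made up:
Code:
#include <stdint.h>

#define ROLZ_SLOTS 4096                       /* 12-bit slot index per context */

static uint32_t rolz_pos[256][ROLZ_SLOTS];    /* recent match positions per order-1 context */
static uint16_t rolz_head[256];               /* circular insertion point per context */

/* Register position 'pos' of 'buf' under its order-1 context (the previous byte).
   A match would then be coded as (12-bit slot, length) instead of a 14-bit offset. */
static void rolz_update(const uint8_t *buf, uint32_t pos)
{
    uint8_t ctx = pos ? buf[pos - 1] : 0;
    rolz_pos[ctx][rolz_head[ctx]] = pos;
    rolz_head[ctx] = (uint16_t)((rolz_head[ctx] + 1) & (ROLZ_SLOTS - 1));
}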
http://encode.su/threads/550-Ultra-f...ll=1#post32574
Maybe it's not the best example to show right now... Anyway, I've compared LZ77 to various ROLZ variants many times, though I'm not sure I've posted any reports on ROLZ. However, ROLZ is not that good when we keep literals unencoded. ROLZ, as well as LZP, needs some literal coder - since we simply lose some short matches that LZ77 can compress. And that's it! :)
Well... I have no estimates to compare; it might be interesting to pack things into symbols and use Huffman or some other fast entropy coder. Although, since it's byte-aligned, I guess that isn't your goal right now :)