TANGELO is a single-file compressor (not an archiver) derived from PAQ8/FP8, licensed under the GPL.
I removed a lot of stuff from FP8 to make it as simple as possible, so it has small source code and it is easier to understand how its core works (I think). The compression engine should still be the same as the one in FP8.
Specialized models/transformations for EXE / images / audio / JPEG / ... are all removed. You can't pack multiple files with TANGELO. You can't select how much memory it uses: about 550-600 MB (same as FP8 with option -7).
Its source is about 23 KB (compared to 149 KB for FP8). It should have similar performance to FP8 on text and unknown/default data.
Code:
Usage: TANGELO <command> <infile> <outfile>
Commands:
  c  Compress
  d  Decompress
Updated Silesia benchmark. http://mattmahoney.net/dc/silesia.html
Compared to fp8_v3 -7, compression is better on structured text (nci and webster) but worse on x86 (ooffice; I guess because there is no E8E9 filter).
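For context, a rough sketch of what an E8E9 filter does. This illustrates the general technique only; it is not fp8's actual filter code, and the function name is made up. x86 CALL (0xE8) and JMP (0xE9) opcodes are followed by a 32-bit little-endian relative offset; rewriting it as an absolute target makes repeated calls to the same address byte-identical, which the models pick up much more easily.

Code:
#include <cstddef>
#include <cstdint>

void e8e9_forward(uint8_t* buf, size_t n) {
    for (size_t i = 0; i + 5 <= n; ++i) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {
            // read the little-endian 32-bit offset after the opcode
            uint32_t rel = (uint32_t)buf[i+1]
                         | (uint32_t)buf[i+2] << 8
                         | (uint32_t)buf[i+3] << 16
                         | (uint32_t)buf[i+4] << 24;
            uint32_t target = rel + (uint32_t)(i + 5);  // relative -> absolute
            buf[i+1] = (uint8_t)target;
            buf[i+2] = (uint8_t)(target >> 8);
            buf[i+3] = (uint8_t)(target >> 16);
            buf[i+4] = (uint8_t)(target >> 24);
            i += 4;  // skip the rewritten operand
        }
    }
}
// The inverse transform subtracts the position instead of adding it.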
LTCB will have to run overnight.
Edit: LTCB updated. http://mattmahoney.net/dc/text.html#1532
Speed is about the same as fp8_v3 -8 (5.5 hours to compress or decompress enwik9) but compression is a bit worse due to using only half as much memory.
drt|tangelo would probably compress enwik8 and/or enwik9 tighter than drt|lpaq9m, while using about a third as much memory.
Ran it with DRT on enwik8, enwik9:
enwik8: drt|tangelo 17681785 bytes in 809.17s
enwik9: drt|tangelo 148758265 bytes in 8153.09s
Decompression not verified. Computer: Core i7-2630QM, 8 GB RAM.
Very nice compression!
Some results with drt + various compressors (as of June 2010). http://mattmahoney.net/dc/text.html#1440
lpaq9m is tuned for drt output on enwik8/9.
Quote (Matt Mahoney): "lpaq9m is tuned for drt output on enwik8/9."

It doesn't look like it's heavily tuned. Slightly more than the following two are:
Compressor ... ratio (dic+drt compressed size divided by enwik8 compressed size)
paq8px_v67 ... 0.9480
paq8l ... 0.9483
...
lpaq9m ... 0.9478
(from the last table in http://mattmahoney.net/dc/text.html#1440 )
TANGELO 2.0
- removed APMs
- removed some modeling (simpler model)
- simpler StateMap and ContextMap
- removed the DMC model
- uses less memory and is faster
- weaker compression
- state table from Mat Chartier, from this thread: http://encode.su/threads/1742-Improv...state-machines
Thanks Jan for the great job you're doing, but I think you should credit yourself as the creator of the program, which, although similar, is very different from PAQ8. I think you should add an LZP stage to make it as fast as PAQ9 (you could reduce the contexts and take only the most significant ones). You should create an archiver (Sami Runsas has put a free one online), and solid mode would improve compression by 10-20%. I would like to work with you, Matt and Mat Chartier on a super archiver!
WCC2013 results are excellent!
Best Regards, Francesco!
I don't think I will have time to develop a new program. But I have one idea I want to experiment with: use static Huffman coding before the modeling and context mixing, to improve speed on redundant data (fewer bits will be modeled, mixed, and coded per byte on average). It would be somewhat similar to how Huffman-coded data are handled in the PAQ8 JPEG model. Has any of you tried something like that? What do you think?
I think you answered it yourself. It is like kung fu: if you can't beat the enemy, become his friend. Simply copy the data when the probability is middling (near 0.5), as in CSC 3.2! With Huffman you would just come up empty-handed!
A different alphabet decomposition (a mapping of symbols from an N-ary alphabet to a set of prefix codes) certainly is a good idea. I've implemented order-1 Huffman decomposition (256 Huffman trees, one per order-1 context) in my old M1 and M1x2 compressors; see my homepage and check the most recent version. There was a speedup of roughly 30-50% and compression remained almost the same. I guess in your case it'll be bigger, since I mixed at most four models. However, the way you group symbols of the same Huffman code length has some influence on compression. I use a heuristic called "Huffman-III decomposition": http://www.sps.ele.tue.nl/members/f....CTW/Ben99x.pdf.
Hope this helps.
Cheers
M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk
Updated LTCB (so far just enwik8) and Silesia.
http://mattmahoney.net/dc/text.html#1532
http://mattmahoney.net/dc/silesia.html
Quote (Jan Ondrus): "use static Huffman coding before the modeling and context mixing, to improve speed on redundant data"

Depends on what you want to achieve. If you are doing backups, then the most important speed optimizations are detecting already-compressed data (to store it as-is) and deduplication. This is because on typical disks most of the data is already compressed, and a lot of space tends to be wasted on extra copies of files. The next best tricks are the E8E9 transform (because x86 and x86-64 code is a common uncompressed type) and grouping small files with the same extension so they compress together. Most compressible data is binary rather than text, so it is useful to have sparse models and fixed-record-size models, and you don't need a lot of memory (an exception is DNA). Most benchmarks are unrealistic in this sense: they tend to have large text files, no duplication, and to exclude already-compressed files.
Edit: updated enwik9. http://mattmahoney.net/dc/text.html#1532
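On the point above about detecting already-compressed data: one cheap test is to estimate the order-0 entropy of each block and store the block uncompressed when the estimate is close to 8 bits per byte. This is a sketch of a common approach, not fp8's or tangelo's actual detector:

Code:
#include <cmath>
#include <cstddef>
#include <cstdint>

// Order-0 entropy estimate in bits per byte; values near 8.0 suggest the
// block is already compressed (or encrypted) and should be stored.
double order0_bits_per_byte(const uint8_t* buf, size_t n) {
    if (n == 0) return 0.0;
    size_t count[256] = {0};
    for (size_t i = 0; i < n; ++i) ++count[buf[i]];
    double h = 0.0;
    for (int s = 0; s < 256; ++s)
        if (count[s]) {
            double p = (double)count[s] / (double)n;
            h -= p * log2(p);
        }
    return h;
}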
Here is version TANGELO 2.1.
It is faster again, with weaker compression.
changes:
- one mixer is used per bit, selected from 256 possible mixers by the previous byte as context (see the sketch after this list)
- removed all models except match and orders 0, 1, 2, 3, 4, 6
- higher-order models (2, 3, 4, 6) should be disabled for random-looking (already compressed) data, for better speed
- probabilities for states are now fixed (the StateMap class is replaced by an array of probabilities)
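To illustrate the first change, a floating-point sketch of the selection scheme; the names, the model count, and the learning rate are assumptions, and TANGELO itself works in fixed point:

Code:
#include <cmath>
#include <cstdint>

const int NMODELS = 7;  // assumed: match + orders 0,1,2,3,4,6

// A bank of 256 logistic mixers, one chosen per bit by the previous byte.
// Each mixer combines stretched model predictions with learned weights.
struct Mixer {
    double w[NMODELS] = {0};
    double st[NMODELS];  // stretched inputs, saved for the update step
    // p[i] in (0,1): each model's probability that the next bit is 1
    double mix(const double* p) {
        double dot = 0;
        for (int i = 0; i < NMODELS; ++i) {
            st[i] = log(p[i] / (1 - p[i]));  // stretch
            dot += w[i] * st[i];
        }
        return 1 / (1 + exp(-dot));  // squash back to a probability
    }
    void update(int bit, double pr, double lr = 0.002) {
        double err = bit - pr;  // prediction error drives the weight update
        for (int i = 0; i < NMODELS; ++i) w[i] += lr * err * st[i];
    }
};

Mixer mixers[256];

// one mixer per coded bit, selected by the last whole byte seen
Mixer& select_mixer(uint8_t prev_byte) { return mixers[prev_byte]; }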
Updated LTCB and Silesia benchmark.
http://mattmahoney.net/dc/text.html#1532
http://mattmahoney.net/dc/silesia.html
Made a tiny change: now it works on x64 platforms (and is able to reserve more than 2 GB of memory) without harming the initial setup.
TANGELO 2.3
- (re)added simple APM for better compression
- some small changes for better speed
Updated LTCB and Silesia benchmarks.
http://mattmahoney.net/dc/text.html#1532
http://mattmahoney.net/dc/silesia.html
This tangelo 2.3 compile keeps crashing on my system - which version of libstdc++-6.dll is needed?
Ah, fixed: I had the version from 21.09.2011; it works with the version from 16.10.2012.
I forgot to mention that tangelo.exe did not run because it was looking for some cygwin DLL files. I recompiled it from source for the test. The problem could be fixed by compiling with -static.
@Jan Ondrus, are you sure the construction around 'bytes_read' and 'bytes_written' is working properly? With 'enwik8' variable 'rn' does not change ...
@mahessel, rn should only become one if tangelo detects that the input is compressing poorly. It is expected that rn should never change on highly compressible files such as text/xml.
Yes, exactly.
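For the curious, a hedged sketch of how such a flag could be driven. This is a guess at the general mechanism, not TANGELO's actual logic: track the average coding cost per bit, and treat the input as random-looking when it stays near 1 bit per bit (8 bits per byte):

Code:
#include <cmath>

struct RedundancyDetector {
    double avg_cost = 0.5;  // running average cost, in bits per coded bit
    // p1 = model probability that the bit is 1; bit = the actual bit
    void update(double p1, int bit) {
        double cost = -log2(bit ? p1 : 1.0 - p1);
        avg_cost += 0.001 * (cost - avg_cost);  // slow exponential average
    }
    bool looks_random() const { return avg_cost > 0.95; }  // near 8 bits/byte
};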
Okay, clear.
Changing the squash table to '0,2,6,11,20,33,52,82,126,193,290,430,626,888,1222,1616,2048,2479,2873,3207,3469,3665,3805,3902,3969,4013,4043,4062,4075,4084,4089,4093,4095' will improve the compression by about 30 KB on enwik9 :)
Do you know why?
First, the curve should start at 0 and end at 4095 (just like the limiter in squash).
Second, the shape of the curve is a choice; in this case a less steep slope performs better.
For example, you can tweak the curve by using:
Code:
#include <cmath>

const double TWEAKME = 150.0;  // slope of the curve; ~150 matches the default
int table[4096];               // the squash lookup table

void build_squash_table() {
  double t[4096];
  // logistic curve centered at 2048
  for (int n = 0; n < 4096; ++n)
    t[n] = 1.0 / (1.0 + exp((2048.0 - n) / TWEAKME));
  // rescale so the table runs exactly from 0 to 4095
  const double offset = t[0];
  const double scale = 4095.0 / (t[4095] - offset);
  for (int n = 0; n < 4096; ++n)
    table[n] = (int)round((t[n] - offset) * scale);
}
Then change TWEAKME to 300; the 'default' squash curve corresponds to a value of about 150.
TANGELO 2.4
- added a fast JPEG model based on the model from paq8fthis_fast.cpp (http://cs.fit.edu/~mmahoney/compression/paq8fthis4.zip)
- this version is without the APM
Vista SP2 32-bit:
- missing files: libstdc++-6.dll, libgcc_s_dw2-1.dll
- after downloading these 2 DLLs and copying them next to tangelo.exe, it works
First look: it seems to be a little bit slow on my Core 2 Duo, but it has good compression - better than zpaq in my test. Needs more testing...
best regards
tangelo results on Silesia
Code:
  Silesia dicke mozil   mr  nci ooff osdb reym samba  sao webst x-ray  xml  Compressor -options
--------- ----- ----- ---- ---- ---- ---- ---- ----- ---- ----- ----- ----  -------------------
 37809279  2078 11228 2133 1029 1826 2168  867  2889 4283  5376  3636  291  tangelo 1.0
 41267068  2246 12479 2229 1320 2051 2330  978  3116 4478  5999  3716  321  tangelo 2.0
 44037765  2279 13895 2227 1580 2301 2449 1038  3298 4524  6306  3778  358  tangelo 2.3
 44847833  2299 14109 2283 1635 2328 2574 1050  3337 4653  6356  3846  371  tangelo 2.1
 44862127  2299 14121 2284 1631 2328 2575 1050  3343 4654  6356  3846  371  tangelo 2.4
I did a half-baked attempt at getting TANGELO to support standard input and output. Seems to work so far. Currently this is version 2.4.
https://github.com/neheb/TANGELO
edit: I should also mention that the recommended compiler switches are probably suboptimal. -O3 -msse2 seems to work best for me.