Hello everyone, here is my CSC 3.2 Alpha 2.
Here is CSC on Compression Ratings.
It's a complete rewrite of CSC3.1, aiming at a better ratio and faster speed.
The main algorithm is almost the same -- LZ77 + arithmetic coding.
But to achieve better performance, I made the following changes:
1. Abandoned the HashChain match finder; a hash table is used instead. The number of match candidates is only 1-6 (the hidden -mx parameter is 12). That means in -m1 there is only one hash-6-bytes candidate for each string match.
2. On most data, LZ is very good, but on bad data like audio it is very slow. So matches are only searched at the beginning of small blocks, and hash-6 strings are inserted every several bytes. Then it performs like a skip mode (30MB/s+) but can still find duplicate blocks (try a tar containing several identical zipped files).
3. Detection: CSC3.1 calculates each column's (1/2/3/4/8/16) entropy every 64KB to determine whether delta is needed. It's slow. Now this code is used instead:
while (progress < size - 12)
{
    sameDist[0] += abs((signed)src[progress] - (signed)src[progress+1]);
    sameDist[1] += abs((signed)src[progress] - (signed)src[progress+2]);
    sameDist[2] += abs((signed)src[progress] - (signed)src[progress+3]);
    sameDist[3] += abs((signed)src[progress] - (signed)src[progress+4]);
    sameDist[4] += abs((signed)src[progress] - (signed)src[progress+8]);
    sameDist[5] += abs((signed)src[progress] - (signed)src[progress+12]);
    progress++;
}
The MinBlockSize is 8KB. If the largest sameDist[i] and the smallest sameDist[i] differ a lot, that often indicates the block is table data, and the stride of the smallest sameDist[i] is the channel number (see the sketch after this list).
4. Improved parsing and model. There are 5 kinds of packets in the LZ output stream:
normal match, match with a repeat distance (4 of them), 1-byte match (last distance) (helps a lot on data tables), same match (last length and distance), and literal.
Literals use an order-1 model (3 MSBits as context). Match distance coding uses the match length as context.
5. The range coder was changed to the same one as in LZMA. It seems 10-20% faster in decompression compared to CSC3.1's coder.
6. The dictionary size is enlarged a lot --- 320MB (actually 500MB+ is possible but seems meaningless). Due to the changes in the match finder, memory usage is only 40MB + dictionary size in -m1 mode. Decompression needs 5MB + dictionary size, I/O buffer included.
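To make the detection decision in item 3 concrete, here is a minimal sketch of how the accumulated sameDist[] values could be turned into a delta stride. The threshold, the return convention and the function name are assumptions for illustration, not the exact logic used in CSC.
Code:
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

static const int kDist[6] = {1, 2, 3, 4, 8, 12};   // strides tested by the loop above

// Returns the detected channel count (stride) for delta filtering,
// or 0 if the block does not look like table/multimedia data.
int DetectDeltaStride(const uint8_t *src, size_t size)
{
    if (size < 13) return 0;
    uint64_t sameDist[6] = {0};
    for (size_t i = 0; i + 12 < size; i++)
        for (int k = 0; k < 6; k++)
            sameDist[k] += (uint64_t)std::abs((int)src[i] - (int)src[i + kDist[k]]);

    const uint64_t lo = *std::min_element(sameDist, sameDist + 6);
    const uint64_t hi = *std::max_element(sameDist, sameDist + 6);

    // Hypothetical threshold: only call it a table if the best stride
    // is much better than the worst one.
    if (lo * 3 >= hi) return 0;
    for (int k = 0; k < 6; k++)
        if (sameDist[k] == lo) return kDist[k];
    return 0;
}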
Still to improve:
LZ parsing still has bugs; its price evaluation doesn't work. Check e.coli -- theoretically it should have no matches and be compressed to 2 bpb.
The delta coding is far from satisfying. Thanks to Nania, but his work distinguishes data by the file header, so what should I do if I don't know the width of a bitmap? Sami advises that a linear predictor + order 0 is good enough, but I lack knowledge of linear predictors. Turning CSC into an archiver is also a kind of solution.
Technical changes in CSC3.2 Alpha 3:
1. Removed the repeat match in LZ. Very small ratio improvement on most files.
2. Changed literal coding a little, leading to a very small ratio improvement on all files.
3. Changed delta coding; improved decoding speed.
4. Cached some bits in the match finder, which improved compression speed.
Technical changes in CSC3.2 Alpha 4:
1. Improved compression speed on data that is hard to compress.
2. Fixed a bug that hurt compression.
3. Remapped modes: -m0 -> -m1, -m1 -> -m2, -m2/3 -> -m3. The default is -m2 now.
4. Dictionary size is more flexible now; it can be any number of KB from 32KB to 512MB.
Technical changes in CSC3.2 Alpha 5:
1. Improved compression speed a little.
2. Added a very weak and experimental DICT preprocessor (not Bulat's :) ). It improved the ratio by about 1% on text.
Technical changes in CSC3.2 Alpha 6:
1. Very small improvements, mostly to DICT.
2. This is the last version of CSC. The next version will be an archiver, maybe called CSArc.
Technical changes in CSC3.2 Beta 2:
The new version puts more emphasis on compression ratio than on compression efficiency.
1. Rewrote the LZ part for -m2/-m3 mode. Removed the previous -m1 mode; now -m1 is the previous -m2.
The LZ (-m2/3) now uses better parsing, but is much slower than before.
The literal coder is now complete order-1, while the previous one used only the 3 MSBs as context (a tiny sketch of the two schemes follows after this list).
2. New delta detector, but it seems no better than the previous one.
3. It's now an archiver; much of the code is from YZX (i.e. from Sami Runsas).
4. Some other small modifications.
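As a tiny illustration of the literal-context change in item 1, the sketch below contrasts the old 3-MSB context with a full order-1 context. The function names are made up, and the probability models these contexts would index are not shown.
Code:
#include <cstdint>

// old scheme: only the top 3 bits of the previous byte -> 8 contexts
static inline unsigned LiteralContext3MSB(uint8_t prevByte)   { return prevByte >> 5; }

// new scheme: the whole previous byte -> 256 contexts (complete order-1)
static inline unsigned LiteralContextOrder1(uint8_t prevByte) { return prevByte; }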
Changes in CSC3.2 Final:
Fixed a decompression error in the previous version.
http://en.wikipedia.org/wiki/Linear_prediction
Also, did you look at the bsc source?
Its compression methods may be different, but detectors and filters
can be used as a reference.
Quote Originally Posted by Fu Siyuan: "3. Detection: CSC3.1 calculates each column's (1/2/3/4/8/16) entropy every 64KB to determine whether delta is needed. It's slow. Now this code is used instead: [the sameDist loop quoted above]"
1 - This IS NOT entropy; you have used some kind of heuristic.
2 - Linear predictors are not very complex. Just think of a series of coefficients: at each step you apply a convolution with those coefficients. You can even use heuristics to update the coefficients if you want to make it adaptive, or use the least squares method to obtain the coefficients (a small sketch follows after this list).
3 - "Same match(last len and distance)" option seems unappropriate. Because, it does not occur too much IIRC. You can dump statistics of some files to see it really helps or not.
BIT Archiver homepage: www.osmanturan.com
I just downloaded the bsc source. It's a nice reference for detectors. :)
To Osman:
1. You misread it. CSC3.1 uses entropy; this one does not. However, I finally understand why you mention the word "heuristics" so much.
2. I will give it a try. I also read the RAR audio filter. "Heuristics" often work for me.
3. Yes, you are right. Same match only benefits a few files. I will fix it in the next alpha edition.
have you seen mmdet and delta in the freearc sources?
are you use "caching"? i think that saving 4 bytes of each match as check key along with pointer should reduce number of extra memory accesses to minimum. and since L2 cache line is 64 byte, you may access 8 matches without extra penalty:That means in -m1 there is only 1 hash-6-bytes candiate on each string matching.
Code:hash = hash_func(p[0]...p[5]) key = *(uint32*)p for (i=0; i<8; i++) { if (ht[hash][i].key == key) { match_ptr = ht[hash][i].ptr match_len = memcmp_bytes(p, match_ptr) .... } }
Quote Originally Posted by Bulat Ziganshin: "have you seen mmdet and delta in the freearc sources?"
Truly speaking, it's not easy for me to understand all your ideas, though I always keep trying to. At the same time I also try some ideas of my own. BTW, your dict preprocessor attracts me more and I'm considering implementing one.
Quote Originally Posted by Fu Siyuan: "Truly speaking, it's not easy for me to understand all your ideas, though I always keep trying to."
of course it's hard, since my comments are in russian rather than chinese
the dict implementation is amazingly optimized for speed, but its algorithm isn't ideal. look at the data produced by WRT: preprocessing by WRT makes further ppmd compression faster, while preprocessing by DICT makes it slower
Based on my observation, hash misses (finding the same hash but not a match) aren't frequent, so I didn't use the "if (ht[hash][i].key == key)" check.
In my program, i < 1 (-m1), 3 (-m2), 6 (-m3). However, more candidates still slow it down a lot, and not because of memory access: checking longer matches itself takes noticeably long.
Another test: an 8-byte hash helps a lot on enwik8/9, but gives a much worse ratio on binary data.
Quote Originally Posted by Fu Siyuan: "hash misses (finding the same hash but not a match) aren't frequent"
every memory access costs ~100 ticks
Quote Originally Posted by Fu Siyuan: "more candidates still slow it down a lot, and not because of memory access: checking longer matches itself takes noticeably long"
much more than 100 ticks? ;) taking into account that most of the checked matches will be less than 4 bytes
you may gather stats, for example on enwik8 and my dll100.dll, with 1..8 matches checked: % of hash misses and average match_len (in terms of my code)
Quote Originally Posted by Fu Siyuan: "Another test: an 8-byte hash helps a lot on enwik8/9, but gives a much worse ratio on binary data."
many years ago i found the same with a hash4 matchfinder instead of the hash3 widely used in those times :)
now i think that one should use a separate hash table that fits in the L2 cache. this means about 2^20 entries, so it may be used only for matches up to 5 or 6 bytes. btw, it's better organized as hc4 since it doesn't suffer from the main hc4 problem - lots of memory accesses
although if one needs maximum-maximum compression w/o bt4, she may use a separate hash with a size of up to 100 mbytes, so it may find matches up to 7-8 bytes
"linear predictor + order0" is very simple thing. you subtracts successive values (over N bytes) - this is called linear predictor, and then computes order0 compression of resulted dara. what N gives best compression and if it gives better compression than order0 of original data - it should be used. it's how mmdet works - you may run its executable ( http://freearc.org/Research.aspx ) on wav/bmp files
so, delta preprocessing by itself is called linear predictor :) simplest way to check whether it helps is to compare order0 compression before and after applying it
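A small sketch of the test described above, under the assumption that order-0 cost is estimated directly from byte frequencies: delta-filter the block with each candidate distance N, compare the order-0 code length against the unfiltered data, and keep the best N (or none). The candidate widths and function names are illustrative, not mmdet's actual code.
Code:
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// order-0 code length (in bits) of an n-byte block
static double Order0Bits(const uint8_t *p, size_t n)
{
    size_t freq[256] = {0};
    for (size_t i = 0; i < n; i++) freq[p[i]]++;
    double bits = 0.0;
    for (int s = 0; s < 256; s++)
        if (freq[s]) bits -= (double)freq[s] * std::log2((double)freq[s] / (double)n);
    return bits;
}

// Returns the best delta distance N, or 0 if the raw data already codes better.
int PickDeltaDistance(const uint8_t *src, size_t n)
{
    static const int cand[] = {1, 2, 3, 4, 8, 12};   // candidate channel widths
    double best = Order0Bits(src, n);
    int bestN = 0;
    for (int N : cand) {
        if ((size_t)N >= n) break;
        std::vector<uint8_t> d(n);
        for (size_t i = 0; i < n; i++)
            d[i] = (i < (size_t)N) ? src[i] : (uint8_t)(src[i] - src[i - (size_t)N]);
        double bits = Order0Bits(d.data(), n);
        if (bits < best) { best = bits; bestN = N; }
    }
    return bestN;
}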
Quote Originally Posted by Bulat Ziganshin: ""linear predictor + order0" is a very simple thing. you subtract successive values (over N bytes) - this is called a linear predictor - and then compute the order0 compression of the resulting data. [...]"
Yeah, I know delta is a kind of linear predictor; Matt explained this in his article. But only subtracting successive values is not enough to get good performance on wav/bmp. The current CSC3.2 uses delta and order-1; it's slow and does not perform as well as RAR, especially on bitmaps. CSC3.1 does what you said. The new detection method is an experiment; it should be simpler and faster.
i talked only about the detector, since you said that you don't understand what "linear predictor + order0" is
overall, MM compression is its own area, and lz+filter is a very poor man's approach here :)
Hi Bulat:
Thanks for your advice about caching. To avoid using too much memory, I use only 3 bits to store the key (3-bit key + 29-bit dictionary position). On many test cases, the program speeds up by nearly 10%.
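A possible layout for such a packed hash-table entry, assuming the 3-bit key is taken from the high bits of the hash (the post does not say how the key is derived): each slot is one 32-bit word holding a 3-bit check key and a 29-bit dictionary position, so a candidate can be rejected without touching the dictionary.
Code:
#include <cstdint>

static const uint32_t kPosBits = 29;
static const uint32_t kPosMask = (1u << kPosBits) - 1;   // low 29 bits: position
                                                          // high 3 bits: check key

// Build a slot from a dictionary position and the string's hash value.
static inline uint32_t MakeSlot(uint32_t pos, uint32_t hash)
{
    uint32_t key = hash >> 29;                  // reuse 3 high hash bits as key (assumption)
    return (key << kPosBits) | (pos & kPosMask);
}

// Cheap pre-check before verifying the actual bytes in the dictionary.
static inline bool KeyMatches(uint32_t slot, uint32_t hash)
{
    return (slot >> kPosBits) == (hash >> 29);
}

static inline uint32_t SlotPos(uint32_t slot)
{
    return slot & kPosMask;
}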
csc32 tests. http://mattmahoney.net/dc/text.html#2502
Very very nice results for this version!
Super and fast LZ77! Nice job!
Some tests:
TCUP: (results not preserved)
Exe: (results not preserved)
Sorry, no compression timings, but it felt fast.
New update of my program, with source code.
It uses better flexible parsing.
Now it's OK. I'm not sure whether the code builds without Visual C++ 2008.
Updated results for csc32 final. http://mattmahoney.net/dc/text.html#2299
Any progress on it? :)