Hey all, I took a year-and-a-half break from doing compression stuff, but I recently started messing around with CM + LZP again. Attached is MCM 64-bit.
The current version has 2 modes, -fast and -high.
-fast is a 6-context CM + LZP
-high is an 8-context CM + LZP (the default)
The LZP is very basic; it just recycles the same bit processing as the normal CM. It uses 256 + expected char as the order-0 XOR context instead of the normal 8-bit order-0 context.
If the match model has an expected char and the bit is a non-match, it uses this info through SSE to help reduce the cost of non-matching chars. When there is no match model match, it just codes a normal byte without processing an LZP bit.
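For readers who want to picture that flow, here is a minimal sketch of how an LZP match bit can be folded into a bitwise CM coder. It is not the MCM source: Coder, MatchModel and ByteModel are hypothetical stand-ins for the real coder and models.
Code:
#include <cstdint>

// Sketch only, not the actual MCM code.
struct Coder      { void encodeBit(uint32_t ctx, int bit) { /* arithmetic-code `bit` under `ctx` */ } };
struct MatchModel { bool hasMatch() const { return false; } uint8_t expectedChar() const { return 0; } };
struct ByteModel  { void encodeLiteral(Coder&, uint8_t byte, int sseHint) { /* normal CM byte coding */ } };

void encodeByteWithLZP(uint8_t actual, MatchModel& mm, ByteModel& bm, Coder& coder) {
  if (mm.hasMatch()) {
    const uint8_t expected = mm.expectedChar();
    // Context 256 + expected char keeps the LZP bit's statistics separate
    // from the normal 8-bit order-0 contexts (0..255).
    const uint32_t lzpCtx = 256u + expected;
    const bool match = (actual == expected);
    coder.encodeBit(lzpCtx, match ? 1 : 0);
    if (!match) {
      // Mismatch: code the literal as usual, passing the expected char so
      // SSE can discount bytes that were "almost" predicted.
      bm.encodeLiteral(coder, actual, expected);
    }
  } else {
    // No match model prediction: plain literal, no LZP bit at all.
    bm.encodeLiteral(coder, actual, -1);
  }
}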
Speed should be similar to mcm03, or a bit slower (for now). I wonder how ZCM has so much speed?
Changes:
Random tweaking, added SSE for helping the LZP.
Added -10 and -11 for more memory, which use around 2.8 GB / 5.5 GB (64-bit builds only).
Huffman disabled.
E8E9 filter (always on lol, works OK for most files unless there are lots of false positives; see the sketch after this list).
64-bit support
Linux support (./make.sh on Ubuntu should compile).
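Since the E8E9 filter above is always on, here is a hedged sketch of the classic E8/E9 call transform for anyone unfamiliar with it; this is the textbook version, not necessarily MCM's exact filter.
Code:
#include <cstdint>
#include <cstring>
#include <cstddef>

// Classic E8/E9 transform (sketch): x86 CALL (0xE8) and JMP (0xE9) are
// followed by a 32-bit relative offset. Rewriting it as an absolute target
// makes repeated calls to the same function byte-identical, which the CM
// picks up easily. On non-x86 data, stray 0xE8/0xE9 bytes are "false
// positives": the transform stays reversible, just not helpful.
void e8e9Forward(uint8_t* buf, size_t n) {
  for (size_t i = 0; i + 5 <= n; ++i) {
    if (buf[i] == 0xE8 || buf[i] == 0xE9) {
      int32_t rel;
      std::memcpy(&rel, buf + i + 1, 4);                   // little-endian relative offset
      int32_t target = rel + static_cast<int32_t>(i + 5);  // relative -> absolute
      std::memcpy(buf + i + 1, &target, 4);
      i += 4;                                              // skip the operand bytes
    }
  }
}
// The decoder does the inverse: target - (i + 5) restores the relative offset.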
Sample results with -9:
sfc: 10,382,334
enwik9: 160,663,427
enwik8: 19,505,098
silesia: 41,025,259
Source code is updated on my github:
https://github.com/mathieuchartier/mcm
I think there are probably more ways to improve the LZP by having fewer models / more specialized models for the LZP predictor bit.
Updated silesia and LTCB benchmarks.
http://mattmahoney.net/dc/silesia.html
http://mattmahoney.net/dc/text.html (went from 20'th to 12'th place :) )
Great to see this and to see you back. Don't take any more breaks from compression! :)
Almost ready to release 0.81, which has a nice improvement for exe / binary and a slight improvement on text. One thing I keep wondering about is how to improve speed without hurting compression much.
So far, my ideas for increasing speed are:
Faster LZP match bit prediction (hard to do without hurting compression).
Re-enable Huffman preprocessing. However, the Huffman benefits are mitigated by LZP, and static Huffman doesn't really work well for large files.
Multi-threading: usually hurts compression and is a lot of work to implement.
Re-enable prefetching, probably only helps a little bit.
Use dictionary transform instead of LZP for text (hard to implement).
Anyone else have ideas?
Mat,
I don't have any specific ideas for you. But I would say that the speed of 0.80 is very good already; it is close in speed to CCMX, and in many cases MCM produces a better ratio at level -9 than CCMX at level 7.
Perhaps your goals for MCM are more focused on improving speed, but improving compression while keeping the current speed seems like a good idea to me.
Perhaps another option would be to incorporate other filters? CCMX excels over MCM in certain cases because of its specific data filters. You mentioned EXE and text filters. What about a BMP or WAV filter? But then again, perhaps improving the EXE and text filters beyond the 0.81 improvements is all that is necessary.
Sorry I cannot be of more help...I am not a scientist or a programmer.
Good luck and keep working on it! :)
If you implement wildcards and recursion, you could let MCM switch off models that would give no gain on certain files
while switching on those that would help: for example, an audio model for every WAV/AIFF, a PGM model for PGM, a PPM model for PPM images.
In those cases the text model can be switched off, while on texts we would need the text model but not the models for audio, PGM, or PPM.
In my test cases, Huffman did not bring improvements.
If I remember right, programs like UHARC and CCMX tried the delta filter every few KB and checked whether it gave a gain.
If yes, it was enabled; if not, it was left off (a small sketch of that idea follows below).
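As a rough illustration of that block-wise trial, here is a minimal sketch that is not the UHARC/CCMX code: it uses a plain order-0 entropy estimate as a cheap stand-in for what the real coder would spend, and a trivial byte-wise delta.
Code:
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Estimate coded size in bits with a plain order-0 model (cheap proxy).
static double order0Bits(const uint8_t* p, size_t n) {
  size_t freq[256] = {0};
  for (size_t i = 0; i < n; ++i) ++freq[p[i]];
  double bits = 0.0;
  for (size_t s = 0; s < 256; ++s)
    if (freq[s]) bits -= freq[s] * std::log2(double(freq[s]) / double(n));
  return bits;
}

// Decide per block (e.g. every few KB) whether a byte-wise delta looks cheaper.
static bool deltaHelps(const uint8_t* block, size_t n) {
  std::vector<uint8_t> d(n);
  uint8_t prev = 0;
  for (size_t i = 0; i < n; ++i) { d[i] = uint8_t(block[i] - prev); prev = block[i]; }
  return order0Bits(d.data(), n) < order0Bits(block, n);
}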
I think that dictionary transform would be much better for text than LZP. I like this idea.
I would also vote for a light Huffman recompressor.
mcm v0.81:
Significant improvements to binary models, slight improvements to text model. Added prefetch to match model and a few other things for speed tuning.
Added -turbo (3 contexts), -fast (4 contexts), and -max (10 contexts)
Reference mcm v0.8:
SFC -9:
-high 10,382,334 in 38.198s
Silesia -9:
-high 41,025,259 in 143.9s
New mcm v0.81:
SFC -9:
-turbo (3 ctx): 10,733,617 in 21.629s
-fast (4 ctx): 10,555,083 in 26.109s
-mid (6 ctx): 10,313,325 in 31.408s
-high (8 ctx): 10,251,772 in 35.866s
-max (10 ctx): 10,208,213 in 38.358s
Silesia -9:
-fast: 42,200,550 in 91.2s
-mid: 40,632,970 in 116.956s
-high: 40,330,795 in 136.879s
-max: 40,151,163 in 148.767s
Turns out testing whether a bit is set is fast enough to do a few times per byte.
I added an easy way to programmatically enable / disable models. Example of current models for binary profile:
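// inputs = number of contexts for the selected level (-turbo, -fast, ...); presumably only the first `inputs` models below get enabled.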
if (inputs > idx++) enableModel(kModelOrder1);
if (inputs > idx++) enableModel(kModelOrder2);
if (inputs > idx++) enableModel(kModelSparse34);
if (inputs > idx++) enableModel(kModelOrder4);
if (inputs > idx++) enableModel(kModelSparse23);
if (inputs > idx++) enableModel(kModelMask);
if (inputs > idx++) enableModel(kModelSparse4);
if (inputs > idx++) enableModel(kModelOrder3);
if (inputs > idx++) enableModel(kModelSparse2);
if (inputs > idx++) enableModel(kModelSparse3);
I made a branch for v0.81, so that when I do incremental development, the source is still visible.
https://github.com/mathieuchartier/mcm/tree/v0.81
Mat,
Great improvement in compression!
HFCB's vm.dll
Again, excellent improvement!
Code:
SIZE       TIME        PROGRAM
=========================================
958143613  1990.67500  mcm 0.80 -9 -high
881842673  2126.16200  mcm 0.81 -9 -max
MCM v0.82, minor update:
Updates:
Tuned the x86 filter to have fewer bad transforms on non-x86 data.
Tuned some constants through hill climbing / brute force (see the sketch after this list), improved match model handling.
More prefetching + other optimizations (~5-10% faster vs 0.81).
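For the curious, the hill-climb idea amounts to something like the sketch below (mine, not the MCM tuning harness): nudge one constant at a time and keep the change only when the measured cost shrinks; the cost function is whatever you plug in, e.g. total compressed size over a few sample files.
Code:
#include <cstddef>
#include <functional>
#include <vector>

// Coordinate hill climbing over integer tuning constants (sketch only).
std::vector<int> hillClimb(std::vector<int> params,
                           const std::function<size_t(const std::vector<int>&)>& cost,
                           int rounds) {
  size_t best = cost(params);
  for (int r = 0; r < rounds; ++r) {
    for (size_t i = 0; i < params.size(); ++i) {
      for (int step : {-1, +1}) {
        params[i] += step;
        size_t c = cost(params);
        if (c < best) best = c;           // keep the improvement
        else          params[i] -= step;  // revert the nudge
      }
    }
  }
  return params;
}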
TODO:
Rewrite text / binary detector + add more filters (lot of work).
Enwik8 -9 -high:
v0.80: 19505098 in 90.556s
v0.81: 19381820 in 85.029s
v0.82: 19318179 in 77.556s
SFC -9:
-mid: 10,307,681
-high: 10,246,239
-max: 10,207,930
Silesia -9:
-mid: 40,541,614
-high: 40,251,527
-max: 40,091,648
Source at: https://github.com/mathieuchartier/mcm/tree/v0.82
Nice improvement Mat :)
Also, I did some other quick tests and MCM 0.82 beats ZCM and CCMX every time for compression ratio. You are on your way to beating NanoZip. Once data type detection/filters are implemented, I believe you will beat NanoZip for ratio (since you have already beaten it for speed).
Code:
SIZE       TIME        PROGRAM
=========================================
881842673  2126.16200  mcm 0.81 -9 -max
879245585  1846.88800  mcm 0.82 -9 -max
Don't give up on this project. :)
I am proud to announce mcm v0.83 which has nice improvements in text.
New features:
2 pass approach: analyze then compress streams one at a time
Dynamic dictionary preprocessor using 0 - 600000+ words depending on file.
Special .wav compressor (not great ratio, but fast at >10MB/sec).
-store option for using just filters and dictionary preprocessor (undocumented)
New LZP idea for text: only use LZP if the match length is greater than ~13, otherwise use the match model (see the sketch after this list). This works better than never using LZP according to my tests.
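The gating itself is just a length threshold; a tiny sketch of the per-byte decision (the names and the exact constant are mine, not from the MCM source):
Code:
#include <cstddef>

// Length-gated LZP (sketch): short matches stay with the ordinary match
// model; the LZP bit is only coded once the current match is long enough.
constexpr size_t kLzpMinLen = 13;  // "~13" from the post; presumably a tuned constant

enum class ByteMode { Lzp, MatchModel };

ByteMode chooseMode(size_t currentMatchLen) {
  return currentMatchLen > kLzpMinLen ? ByteMode::Lzp : ByteMode::MatchModel;
}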
Sample results:
calgary.tar -x9: 667,097 in 3s
silesia -m9: 40,340,745 in 102s
silesia -x9: 39,772,294 in 135s
enwik8 -m9: 18,623,547 in 51s
enwik8 -x9: 18,376,397 in 69s
Future ideas:
Is there a way to do a per-sample CM for audio samples? Logistic mixing makes no sense there, but something like paq6-style linear mixing might work. I want a way to have negative weights though.
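One way to get signed weights is a linear mixer with an LMS-style update; the sketch below shows only that generic idea, not paq6's actual evidence-counting mixer and not anything from MCM.
Code:
#include <cstddef>
#include <vector>

// Linear (non-logistic) mixing with signed weights (sketch only).
struct LinearMixer {
  std::vector<double> w;  // signed weights, one per input model
  explicit LinearMixer(size_t n) : w(n, 1.0 / n) {}

  double mix(const std::vector<double>& p) const {
    double y = 0.0;
    for (size_t i = 0; i < p.size(); ++i) y += w[i] * p[i];
    return y < 0.001 ? 0.001 : (y > 0.999 ? 0.999 : y);  // clamp to a valid probability
  }

  // LMS update toward the observed bit: weights move by error * input, and
  // nothing stops them from going negative (handy when a predictor is
  // anti-correlated with the residual, as can happen with audio).
  void update(const std::vector<double>& p, int bit, double rate = 0.002) {
    const double err = bit - mix(p);
    for (size_t i = 0; i < p.size(); ++i) w[i] += rate * err * p[i];
  }
};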
Caution:
This version has a lot of changes and could be buggy. The dictionary-building memory usage could be HUGE if the file is very large. Also, the 32-bit version isn't well tested and might be broken on >4 GB files.
Source at:
https://github.com/mathieuchartier/mcm/tree/v0.83
Good improvement!
Keep working on it! :)
Code:
SIZE       TIME        PROGRAM
=========================================
879245585  1846.88800  mcm 0.82 -9 -max
874226934  1336.988    mcm 0.83 -x9
I am glad that you listened to the suggestions and added LZP to the program. I definitely did not expect these great strides. Only one thing I wanted you to consider: I've seen that mcm still uses floating-point numbers, and I don't think that makes it compatible across all CPUs (you may want to test the archives on various CPUs). However, the strength of the program is definitely the text processor, which is a strong point (I hope to see Matt's LTCB results soon).
Hi Nania, I think the latest versions "should" not use floats any more; where did you see float numbers in the coding process? I moved to using stems for squash / stretch a while back. One thing you may have seen is that dictionary preprocessing is non-deterministic, i.e. 32-bit mcm builds a slightly different dictionary than 64-bit. However, this is benign since I encode the dict at the start of the file.
I still feel like MCM's LZP is not as strong as ZCM's LZP; for example, FP.log is slower and compresses less with MCM (use -filters=none) than with ZCM 0.92. When you encode LZP, do you use a single bit for LZP / no LZP, or do you encode a whole number for the match length?
I'm glad you removed the float calculations (I was evidently wrong). I'm working on version 0.93 of ZCM, and it will be difficult to compete with the current MCM.
O__O
Strange that it's gone unnoticed, but MCM 0.83 -x11 enters the Top 10 of LTCB.
So, with an overall size of 144 934 149 bytes, MCM 0.83 -x11 takes 7th place, beating such monsters as NanoZip, xwrt, fp8 and WinRK.
Code:
mcm 0.83 x64 -x11  enwik8   18 233 295   44.500s   5839 MB   40.203s   5707 MB
mcm 0.83 x64 -x11  enwik9   144 854 575  394.421s  5961 MB   281.250s  5723 MB
Decompresser is 79 574 (sources from GitHub, size optimized by removing comments, KZIP /b512)
Considering the speed MCM reaches this result with, it's simply astonishing.
Decompression is correct, tested on i7-2700K, WinXP x64 SP2. Size optimized sources attached.
Thanks for measuring this. I think there is still a fair bit of room for improvement by assigning better codes for the dictionary. Keep in mind that beating those compressors while using 3-4x the RAM isn't exactly apples to apples :)
Right now I have basic multi-file support done for Windows, and I am also working on simple block-based dedupe. These will be part of 0.84, but I don't think there will be any significant changes to the dictionary or CM core.
EDIT:
Thx Matt, I'll try to make sure that is fixed in the next version.
It's about time I updated LTCB. @Skymmer, thanks for testing. http://mattmahoney.net/dc/text.html#1449
mcm32.exe -x9 failed on silesia/dickens. Decompression stalled, and when I killed it the output was the right size but all zero bytes. The rest of silesia was OK.
Hi Mat,
Thanks for mcm. Can you help with compilation on Linux and FreeBSD (and, lowest priority, OS X)? It seems that make.sh refers to a missing File.cpp. And then clang++ gave some errors on OS X. Also, if anyone else has gotten it to compile and has a static executable to share, that would be great.
I don't see File.cpp in make.sh. I just tried again in 64 bit Ubuntu/g++ 4.8.2 and make.sh works both on the github sources and Skymmer's optimized sources. make.sh looks like this:
g++ -DNDEBUG -O3 -fomit-frame-pointer -msse2 -std=c++0x -D_FILE_OFFSET_BITS=64 -o mcm CM.cpp Archive.cpp Huffman.cpp MCM.cpp Memory.cpp Util.cpp Compressor.cpp LZ.cpp -lpthread
Seems like you're using an old checkout. The current git make.sh looks like this:
Code:
g++ -DNDEBUG -O3 -fomit-frame-pointer -msse2 -std=c++0x -D_FILE_OFFSET_BITS=64 -o mcm CM.cpp Archive.cpp Huffman.cpp MCM.cpp Memory.cpp Util.cpp Compressor.cpp LZ.cpp File.cpp -lpthread
The issue has already been reported (just over two weeks ago).
Did I miss something? What is mcm32 in your .7z distribution? I assumed it is a 32-bit version. However, it doesn't work on XP: 'not a win32 program'. John
I'm adding these flags when compiling with MSVC:
-link -MACHINE:x86 -SUBSYSTEM:CONSOLE,5.01 -LARGEADDRESSAWARE
-link -MACHINE:x64 -SUBSYSTEM:CONSOLE,5.02
on the built executable you can do
editbin.exe a.exe /SUBSYSTEM:CONSOLE,5.01 /LARGEADDRESSAWARE
Thanks. So can the distribution please be updated, Matt, so that mcm32 runs on XP? TIA, John
Yes, I was using an old checkout.
When I compile under Linux Mint Cinnamon 17.1 I get this:
-------------------------------------------------------------------------------------------
blackhole@galaxy ~/mcm $ ./make.sh
g++: error: File.cpp: No such file or directory
This is the content of "make.sh":
----------------------------------------------
g++ -DNDEBUG -O3 -fomit-frame-pointer -msse2 -std=c++0x -D_FILE_OFFSET_BITS=64 -o mcm CM.cpp Archive.cpp Huffman.cpp MCM.cpp Memory.cpp Util.cpp Compressor.cpp LZ.cpp File.cpp -lpthread
When i remove "File.cpp" from "make.sh" i got this:
--------------------------------------------------------------------------------
/tmp/ccDDyso4.o: In function `Archive::writeBlocks()':
Archive.cpp:(.text+0x289c): undefined reference to `FileList::write(Stream*)'
/tmp/ccDDyso4.o: In function `Archive::readBlocks()':
Archive.cpp:(.text+0x2caa): undefined reference to `FileList::read(Stream*)'
/tmp/ccDDyso4.o: In function `Archive::list()':
Archive.cpp:(.text+0x2d6f): undefined reference to `FileInfo::attrToStr(unsigned short)'
/tmp/ccDDyso4.o: In function `Archive::compress(std::vector<FileInfo, std::allocator<FileInfo> > const&)':
Archive.cpp:(.text+0x3aa9): undefined reference to `FileList::addDirectoryRec(std::string const&, std::string const*)'
/tmp/ccE9QpCX.o: In function `Options::parse(int, char**)':
MCM.cpp:(.text._ZN7Options5parseEiPPc[_ZN7Options5parseEiPPc]+0xf): undefined reference to `FileInfo::FileInfo(std::string const&, std::string const*)'
MCM.cpp:(.text._ZN7Options5parseEiPPc[_ZN7Options5parseEiPPc]+0x3e6): undefined reference to `FileInfo::FileInfo(std::string const&, std::string const*)'
MCM.cpp:(.text._ZN7Options5parseEiPPc[_ZN7Options5parseEiPPc]+0x476): undefined reference to `FileInfo::FileInfo(std::string const&, std::string const*)'
MCM.cpp:(.text._ZN7Options5parseEiPPc[_ZN7Options5parseEiPPc]+0x600): undefined reference to `FileInfo::FileInfo(std::string const&, std::string const*)'
MCM.cpp:(.text._ZN7Options5parseEiPPc[_ZN7Options5parseEiPPc]+0x689): undefined reference to `FileInfo::FileInfo(std::string const&, std::string const*)'
/tmp/ccE9QpCX.o:MCM.cpp:(.text._ZN7Options5parseEiPPc[_ZN7Options5parseEiPPc]+0x9a9): more undefined references to `FileInfo::FileInfo(std::string const&, std::string const*)' follow
/tmp/ccE9QpCX.o: In function `main':
MCM.cpp:(.text.startup+0x9a): undefined reference to `FileInfo::FileInfo()'
collect2: error: ld returned 1 exit status
Note:
-------
Take into consideration that it's not an option to use the old "File.hpp", since that one contains a different version of the FileInfo and FileList classes.
I hope you will take this issue into consideration, and thanks anyway for the beautiful code.
MCM v0.83 skbuild1
- now you can set the memory level up to 13, allowing slightly better compression (be careful, it is very memory hungry!)
- console output has been redesigned for better viewing; it also reveals some undocumented options for more precise control over filtering and LZP
- the 32-bit compile is a static GCC build instead of MSVC, so there is no longer a requirement for the MSVC runtimes
This is remarkable. MCM took a 279 KB CreateJS.js file down to 35 KB. The best I could do with 7Z was 46 KB.
NOTE: It wouldn't compress JS files directly. I guess it's not supported. I had to convert them to .txt first.
It would help if you could post the command-line syntax and arguments (I had some trouble there), along with the supported source file formats. I understand if you don't want to bother with direct JS support, since on the web we're forced to deliver JS as gzip anyway, but it would be helpful to know all the supported file types.
I really wish we could get some movement on web compression formats beyond gzip. I'm not a gzip hater – it's unbelievably efficient and light, amazing that something written in the 80s and 90s could still be so hard to beat on CPU and memory use at a given comp ratio and decomp speed. But certainly we must be able to develop a format at this point that can match gzip CPU, memory efficiency, and decomp speed, while picking up a big win on compression ratio. MCM seems to have potential there.
Matt Mahoney, thanks for your benchmarks page. Question: Is the mem column really supposed to be in megabytes, as indicated in your legend? If so, it means lots of programs are using many GBs of RAM, like cmix v6 using 30+ GBs (first row). Most PCs don't have the amount of memory this table suggests these programs are using.
Relatedly, could you post the specs and config for the machine you're using for the benchmarks? Sorry if it's already posted and I missed it...
Yes, mem is in MB, and yes, cmix requires 30 GB of RAM. The note column shows the spec for the test machine. Most of my recent tests are on a Linux machine (note 48), but some of my older tests are on other machines. I realize that speed depends on CPU and compression ratio depends on memory. I don't want to put any limits on hardware. If your compressor is top ranked but only runs on a quantum computer :D then we learned something.